我是靠谱客的博主 英俊老虎,最近开发中收集的这篇文章主要介绍让分区表和数据产生关联的三种方式,觉得挺不错的,现在分享给大家,希望可以做个参考。

概述

目录

总结:

方式一:上传数据后添加分区alter add:

方式二:上传数据后修复msck:

方式三:创建文件夹后load数据到分区:

总结:


总结:

让分区表和数据产生关联的方式有三种:

①先在HDFS上创建分区的目录,并上传数据到该目录,最后在使用alter table add partition

②先在HDFS上创建分区的目录,并上传数据到该目录,最后使用msck repair table table_name

③先在HDFS上创建分区的目录,在使用 load data local 路径 into table table_name partition(字段名=字段值) 

将数据上传到分区目录上,让分区表和数据产生关联的三种方式:数据准备:创建一个没有分区的普通表stu_par
create table stu_par(id int,name string)
row format delimited
fields terminated by 't';

方式一:上传数据后添加分区alter add:

> !sh hadoop fs -mkdir -p /user/hive/warehouse/db_hive.db/stu_par/month=09
> !sh hadoop fs -put /opt/module/hive/stu.txt /user/hive/warehouse/db_hive.db/stu_par/month=09
重复操作省略


> !sh hadoop fs -ls /user/hive/warehouse/db_hive.db/stu_par/
Found 4 items
drwxr-xr-x   - isea supergroup      0 2018-12-01 05:50 /user/hive/warehouse/db_hive.db/stu_par/month=09
drwxr-xr-x   - isea supergroup      0 2018-12-01 04:34 /user/hive/warehouse/db_hive.db/stu_par/month=10
drwxr-xr-x   - isea supergroup      0 2018-12-01 04:34 /user/hive/warehouse/db_hive.db/stu_par/month=11
drwxr-xr-x   - isea supergroup      0 2018-12-01 04:30 /user/hive/warehouse/db_hive.db/stu_par/month=12

0: jdbc:hive2://hadoop108:10000> select * from stu_par where month  = '09';
OK
+-------------+---------------+----------------+--+
| stu_par.id  | stu_par.name  | stu_par.month  |
+-------------+---------------+----------------+--+
+-------------+---------------+----------------+--+

此时的无法查询到数据

0: jdbc:hive2://hadoop108:10000> alter table stu_par add partition(month = '09');

0: jdbc:hive2://hadoop108:10000> select * from stu_par where month = '09';
OK
+-------------+---------------+----------------+--+
| stu_par.id  | stu_par.name  | stu_par.month  |
+-------------+---------------+----------------+--+
| 1001        | zhangfei      | 09             |
| 1002        | liubei        | 09             |
| 1003        | guanyu        | 09             |
| 1004        | zhaoyun       | 09             |
| 1005        | caocao        | 09             |
| 1006        | zhouyu        | 09             |
+-------------+---------------+----------------+--+

方式二:上传数据后修复msck:

1,在HDFS上创建08分区,并上传数据
> !sh hadoop fs -mkdir -p /user/hive/warehouse/db_hive.db/stu_par/month=08
0: jdbc:hive2://hadoop108:10000> !sh hadoop fs -ls /user/hive/warehouse/db_hive.db/stu_par/
Found 5 items
drwxr-xr-x   - isea supergroup      0 2018-12-01 06:06 /user/hive/warehouse/db_hive.db/stu_par/month=08
drwxr-xr-x   - isea supergroup      0 2018-12-01 05:54 /user/hive/warehouse/db_hive.db/stu_par/month=09
drwxr-xr-x   - isea supergroup      0 2018-12-01 04:34 /user/hive/warehouse/db_hive.db/stu_par/month=10
drwxr-xr-x   - isea supergroup      0 2018-12-01 04:34 /user/hive/warehouse/db_hive.db/stu_par/month=11
drwxr-xr-x   - isea supergroup      0 2018-12-01 04:30 /user/hive/warehouse/db_hive.db/stu_par/month=12
> !sh hadoop fs -put /opt/module/hive/stu.txt /user/hive/warehouse/db_hive.db/stu_par/month=08

此时,在HDFS上有数据,但是该表中并没有对应该数据的,所以还是查询不到数据
0: jdbc:hive2://hadoop108:10000> select * from stu_par where month = '08';
OK
+-------------+---------------+----------------+--+
| stu_par.id  | stu_par.name  | stu_par.month  |
+-------------+---------------+----------------+--+
+-------------+---------------+----------------+--+
No rows selected (0.091 seconds)

此时,我使用msck 修复一下这个分区表,这个修复命令会自动调用上面修改表的命令完成元数据的引入
0: jdbc:hive2://hadoop108:10000> msck repair table stu_par;
OK
No rows affected (0.215 seconds)
0: jdbc:hive2://hadoop108:10000> select * from stu_par where month = '08';
OK
+-------------+---------------+----------------+--+
| stu_par.id  | stu_par.name  | stu_par.month  |
+-------------+---------------+----------------+--+
| 1001        | zhangfei      | 08             |
| 1002        | liubei        | 08             |
| 1003        | guanyu        | 08             |
| 1004        | zhaoyun       | 08             |
| 1005        | caocao        | 08             |
| 1006        | zhouyu        | 08             |
+-------------+---------------+----------------+--+

方式三:创建文件夹后load数据到分区:

1,现在HDFS创建08分区,并上传数据
> !sh hadoop fs -mkdir -p /user/hive/warehouse/db_hive.db/stu_par/month=07

2,此时该表中,没有对应的元数据信息,也没有对应的数据,所以该表还是空的
0: jdbc:hive2://hadoop108:10000> select * from stu_par where month = '07';
+-------------+---------------+----------------+--+
| stu_par.id  | stu_par.name  | stu_par.month  |
+-------------+---------------+----------------+--+
+-------------+---------------+----------------+--+

此时,我们在上传数据的同时,为该表创建07分区的元数据
> load data local inpath '/opt/module/hive/stu.txt' into table stu_par partition(month = '07');
0: jdbc:hive2://hadoop108:10000> select * from stu_par where month = '07';
OK
+-------------+---------------+----------------+--+
| stu_par.id  | stu_par.name  | stu_par.month  |
+-------------+---------------+----------------+--+
| 1001        | zhangfei      | 07             |
| 1002        | liubei        | 07             |
| 1003        | guanyu        | 07             |
| 1004        | zhaoyun       | 07             |
| 1005        | caocao        | 07             |
| 1006        | zhouyu        | 07             |
+-------------+---------------+----------------+--+
既把数据上传到指定的分区对应的文件夹下,又能写入元数据。

总结:

让分区表和数据产生关联的方式有三种:

①先在HDFS上创建分区的目录,并上传数据到该目录,最后在使用alter table add partition

②先在HDFS上创建分区的目录,并上传数据到该目录,最后使用msck repair table table_name

③先在HDFS上创建分区的目录,在使用 load data local 路径 into table table_name partition(字段名=字段值) 

最后

以上就是英俊老虎为你收集整理的让分区表和数据产生关联的三种方式的全部内容,希望文章能够帮你解决让分区表和数据产生关联的三种方式所遇到的程序开发问题。

如果觉得靠谱客网站的内容还不错,欢迎将靠谱客网站推荐给程序员好友。

本图文内容来源于网友提供,作为学习参考使用,或来自网络收集整理,版权属于原作者所有。
点赞(75)

评论列表共有 0 条评论

立即
投稿
返回
顶部