Loading Data into Hive with Dynamic Partitions


Posted on 靠谱客 by 冷静冥王星. This article shows how to load data into Hive using dynamic partitions; it is shared here as a reference.


0. Start the Hive command-line shell

1. Create the partitioned table t12 and inspect its schema

hive> CREATE TABLE t12(id INT , NAME string) partitioned BY(YEAR INT , MONTH INT) ROW format delimited FIELDS TERMINATED BY '\t';
OK
Time taken: 0.21 seconds
hive> desc t12;
OK
id	int
name	string
year	int
month	int

# Partition Information
# col_name	data_type	comment

year	int
month	int
Time taken: 0.085 seconds, Fetched: 10 row(s)

2. Create a test data file and load it into the partitioned table t12

localhost:result_data a6$ pwd
/Users/a6/Applications/apache-hive-2.3.0-bin/result_data
localhost:result_data a6$ more t12_data.txt
1	小华
2	成龙
3	zhangsan
4	李四
5	张龙
6	赵虎
localhost:result_data a6$

Load the data and check the result with the following commands:

hive> load data local inpath '/Users/a6/Applications/apache-hive-2.3.0-bin/result_data/t12_data.txt' into table t12 partition ( year=2017,month=8);
Loading data to table yyz_workdb.t12 partition (year=2017, month=8)
OK
Time taken: 1.042 seconds
hive> load data local inpath '/Users/a6/Applications/apache-hive-2.3.0-bin/result_data/t12_data.txt' into table t12 partition ( year=2017,month=9);
Loading data to table yyz_workdb.t12 partition (year=2017, month=9)
OK
Time taken: 0.575 seconds
hive> load data local inpath '/Users/a6/Applications/apache-hive-2.3.0-bin/result_data/t12_data.txt' into table t12 partition ( year=2017,month=10);
Loading data to table yyz_workdb.t12 partition (year=2017, month=10)
OK
Time taken: 0.532 seconds
hive> load data local inpath '/Users/a6/Applications/apache-hive-2.3.0-bin/result_data/t12_data.txt' into table t12 partition ( year=2017,month=11);
Loading data to table yyz_workdb.t12 partition (year=2017, month=11)
OK
Time taken: 0.502 seconds
hive> select * from t12;
OK
1	小华	2017	10
2	成龙	2017	10
3	zhangsan	2017	10
4	李四	2017	10
5	张龙	2017	10
6	赵虎	2017	10
1	小华	2017	11
2	成龙	2017	11
3	zhangsan	2017	11
4	李四	2017	11
5	张龙	2017	11
6	赵虎	2017	11
1	小华	2017	8
2	成龙	2017	8
3	zhangsan	2017	8
4	李四	2017	8
5	张龙	2017	8
6	赵虎	2017	8
1	小华	2017	9
2	成龙	2017	9
3	zhangsan	2017	9
4	李四	2017	9
5	张龙	2017	9
6	赵虎	2017	9
Time taken: 0.193 seconds, Fetched: 24 row(s)
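
To confirm which partitions the four load statements created, the partition list can be queried directly. This is a minimal sketch, assuming the t12 table above and the same session; the filter in the second query is only an illustration of reading a single partition.

```sql
-- List the partitions registered for t12 in the metastore.
SHOW PARTITIONS t12;

-- Read a single partition. Filtering on the partition columns lets Hive
-- scan only the matching directory instead of the whole table.
SELECT id, name
FROM t12
WHERE year = 2017 AND month = 8;
```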

3. Create the partitioned table t13

hive> CREATE TABLE t13(id INT , NAME string) partitioned BY(YEAR INT , MONTH INT) ROW format delimited FIELDS TERMINATED BY '\t';
OK
Time taken: 0.075 seconds

4. Load data into the partitioned table dynamically

hive> insert into table t13 partition(year=2015,month) select id,name,month from t12 where year=2017;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = a6_20171104164639_f6406180-46ea-496b-8cf6-70f28ca62659
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1509763925736_0003, Tracking URL = http://localhost:8088/proxy/application_1509763925736_0003/
Kill Command = /Users/a6/Applications/hadoop-2.6.5/bin/hadoop job  -kill job_1509763925736_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-11-04 16:46:46,292 Stage-1 map = 0%,  reduce = 0%
2017-11-04 16:46:52,648 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_1509763925736_0003
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/t13/year=2015/.hive-staging_hive_2017-11-04_16-46-39_096_8917617436179357709-1/-ext-10000
Loading data to table yyz_workdb.t13 partition (year=2015, month=null)
Loaded : 4/4 partitions.
Time taken to load dynamic partitions: 0.476 seconds
Time taken for adding to write entity : 0.001 seconds
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   HDFS Read: 5808 HDFS Write: 467 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 15.51 seconds

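Note: this statement inserts every row of t12 with year=2017 into the new partitioned table t13, under the static partition value year=2015. Pay attention to the select list id,name,month. t13 has the columns id, name, year and month, of which year and month are partition columns. Because the insert already fixes year=2015 in the partition clause, the query on t12 only needs to return three columns: id, name and month. The last column, month, supplies the value of the dynamic partition.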
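
The positional rule behind this note can be made explicit. Hive matches the trailing SELECT columns to the dynamic partition columns by position rather than by name, so the alias in the sketch below has no effect on which partition a row lands in. This is an illustration only: the static value year=2016, the alias source_month and the filter month >= 10 are made up and do not appear in the original session.

```sql
-- Static partition column: year, fixed to a hypothetical value (2016).
-- Dynamic partition column: month, taken from the last SELECT column,
-- regardless of the alias that column carries.
INSERT INTO TABLE t13 PARTITION (year=2016, month)
SELECT id,
       name,
       month AS source_month
FROM t12
WHERE year = 2017
  AND month >= 10;
```

Because one partition column is still static, a statement of this shape does not need nonstrict mode.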

4.1. Check the data in t13 after the insert:

hive> select * from t13;
OK
1	小华	2015	10
2	成龙	2015	10
3	zhangsan	2015	10
4	李四	2015	10
5	张龙	2015	10
6	赵虎	2015	10
1	小华	2015	11
2	成龙	2015	11
3	zhangsan	2015	11
4	李四	2015	11
5	张龙	2015	11
6	赵虎	2015	11
1	小华	2015	8
2	成龙	2015	8
3	zhangsan	2015	8
4	李四	2015	8
5	张龙	2015	8
6	赵虎	2015	8
1	小华	2015	9
2	成龙	2015	9
3	zhangsan	2015	9
4	李四	2015	9
5	张龙	2015	9
6	赵虎	2015	9
Time taken: 0.098 seconds, Fetched: 24 row(s)
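
A per-partition row count is a quick way to check the result without reading the full listing. This is a minimal sketch against the t13 table above; for the session as shown, with 24 rows across four months, one would expect four groups of six rows each.

```sql
-- Count the rows that landed in each (year, month) partition of t13.
SELECT year, month, COUNT(*) AS row_count
FROM t13
GROUP BY year, month;
```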

5. Making every partition column dynamic

-- Required: in strict mode at least one partition column must be static
set hive.exec.dynamic.partition.mode=nonstrict;

insert into table t13 partition(year,month) select * from t12;
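
Because select * only works when the column order of t12 happens to match that of t13, it can be safer to spell the columns out. The sketch below assumes the t12 and t13 tables above. The two limit settings are real Hive properties, but the values are illustrative and were not part of the original session.

```sql
-- Dynamic partitioning must be enabled (the default in recent Hive releases).
SET hive.exec.dynamic.partition=true;
-- nonstrict mode allows every partition column to be dynamic.
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Optional safety limits on how many partitions one statement may create.
SET hive.exec.max.dynamic.partitions=1000;
SET hive.exec.max.dynamic.partitions.pernode=100;

-- The dynamic partition columns (year, month) come last in the SELECT,
-- in the same order as in the PARTITION clause.
INSERT INTO TABLE t13 PARTITION (year, month)
SELECT id, name, year, month
FROM t12;

-- List the resulting partitions.
SHOW PARTITIONS t13;
```

Naming the columns keeps the statement correct even if t12 later gains a column or its columns are reordered.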

Reference: http://blog.csdn.net/liubiaoxin/article/details/48931247


Category: hive
Published: 2023-10-04 01:50:31
Article link: https://www.kaopuke.com/article/k-p-k_14_uzogf2_14_j_14_y.html
