我是靠谱客的博主 凶狠钻石,这篇文章主要介绍hive:数据库“行专列”操作---使用collect_set/collect_list/collect_all & row_number()over(partition by 分组字段 [order...,现在分享给大家,希望可以做个参考。

方案一:请参考《数据库“行专列”操作---使用row_number()over(partition by 分组字段 [order by 排序字段])》,该方案是sqlserver,oracle,mysql,hive均适用的。

在hive中的方案分为以下两种方案:

创建测试表,并插入测试数据:

复制代码
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
--hive 测试 行转列 collect_set collect_list create table tommyduan_test( gridid string, height int, cell string, mrcount int, weakmrcount int ); insert into tommyduan_test values('g1',1,'cell1',12,3); insert into tommyduan_test values('g1',1,'cell2',22,3); insert into tommyduan_test values('g1',1,'cell3',23,3); insert into tommyduan_test values('g1',1,'cell4',1,3); insert into tommyduan_test values('g1',1,'cell5',3,3); insert into tommyduan_test values('g1',1,'cell6',4,3); insert into tommyduan_test values('g1',1,'cell19',21,3); insert into tommyduan_test values('g2',1,'cell4',1,3); insert into tommyduan_test values('g2',1,'cell5',3,3); insert into tommyduan_test values('g2',1,'cell6',4,3); insert into tommyduan_test values('g2',1,'cell19',21,3);

方案二:使用collect_set方案

注意:collect_set是一个set集合,不允许重复的记录插入

复制代码
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
select gridid,height,collect_list(cell) cellArray,collect_list(mrcount) mrcountArray,collect_list(weakmrcount) weakmrcountArray from ( select gridid,height,cell,mrcount,weakmrcount,row_number()over(partition by gridid,height order by mrcount desc) rn from tommyduan_test group by gridid,height,cell,mrcount,weakmrcount ) t10 where rn<4 group by gridid,height; +---------+---------+-----------------------------+---------------+-------------------+--+ | gridid | height | cellarray | mrcountarray | weakmrcountarray | +---------+---------+-----------------------------+---------------+-------------------+--+ | g1 | 1 | ["cell3","cell2","cell19"] | [23,22,21] | [3,3,3] | | g2 | 1 | ["cell19","cell6","cell5"] | [21,4,3] | [3,3,3] | +---------+---------+-----------------------------+---------------+-------------------+--+ select gridid,height, (case when size(cellArray)>0 then cellArray[0] else '-9999' end) as cell1, (case when size(cellArray)>0 then mrcountArray[0] else '-9999' end) as cell1_mrcount, (case when size(cellArray)>0 then weakmrcountArray[0] else '-9999' end) as cell1_weakmrcount, (case when size(cellArray)>1 then cellArray[1] else '-9999' end) as cell2, (case when size(cellArray)>1 then mrcountArray[1] else '-9999' end) as cell2_mrcount, (case when size(cellArray)>1 then weakmrcountArray[1] else '-9999' end) as cell2_weakmrcount, (case when size(cellArray)>2 then cellArray[2] else '-9999' end) as cell3, (case when size(cellArray)>2 then mrcountArray[2] else '-9999' end) as cell3_mrcount, (case when size(cellArray)>2 then weakmrcountArray[2] else '-9999' end) as cell3_weakmrcount from ( select gridid,height,collect_list(cell) cellArray,collect_list(mrcount) mrcountArray,collect_list(weakmrcount) weakmrcountArray from ( select gridid,height,cell,mrcount,weakmrcount,row_number()over(partition by gridid,height order by mrcount desc) rn from tommyduan_test group by gridid,height,cell,mrcount,weakmrcount ) t10 where rn<4 group by gridid,height ) t12; +---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+ | gridid | height | cell1 | cell1_mrcount | cell1_weakmrcount | cell2 | cell2_mrcount | cell2_weakmrcount | cell3 | cell3_mrcount | cell3_weakmrcount | +---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+ | g1 | 1 | cell3 | 23 | 3 | cell2 | 22 | 3 | cell19 | 21 | 3 | | g2 | 1 | cell19 | 21 | 3 | cell6 | 4 | 3 | cell5 | 3 | 3 | +---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+

 

方案三:使用collect_list/collect_all方案

注意:collect_set是一个set集合,不允许重复的记录插入

复制代码
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
select gridid,height,collect_set(cell),collect_set(mrcount),collect_set(weakmrcount) from (select * from tommyduan_test order by gridid,height,mrcount desc) t10 group by gridid,height; +---------+---------+-------------------------------------------------------------+----------------------+------+--+ | gridid | height | _c2 | _c3 | _c4 | +---------+---------+-------------------------------------------------------------+----------------------+------+--+ | g1 | 1 | ["cell3","cell2","cell19","cell1","cell6","cell5","cell4"] | [23,22,21,12,4,3,1] | [3] | | g2 | 1 | ["cell19","cell6","cell5","cell4"] | [21,4,3,1] | [3] | +---------+---------+-------------------------------------------------------------+----------------------+------+--+ select gridid,height,collect_set(cell) cellArray,collect_set(mrcount) mrcountArray,collect_set(weakmrcount) weakmrcountArray from ( select gridid,height,cell,mrcount,weakmrcount,row_number()over(partition by gridid,height order by mrcount desc) rn from tommyduan_test group by gridid,height,cell,mrcount,weakmrcount ) t10 where rn<4 group by gridid,height; +---------+---------+-----------------------------+---------------+-------------------+--+ | gridid | height | cellarray | mrcountarray | weakmrcountarray | +---------+---------+-----------------------------+---------------+-------------------+--+ | g1 | 1 | ["cell3","cell2","cell19"] | [23,22,21] | [3] | | g2 | 1 | ["cell19","cell6","cell5"] | [21,4,3] | [3] | +---------+---------+-----------------------------+---------------+-------------------+--+ select gridid,height,collect_set(concat_ws(',',cell,cast(mrcount as string), cast(weakmrcount as string))) as cellArray from ( select gridid,height,cell,mrcount,weakmrcount,row_number()over(partition by gridid,height order by mrcount desc) rn from tommyduan_test group by gridid,height,cell,mrcount,weakmrcount ) t10 where rn<4 group by gridid,height +---------+---------+--------------------------------------------+--+ | gridid | height | cellarray | +---------+---------+--------------------------------------------+--+ | g1 | 1 | ["cell3,23,3","cell2,22,3","cell19,21,3"] | | g2 | 1 | ["cell19,21,3","cell6,4,3","cell5,3,3"] | +---------+---------+--------------------------------------------+--+ select gridid,height, (case when size(cellArray)>0 then split(cellArray[0],'_')[0] else '-9999' end) as cell1, (case when size(cellArray)>0 then split(cellArray[0],'_')[1] else '-9999' end) as cell1_mrcount, (case when size(cellArray)>0 then split(cellArray[0],'_')[2] else '-9999' end) as cell1_weakmrcount, (case when size(cellArray)>1 then split(cellArray[1],'_')[0] else '-9999' end) as cell2, (case when size(cellArray)>1 then split(cellArray[1],'_')[1] else '-9999' end) as cell2_mrcount, (case when size(cellArray)>1 then split(cellArray[1],'_')[2] else '-9999' end) as cell2_weakmrcount, (case when size(cellArray)>2 then split(cellArray[2],'_')[0] else '-9999' end) as cell3, (case when size(cellArray)>2 then split(cellArray[2],'_')[1] else '-9999' end) as cell3_mrcount, (case when size(cellArray)>2 then split(cellArray[2],'_')[2] else '-9999' end) as cell3_weakmrcount from ( select gridid,height,collect_set(concat_ws('_',cell,cast(mrcount as string), cast(weakmrcount as string))) as cellArray from ( select gridid,height,cell,mrcount,weakmrcount,row_number()over(partition by gridid,height order by mrcount desc) rn from tommyduan_test group by gridid,height,cell,mrcount,weakmrcount ) t10 where rn<4 group by gridid,height ) t12; +---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+ | gridid | height | cell1 | cell1_mrcount | cell1_weakmrcount | cell2 | cell2_mrcount | cell2_weakmrcount | cell3 | cell3_mrcount | cell3_weakmrcount | +---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+ | g1 | 1 | cell3 | 23 | 3 | cell2 | 22 | 3 | cell19 | 21 | 3 | | g2 | 1 | cell19 | 21 | 3 | cell6 | 4 | 3 | cell5 | 3 | 3 | +---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+

 

最后

以上就是凶狠钻石最近收集整理的关于hive:数据库“行专列”操作---使用collect_set/collect_list/collect_all & row_number()over(partition by 分组字段 [order...的全部内容,更多相关hive:数据库“行专列”操作---使用collect_set/collect_list/collect_all内容请搜索靠谱客的其他文章。

本图文内容来源于网友提供,作为学习参考使用,或来自网络收集整理,版权属于原作者所有。
点赞(93)

评论列表共有 0 条评论

立即
投稿
返回
顶部