概述
概述
本篇主要是记录impala hive sql 使用中的常见查询与分析函数
数据引用地址 博客引用地址
数据样本
cookie1 2015-04-10 1
cookie1 2015-04-11 5
cookie1 2015-04-12 7
cookie1 2015-04-13 3
cookie1 2015-04-14 2
cookie1 2015-04-15 4
cookie1 2015-04-16 4
cookie2 2015-04-10 2
cookie2 2015-04-11 3
cookie2 2015-04-12 5
cookie2 2015-04-13 6
cookie2 2015-04-14 3
cookie2 2015-04-15 9
cookie2 2015-04-16 7
分析窗口函数应用场景用于分区排序
动态Group By
Top N
累计计算
层次查询
ROW_NUMBER() OVER函数的基本用法用法
row_number() OVER (PARTITION BY COL1 ORDER BY COL2) 表示根据COL1分组,在分组内部根据 COL2排序,而此函数计算的值就表示每组内部排序后的顺序编号(组内连续的唯一的)
SELECT * FROM (
SELECT partner_hotel_id, hotel_seq, task_date, link_status, row_number()
OVER(PARTITION BY partner_hotel_id ORDER BY task_date DESC) AS num
FROM clues_mapping.qunar_mapping_houses
) tmp WHERE tmp.num = 1 and link_status = 'LINKED'
NTILE() OVER
NTILE(n),用于将分组数据按照顺序切分成n片,返回当前切片值, 假如统计一个cookie,pv数最多的前1/3的天,下边的数据其实我们只需rn2 = 1,如果是前2/3那rn2 in (1, 2)
SELECT
cookieid,
createtime,
pv,
NTILE(2) OVER(PARTITION BY cookieid ORDER BY pv) AS rn1, --分组内将数据分成2片
NTILE(3) OVER(PARTITION BY cookieid ORDER BY pv) AS rn2, --分组内将数据分成3片
NTILE(4) OVER(ORDER BY pv) AS rn3 --将所有数据分成4片
FROM lxw1234
ORDER BY cookieid,createtime;
cookieid day pv rn1 rn2 rn3
-------------------------------------------------
cookie1 2015-04-10 1 1 1 1
cookie1 2015-04-11 5 1 1 1
cookie1 2015-04-12 7 1 1 2
cookie1 2015-04-13 3 1 2 2
cookie1 2015-04-14 2 2 2 3
cookie1 2015-04-15 4 2 3 3
cookie1 2015-04-16 4 2 3 4
cookie2 2015-04-10 2 1 1 1
cookie2 2015-04-11 3 1 1 1
cookie2 2015-04-12 5 1 1 2
cookie2 2015-04-13 6 1 2 2
cookie2 2015-04-14 3 2 2 3
cookie2 2015-04-15 9 2 3 4
cookie2 2015-04-16 7 2 3 4
RANK 和 DENSE_RANK
—RANK() 生成数据项在分组中的排名,排名相等会在名次中留下空位 —DENSE_RANK() 生成数据项在分组中的排名,排名相等会在名次中不会留下空位
SELECT
cookieid,
createtime,
pv,
RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn1,
DENSE_RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn2,
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY pv DESC) AS rn3
FROM lxw1234
WHERE cookieid = 'cookie1';
cookieid day pv rn1 rn2 rn3
--------------------------------------------------
cookie1 2015-04-12 7 1 1 1
cookie1 2015-04-11 5 2 2 2
cookie1 2015-04-15 4 3 3 3
cookie1 2015-04-16 4 3 3 4
cookie1 2015-04-13 3 5 4 5
cookie1 2015-04-14 2 6 5 6
cookie1 2015-04-10 1 7 6 7
rn1: 15号和16号并列第3, 13号排第5
rn2: 15号和16号并列第3, 13号排第4
rn3: 如果相等,则按记录值排序,生成唯一的次序,如果所有记录值都相等,或许会随机排吧。
窗口统计函数
关键是理解ROWS BETWEEN含义,也叫做WINDOW子句: PRECEDING:往前 FOLLOWING:往后 CURRENT ROW:当前行 UNBOUNDED:起点,UNBOUNDED PRECEDING 表示从前面的起点, UNBOUNDED FOLLOWING:表示到后面的终点
表结构 cookieid STRING createtime STRING * pv INT
sum 窗口统计的使用,AVG,MIN,MAX均是同样的用法
SELECT cookieid,
createtime,
pv,
SUM(pv) OVER(PARTITION BY cookieid
ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
SUM(pv) OVER(PARTITION BY cookieid
ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1
SUM(pv) OVER(PARTITION BY cookieid) AS pv3, --分组内所有行
SUM(pv) OVER(PARTITION BY cookieid
ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行
SUM(pv) OVER(PARTITION BY cookieid
ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行
SUM(pv) OVER(PARTITION BY cookieid
ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行
FROM lxw1234;
hive中的行列转换函数
hive 中的行转列
关键点:使用到explode()、split()和LATERAL VIEW
SELECT rateable_id AS house_id,
ids
FROM vdayu_production.rates AS h lateral view explode(split(sight_photo_ids_str, ",")) adtable AS ids
hive 中的列转行
关键点:使用到concat_ws()和collect_set(),collect_list() collect_set()和collect_list() 的区别:collect_set()会对合并后的数据去重而 collect_list()不会去重
SELECT sale_ord_id ,
concat_ws(',',collect_set(item_sku_id))as item_sku_id
FROM table1
GROUP BY sale_ord_id
impala 时间戳转时间
from_unixtime(cast(substr(created_time,1, 10) as int), 'yyyy-MM-dd')
hive 中的json串解析
get_json_object(string json_string, string path)get_json_object(regexp_extract(houses.accept_time,'^[(.+)]$', 1), '$.start_accept_time') AS start_accept_time,
这里是为了处理类似于这样的数据
[{"start_accept_time":"08:00","end_accept_time":"24:00"}]
先去除 [] 然后拿相应的字段
hive 中用sort_array函数解决collet_list,collet_set列表排序混乱问题
SELECT
memberid,
regexp_replace(
concat_ws('-',
sort_array(
collect_set(
concat_ws(':',cast(legcount as string),airways)
)
)
),'\d:','') hs
from
(
select 1 as memberid,'A' as airways,2 as legcount
union ALL
select 1 as memberid,'B' as airways,3 as legcount
union ALL
select 2 as memberid,'C' as airways,4 as legcount
union ALL
select 2 as memberid,'D' as airways,1 as legcount
union ALL
select 2 as memberid,'D' as airways,1 as legcount
) as t
group by memberid
hive中时间转换
from_unixtime(unix_timestamp('20190525', 'yyyyMMdd'), 'yyyy-MM-dd')
SELECT
from_unixtime( to_unixtime(cast ('2017-12-01' as timestamp))
-- format_datetime(from_unixtime(cast(to_unixtime(cast ('2017-12-01' as timestamp) as double)),'yyyy-MM-dd HH:mm:ss')
hive中的透视表实现
有时候,我们在hive中编写hql的时候,可以用键-值对的形式存储结果。比如有一些同学在一些课程中的学习行为,我们可以记录成如下形式:
这样做的好处有以下几点。一是可以避免hql中出现大量的join连接,使得hql的冗长;二是便于修改表,比如说,在有个同学的学习特征提取错误的时候,可以直接删除这个key,而不用再重写修改hql,重新跑hql。在将数据存储成这种形式后,我们需要将其转换成普通的形式。本文主要记录如何通过hive中的内置函数,来进行数据的转换。
create table tmp as
select 'A' as student, 'english' as class_name, 'answer_num' as key, 9 as value from dual
union all
select 'A' as student, 'chinese' as class_name, 'answer_num' as key, 15 as value from dual
union all
select 'B' as student, 'english' as class_name, 'answer_num' as key, 12 as value from dual
union all
select 'A' as student, 'english' as class_name, 'homework_num' as key, 5 as value from dual
union all
select 'B' as student, 'english' as class_name, 'homework_num' as key, 7 as value from dual
union all
select 'B' as student, 'chinese' as class_name, 'homework_num' as key, 10 as value from dual;
---------------------
作者:SK_Lavender
来源:CSDN
原文:https://blog.csdn.net/u010705209/article/details/53164118
Pivot using Hivemall to_map function
SELECT
uid,
kv['c1'] AS c1,
kv['c2'] AS c2,
kv['c3'] AS c3
FROM (
SELECT uid, to_map(key, value) kv
FROM vtable
GROUP BY uid
) t
Unpivot
SELECT t1.uid, t2.key, t2.value
FROM htable t1
LATERAL VIEW explode (map(
'c1', c1,
'c2', c2,
'c3', c3
)) t2 as key, value
最后
以上就是洁净时光为你收集整理的商品SKU mySQL 列转行_HIVE SQL奇技淫巧的全部内容,希望文章能够帮你解决商品SKU mySQL 列转行_HIVE SQL奇技淫巧所遇到的程序开发问题。
如果觉得靠谱客网站的内容还不错,欢迎将靠谱客网站推荐给程序员好友。
发表评论 取消回复