概述
目录
- 1、开窗函数的格式
- 2、窗口范围图例
- 3、连续登录问题(补充):
- 3.1:数据准备
- 3.2:代码实现
- 3.3:总结
1、开窗函数的格式
FUNCTION_NAME([argument_list])
OVER (
[PARTITION BY window_partition,…]
[ORDER BY window_ordering, … [ASC|DESC]])
[ { ROWS | RANGE } BETWEEN frame_start AND frame_end ] );
FUNCTION_NAME:函数名称。如row_number()、sum()、first_value()等。
argument_list:函数的参数列表。
PARTITION BY:根据window_partition(分区字段)进行分区,该子句也被称为查询分区子句。类似于group by,都是将数据按照边界值进行分组。而OVER之前的函数在每一个分组之内进行,如果超出了分组,则函数会重新计算。
ORDER BY:将各个分区内的数据,根据window_ordering(排序字段)进行排序。ORDER BY子句会对输入的数据强制排序(窗口函数是SQL语句最后执行的函数,因此可以把SQL结果集想象成输入数据)。ORDER BY子句对于诸如row_number(),lead(),lag()等函数是必须的。如果数据无序,这些函数的结果就没有意义。
ROWS和RANGE分别表示选择前后几行、选择数据范围。
2、窗口范围图例
注释:
PRECEDING:往前
FOLLOWING:往后
CURRENT ROW:当前行
UNBOUNDED:起点
UNBOUNDED PRECEDING 表示从前面的起点
UNBOUNDED FOLLOWING:表示到后面的终点
常用的范围:
01:ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
02:ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
03:ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING,n PRECEDING m FOLLOWING:表示窗口的范围是[(当前行的行数)- n, (当前行的行数)+ m] row。
04:ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
3、连续登录问题(补充):
3.1:数据准备
要求:登录日志,计算每个人连续登录的最大天数!(注意:断一天还算连续)
id | dt |
---|---|
1001 | 2021-08-01 |
1001 | 2021-08-02 |
1001 | 2021-08-03 |
1001 | 2021-08-05 |
1001 | 2021-08-06 |
1001 | 2021-08-07 |
1001 | 2021-08-10 |
1001 | 2021-08-12 |
1002 | 2021-08-01 |
1002 | 2021-08-02 |
1002 | 2021-08-03 |
1002 | 2021-08-07 |
1002 | 2021-08-09 |
1002 | 2021-08-11 |
1002 | 2021-08-13 |
1002 | 2021-08-15 |
3.2:代码实现
with tx as (
select '1001'as id ,'2021-08-01' as dt
union all
select '1001'as id ,'2021-08-02' as dt
union all
select '1001'as id ,'2021-08-03' as dt
union all
select '1001'as id ,'2021-08-05' as dt
union all
select '1001'as id ,'2021-08-06' as dt
union all
select '1001'as id ,'2021-08-07' as dt
union all
select '1001'as id ,'2021-08-10' as dt
union all
select '1001'as id ,'2021-08-12' as dt
union all
select '1001'as id ,'2021-08-01' as dt
union all
select '1002'as id ,'2021-08-01' as dt
union all
select '1002'as id ,'2021-08-01' as dt
union all
select '1002'as id ,'2021-08-02' as dt
union all
select '1002'as id ,'2021-08-03' as dt
union all
select '1002'as id ,'2021-08-07' as dt
union all
select '1002'as id ,'2021-08-09' as dt
union all
select '1002'as id ,'2021-08-11' as dt
union all
select '1002'as id ,'2021-08-13' as dt
union all
select '1002'as id ,'2021-08-15' as dt
)
select
id,
max(days) days
from (
select
id,
flag,
datediff(max(dt),min(dt)) +1 as days
from (select
id,
dt,
sum(if(dtDiff>2,1,0)) over(partition by id order by dt) flag
from(select
id,
dt,
datediff(dt,lagDt) dtDiff
from
(select
id,
dt,
lag(dt,1,'1970-01-01')over(partition by id order by dt)lagDt
from
tx)t1)t2)t3)
group by id,flag;
)t4
group by id
----------------
----------------
id days
1001 7
1002 9
3.3:总结
断n天还算连续,只需要修改sum(if(dtDiff>2,1,0)) over(partition by id order by dt) flag 把2替换为n+1即可;
最后
以上就是笑点低抽屉为你收集整理的Hive之连续登录问题(补充)1、开窗函数的格式2、窗口范围图例3、连续登录问题(补充):的全部内容,希望文章能够帮你解决Hive之连续登录问题(补充)1、开窗函数的格式2、窗口范围图例3、连续登录问题(补充):所遇到的程序开发问题。
如果觉得靠谱客网站的内容还不错,欢迎将靠谱客网站推荐给程序员好友。
发表评论 取消回复