我是靠谱客的博主 糟糕电话,最近开发中收集的这篇文章主要介绍hiveSql 实现计算用户的最大连续登录天数 & 创建断连后的连续序号1.需求一:2.需求二,觉得挺不错的,现在分享给大家,希望可以做个参考。
概述
hiveSql 实现计算用户的最大连续登录天数 & 创建断连后的连续序号
- 1.需求一:
- 1.1.已有数据
- 1.2.期望结果
- 1.3.分析
- 1.4.实现
- 2.需求二
- 2.1.已有数据
- 2.2.期望结果
- 2.3.分析
- 2.4.实现方式一
- 2.5.实现方式二
1.需求一:
计算每个用户在一段时间内的最大连续登录天数
1.1.已有数据
username | logintime |
---|---|
A | 2021-10-01 |
A | 2021-10-02 |
A | 2021-10-03 |
A | 2021-10-05 |
A | 2021-10-06 |
A | 2021-10-09 |
B | 2021-10-01 |
B | 2021-10-02 |
B | 2021-10-03 |
B | 2021-10-04 |
1.2.期望结果
username | max_login_days |
---|---|
A | 3 |
B | 4 |
1.3.分析
1.连续登陆,就是在连续登陆的期间内,后一天和前一天的差值为1,不能为大于1的值,直到间断。可以设置一列序号,如果是连续的话,这列序号也是会随着日期同步增长的,那么日期减去这个序号,应该都是一个确定的日期。
1.4.实现
select
username,max(max_login_days) as max_login_days
from
(select
username,diff_day,count(1) as max_login_days
from
(select
username,login_time
,substr(DATEADD(to_date(login_time,'yyyy-mm-dd'),-irank,'dd'),1,10) as diff_day -- 将用户的登录日期与排序序号做差计算一个日期
from
(select
username,login_time
,ROW_NUMBER() OVER(PARTITION BY username ORDER BY login_time) AS irank
from tb_user_login
) t1
) t2
group by username,diff_day
) t3
group by username;
2.需求二
用户浏览产品线连续排序序号,若中间插入其他产品线再出现原产品线则从1开始再计数
2.1.已有数据
type | hour | username |
---|---|---|
ask | 9 | liu |
ask | 10 | liu |
blog | 11 | liu |
d | 12 | liu |
d | 13 | liu |
d | 14 | liu |
ask | 15 | liu |
d | 16 | liu |
2.2.期望结果
type | hour | username | irank |
---|---|---|---|
ask | 9 | liu | 1 |
ask | 10 | liu | 2 |
blog | 11 | liu | 1 |
d | 12 | liu | 1 |
d | 13 | liu | 2 |
d | 14 | liu | 3 |
ask | 15 | liu | 1 |
d | 16 | liu | 1 |
2.3.分析
这里只想了仅一个用户情况,可以看到期望结果中该用户在9点到16点分别浏览了多个产品线,按照浏览时间排序连续产品线的序号期望是连续的,被别的产品线打断后再出现则再次从1开始计数。逆向思维想,如果能完成上述排序,则在按照hour排序的前提下,相邻的相同的产品线type能有个相同的值。如果有这样的一个列irank_tmp,再将type和该列一起row_number开窗即可。
例如:
type | hour | username | irank_tmp |
---|---|---|---|
ask | 9 | liu | 0 |
ask | 10 | liu | 0 |
blog | 11 | liu | 1 |
d | 12 | liu | 2 |
d | 13 | liu | 2 |
d | 14 | liu | 2 |
ask | 15 | liu | 3 |
d | 16 | liu | 4 |
2.4.实现方式一
with tmp as (
select
'ask'as type,9 as pt ,'liu'as user
union all
select
'ask'as type,10 as pt ,'liu'as user
union all
select
'blog'as type,11 as pt ,'liu'as user
union all
select
'd'as type,12 as pt ,'liu'as user
union all
select
'd'as type,13 as pt ,'liu'as user
union all
select
'd'as type,14 as pt ,'liu'as user
union all
select
'ask'as type,15 as pt ,'liu'as user
union all
select
'd'as type,16 as pt ,'liu'as user
)
,
tmp1 as
(
select
type,pt,user
,ROW_NUMBER() OVER(PARTITION BY user ORDER BY pt) AS irank
from tmp
)
select
type,pt,user,irank,
ROW_NUMBER() OVER(PARTITION BY type,flag ORDER BY flag) AS con_rank
from(
select
a.type,a.pt,a.user,a.irank, case when a.irank is not null and b.irank is null and c.irank is null then a.irank else 0 end as flag
from tmp1 a
left join tmp1 b
on a.irank = b.irank + 1 and a.type = b.type
left join tmp1 c
on a.irank = c.irank - 1 and a.type = c.type
) t;
方式一的做法比较麻烦,思路是将数据按照用户的浏览时间排序后记录序号,错位自连接两次,一次往上错位,一次往下错位。用前三行示例:
- 按照时间排序后是
type | hour | username | irank_tmp |
---|---|---|---|
ask | 9 | liu | 1 |
ask | 10 | liu | 2 |
blog | 11 | liu | 3 |
- 错位两次后
type | hour | username | a.irank_tmp | b.irank1 | c.irank2 |
---|---|---|---|---|---|
ask | 9 | liu | 1 | N | 2 |
ask | 10 | liu | 2 | 1 | N |
blog | 11 | liu | 3 | N | N |
- 分析错位数据
只有是没有连续产品线的数据的b.irank1 和 c.irank2是空值,a.irank_tmp不为空。此时只用将这种情况的数赋为原来的时间排序的序号即irank_tmp,其他都为0即可。可得到:
type | hour | username | irank_tmp | irank |
---|---|---|---|---|
ask | 9 | liu | 1 | 0 |
ask | 10 | liu | 2 | 0 |
blog | 11 | liu | 3 | 3 |
d | 12 | liu | 4 | 0 |
d | 13 | liu | 5 | 0 |
d | 14 | liu | 6 | 0 |
ask | 15 | liu | 7 | 7 |
d | 16 | liu | 8 | 8 |
- 此时只要按照type和irank分组做开窗row_number即可获得预期数据形式
2.5.实现方式二
with tmp as (
select
'ask'as type,9 as pt ,'liu'as user
union all
select
'ask'as type,10 as pt ,'liu'as user
union all
select
'blog'as type,11 as pt ,'liu'as user
union all
select
'd'as type,12 as pt ,'liu'as user
union all
select
'd'as type,13 as pt ,'liu'as user
union all
select
'd'as type,14 as pt ,'liu'as user
union all
select
'ask'as type,15 as pt ,'liu'as user
union all
select
'd'as type,16 as pt ,'liu'as user
)
,
tmp1 as
(
select
type,pt,user
,ROW_NUMBER() OVER(PARTITION BY user ORDER BY pt) AS irank
from tmp
)
select
type,pt,user,
ROW_NUMBER() OVER(PARTITION BY type,irank1-irank2 ORDER BY irank1-irank2) AS irank
from(
select
type,pt,user
,ROW_NUMBER() OVER(PARTITION BY user ORDER BY pt) AS irank1
,ROW_NUMBER() OVER(PARTITION BY user,type ORDER BY pt) AS irank2
from tmp
) t;
方式二的做法比较灵巧,思路是将数据按照用户的浏览时间排序后记录序号,再将数据按照type和时间做升序排序记录序号,再将这两个序号做差,即可得到步骤1.3分析中的irank_tmp。示例:
type | hour | username | irank_pt | irank_type | irank_pt - irank_type |
---|---|---|---|---|---|
ask | 9 | liu | 1 | 1 | 0 |
ask | 10 | liu | 2 | 2 | 0 |
blog | 11 | liu | 3 | 1 | 2 |
d | 12 | liu | 4 | 1 | 3 |
d | 13 | liu | 5 | 2 | 3 |
d | 14 | liu | 6 | 3 | 3 |
ask | 15 | liu | 7 | 3 | 4 |
d | 16 | liu | 8 | 4 | 4 |
此时还是将type与这个差值分组做row_number即可得到期望结果
最后
以上就是糟糕电话为你收集整理的hiveSql 实现计算用户的最大连续登录天数 & 创建断连后的连续序号1.需求一:2.需求二的全部内容,希望文章能够帮你解决hiveSql 实现计算用户的最大连续登录天数 & 创建断连后的连续序号1.需求一:2.需求二所遇到的程序开发问题。
如果觉得靠谱客网站的内容还不错,欢迎将靠谱客网站推荐给程序员好友。
本图文内容来源于网友提供,作为学习参考使用,或来自网络收集整理,版权属于原作者所有。
发表评论 取消回复