概述
目录
基本思路
案例展示
数据准备
需求实现
步骤1:
步骤2:
步骤3:
将步骤1、步骤2、步骤3合并为一整条SQL
基本思路
步骤1:首先拿到原始数据,并对uid和dt分组,顺带求出每个uid在每天登录的总次数并保留,并使每个uid下的dt保持顺序排序并留下排序序号。
步骤2:在步骤1的基础上用dt减去排序序号的天数得到一个时间(sub_dt),因为我们已经按照顺序排序了所以只要得到的时间相同,则表示这两个时间是连续的。接下来只需要对uid和sub_dt进行分组,求count,求sum就可以得到连续登录天数,以及每次连续的登录次数。
步骤3:在步骤2的基础上在去取出最大的连续天数以及最大的连续登录次数即可。
案例展示
数据准备
CREATE TABLE IF NOT EXISTS tmp.tmp_max_continuous_data_prepare(
uid String COMMENT '用户ID',
dt String COMMENT '登录时间'
) ROW FORMAT DELIMITED FIELDS TERMINATED BY 't';
vi /home/jerevon/max_continuous_data.txt
1 2021-07-01
1 2021-07-02
1 2021-07-02
1 2021-07-03
1 2021-07-05
1 2021-07-06
2 2021-07-01
2 2021-07-02
2 2021-07-04
2 2021-07-04
2 2021-07-04
2 2021-07-05
3 2021-07-02
3 2021-07-04
3 2021-07-04
3 2021-07-06
LOAD DATA LOCAL INPATH '/home/jerevon/max_continuous_data.txt' INTO TABLE tmp.tmp_max_continuous_data_prepare;
得到如下数据
需求实现
步骤1:
-- 对uid和dt分组并求出每个用户在每天登录的总次数并保留,将每个uid下的dt按照递增顺序排名。
CREATE TABLE IF NOT EXISTS tmp.tmp_max_continuous_data_prepare_step1 AS
SELECT uid, dt, times,
row_number() over(PARTITION BY uid ORDER BY dt) AS row_num
FROM
(SELECT uid, dt, count(dt) AS times
FROM tmp.tmp_max_continuous_data_prepare
GROUP BY uid, dt) AS times_table;
得到步骤1的结果数据
- uid:用户id
- dt:登录时间
- times:登录时间当天登录次数
- row_num:按登录时间顺序排序序号
步骤2:
-- 在步骤1的基础上,用dt前推row_num天,得到的结果按照uid,sub_date分组并求出count-所有连续天数和sum-所有连续天数中的登录总次数
CREATE TABLE IF NOT EXISTS tmp.tmp_max_continuous_data_prepare_step2 AS
SELECT uid, count(dt) AS continuous_days, sum(times) AS continuous_times
FROM (SELECT UID, dt, times, date_sub(dt,row_num) AS sub_date
FROM tmp.tmp_max_continuous_data_prepare_step1) AS sub_table
GROUP BY uid, sub_date;
得到步骤2的结果数据
- uid:用户id
- continuous_days:连续登录天数
- continuous_times:本次连续登录天数中用户的登录次数
步骤3:
-- 在步骤2的基础上,进行开窗选取continuous_days最大,而且continuous_times最大的情况,作为最终结果。
CREATE TABLE IF NOT EXISTS tmp.tmp_max_continuous_data_prepare_step3 AS
SELECT uid, continuous_days AS max_continuous_days, continuous_times AS max_continuous_times
FROM (SELECT uid, continuous_days, continuous_times, row_number() over(PARTITION BY uid ORDER BY continuous_days DESC,continuous_times DESC) AS row_num
FROM tmp.tmp_max_continuous_data_prepare_step2) AS max_table
WHERE row_num=1;
得到步骤3的结果数据
- uid:用户id
- max_continuous_days:最大连续登录天数
- max_continuous_times:最大连续登录天数中的最大登录次数
将步骤1、步骤2、步骤3合并为一整条SQL
SELECT uid, continuous_days, continuous_times
FROM
(SELECT uid, continuous_days, continuous_times,
row_number() over(PARTITION BY uid ORDER BY continuous_days DESC,continuous_times DESC) AS row_num
FROM
(SELECT uid, count(dt) AS continuous_days, sum(times) AS continuous_times
FROM
(SELECT uid, dt, times, date_sub(dt,row_num) AS sub_date
FROM
(SELECT uid, dt, times,
row_number() over(PARTITION BY uid ORDER BY dt) AS row_num
FROM
(SELECT uid, dt, count(dt) AS times
FROM tmp.tmp_max_continuous_data_prepare
GROUP BY uid, dt) AS times_table) AS order_table) AS sub_table
GROUP BY uid, sub_date) AS result_table) AS max_table
WHERE row_num=1;
延伸:以上情况是在最大连续天数相同的情况取最大的登录次数,如果是要在最大连续天数相同的时候取最近一次连续的登录次数或者跟时间相关的指标的时候我们就需要从步骤2开始就把dt关键字带出来,最终的代码实现如下。
SELECT uid, continuous_days AS max_continuous_days, continuous_times AS max_continuous_times
FROM
(SELECT uid, dt, continuous_days, continuous_times,
row_number() over(PARTITION BY uid ORDER BY continuous_days DESC, dt desc) AS row_num
FROM
(SELECT uid, dt,
count(dt) over(PARTITION BY uid, sub_date) AS continuous_days,
sum(times) over(PARTITION BY uid, sub_date) AS continuous_times
FROM
(SELECT uid, dt, times, date_sub(dt,row_num) AS sub_date
FROM
(SELECT uid, dt, times, row_number() over(PARTITION BY uid ORDER BY dt) AS row_num
FROM
(SELECT uid, dt, count(dt) AS times
FROM tmp.tmp_max_continuous_data_prepare
GROUP BY uid, dt) AS times_table) AS order_table) AS sub_table) AS recently_table) AS final_table
WHERE row_num=1;
最后
以上就是俊逸红酒为你收集整理的Hive-最大连续登录天数、最大连续登录天数的最大登录总次数基本思路案例展示的全部内容,希望文章能够帮你解决Hive-最大连续登录天数、最大连续登录天数的最大登录总次数基本思路案例展示所遇到的程序开发问题。
如果觉得靠谱客网站的内容还不错,欢迎将靠谱客网站推荐给程序员好友。
本图文内容来源于网友提供,作为学习参考使用,或来自网络收集整理,版权属于原作者所有。
发表评论 取消回复