GAUSSDB取连续的时间段
数据库中,test表存储两个字段,手机号(phone),时间(data_date)两个字段。其中,每五分钟在表中记录一次。但是时间也有断点的时候,现在取表中手机号连续时间段内的开始时间与结束时间。 表中数据如下:如图所示表中的数据,最终我们需要的结果也是如下图所示:
取连续时间段内的开始时间与结束时间也是想了好久,也是不停的搜参考文章来实现的,下面的sql还是有些不足,算是投机取巧实现的,后续还需优化,但是目前只能到这了,如果大神们有其他的实现方法欢迎评论补充。
我是用临时表来实现的,具体实现方法如下:
---result临时表:根据phone分组对data_date进行排序,取排序后的值
with result as
(select row_number() over(partition by phone order by data_date ) as rn,phone,data_date from test),
---results表:标记每条记录前一条和后一条是否在5分钟之内
results as
(select r2.*,
case when t2.data_date = r2.data_date - interval '5 min' then 1 else null end as flag2
from
(select r1.rn,r1.phone,r1.data_date,
case when t.data_date = r1.data_date + interval '5 min' then 1 else null end as flag1
from
(select rn,phone,data_date from result) r1
left outer join
(select data_date,phone,rn from result) t
on
r1.phone=t.phone
and
r1.rn+1 = t.rn)r2
left outer join
(select data_date,phone,rn from result)t2
on
r2.phone=t2.phone
and
r2.rn -1=t2.rn)
---查询results表
---select * from results;
,
---连续记录的第一条
mins as
(select rn,data_date,phone from results where flag2 is null and flag1 = 1)
---查询mins表
---select * from mins;
,
---取连续记录的最后一条
maxs as (select rn,phone,data_date from results where flag1 is null and flag2 = 1)
---查询maxs表
---select * from maxs;
,
---第一条与最后一条用记录号和手机号关联
finals as
(select m1.rn,m1.phone,m1.data_date,t1.rn2 from mins m1
cross join
(select min(rn) over(partition by phone) as rn2,phone,data_date,rn from maxs)t1
where
t1.phone=m1.phone
and
t1.rn > m1.rn
)
---查询finals表
---select * from finals;
--取最终结果
select phone,start_time,end_time
from
(select m1.phone,m1.data_date start_time,
m2.data_date end_time from finals m1
join maxs m2
on m1.rn2=m2.rn
and m1.phone=m2.phone)t;
但是上边的sql运行出来的数据会造成以下结果:
因为我们在建finals表时,
finals as
(select m1.rn,m1.phone,m1.data_date,t1.rn2 from mins m1
cross join
(select min(rn) over(partition by phone) as rn2,phone,data_date,rn from maxs)t1
where
t1.phone=m1.phone
and
t1.rn > m1.rn
)
判断了maxs的rn 比 mins的rn 大,所以会有多条结果查询出来,因此我做了如下判断,以下判断就是投机取巧的方法了,因为我目前没有其他方法可以去除这样的数据
避免以上问题的产生,优化后的sql。
---result临时表:根据phone分组对data_date进行排序,取排序后的值
with result as
(select row_number() over(partition by phone order by data_date ) as rn,phone,data_date from test),
---results表:标记每条记录前一条和后一条是否在5分钟之内
results as
(select r2.*,
case when t2.data_date = r2.data_date - interval '5 min' then 1 else null end as flag2
from
(select r1.rn,r1.phone,r1.data_date,
case when t.data_date = r1.data_date + interval '5 min' then 1 else null end as flag1
from
(select rn,phone,data_date from result) r1
left outer join
(select data_date,phone,rn from result) t
on
r1.phone=t.phone
and
r1.rn+1 = t.rn)r2
left outer join
(select data_date,phone,rn from result)t2
on
r2.phone=t2.phone
and
r2.rn -1=t2.rn)
---查询results表
---select * from results;
,
---连续记录的第一条
mins as
(select rn,data_date,phone from results where flag2 is null and flag1 = 1)
---查询mins表
---select * from mins;
,
---取连续记录的最后一条
maxs as (select rn,phone,data_date from results where flag1 is null and flag2 = 1)
---查询maxs表
---select * from maxs;
,
---第一条与最后一条用记录号和手机号关联
finals as
(select min(m1.rn) over(partition by m1.phone,m1.data_date),m1.phone,m1.data_date,min(t1.rn2) over(partition by m1.phone,m1.data_date) from mins m1
cross join
(select min(rn) over(partition by phone) as rn2,phone,data_date,rn from maxs)t1
where
t1.phone=m1.phone
and
t1.rn > m1.rn
)
---查询finals表
---select * from finals;
--取最终结果
select phone,start_time,end_time
from
(select m1.phone,m1.data_date start_time,
m2.data_date end_time from finals m1
join maxs m2
on m1.rn2=m2.rn
and m1.phone=m2.phone)t
group by phone,start_time,end_time
;
做完这个sql,我随便取了一条数据,进行测试,发现少考虑一个问题,就是如果只有一个点存在呢,数据如下,只有8:35的时间点,没有连续的:
只有一个点存在的时候,没有连续的值,即flag1和flag2 都为null的时,sql如下:
---result临时表:根据phone分组对data_date进行排序,取排序后的值
with result as
(select row_number() over(partition by phone order by data_date ) as rn,phone,data_date from test),
---results表:标记每条记录前一条和后一条是否在5分钟之内
results as
(select r2.*,
case when t2.data_date = r2.data_date - interval '5 min' then 1 else null end as flag2
from
(select r1.rn,r1.phone,r1.data_date,
case when t.data_date = r1.data_date + interval '5 min' then 1 else null end as flag1
from
(select rn,phone,data_date from result) r1
left outer join
(select data_date,phone,rn from result) t
on
r1.phone=t.phone
and
r1.rn+1 = t.rn)r2
left outer join
(select data_date,phone,rn from result)t2
on
r2.phone=t2.phone
and
r2.rn -1=t2.rn)
---查询results表
---select * from results;
,
--考虑只有一个点的时候
only_data as (select select rn,data_date,phone from results where flag2 is null and flag1 is null)
,
---连续记录的第一条
mins as
(select rn,data_date,phone from results where flag2 is null and flag1 = 1)
---查询mins表
---select * from mins;
,
---取连续记录的最后一条
maxs as (select rn,phone,data_date from results where flag1 is null and flag2 = 1)
---查询maxs表
---select * from maxs;
,
---第一条与最后一条用记录号和手机号关联
finals as
(select min(m1.rn) over(partition by m1.phone,m1.data_date),m1.phone,m1.data_date,min(t1.rn2) over(partition by m1.phone,m1.data_date) from mins m1
cross join
(select min(rn) over(partition by phone) as rn2,phone,data_date,rn from maxs)t1
where
t1.phone=m1.phone
and
t1.rn > m1.rn
)
---查询finals表
---select * from finals;
--取最终结果
select phone,start_time,end_time
from
(select m1.phone,m1.data_date start_time,
m2.data_date end_time from finals m1
join maxs m2
on m1.rn2=m2.rn
and m1.phone=m2.phone)t
group by phone,start_time,end_time
union all
select
phone,
data_date - interval '5 min' start_time,
data_date + interval '5 min' end_time
from
only_data
;
总结
这个sql或许有很多问题,但是我拿着数据验证了几个目前没有发现问题,如果大神们有什么好的实现方法,欢迎指点。
最后
以上就是耍酷奇异果最近收集整理的关于GAUSSDB取连续的时间段GAUSSDB取连续的时间段总结的全部内容,更多相关GAUSSDB取连续内容请搜索靠谱客的其他文章。
发表评论 取消回复