概述
hive中有row_number() over (partition by)函数,可以一句SQL实现想要的排序,在ClickHouse中有很多种实现方式,本篇就介绍一下几种方法。
目录
1.row_number排序
2.row_number排序后取出rank=1的结果
3.特殊场景
1.row_number排序
HIVE中写法:
select number,
row_number() over (partition by number order by time desc) as rank
from table a
GROUP BY number
ClickHouse写法:
select number,
groupArray(time) AS arr_val,
arrayEnumerate(arr_val) as row_number
from (select distinct orderid as number,
toDate(operatetime) as time
from table
order by time desc
) a
GROUP BY number
2.row_number排序后取出rank=1的结果
hive写法:
select orderid
from (select orderid,
row_number() over(partition by orderid order by datachange_lasttime desc) as row_num
from table
where d = '${CurrentDate}'
) a
where row_num = 1;
ClickHouse写法:
方法1:利用groupArray
select orderid,
groupArray(1)(datachange_lasttime) as dates
from (select orderid,
datachange_lasttime
from table
ORDER BY orderid, datachange_lasttime desc
) a
group by orderid
方法2:利用max函数实现倒序,如果正序使用min函数即可
select orderid,
max(datachange_lasttime) as datachange_lasttime
from table
group by orderid
方法3:利用rowNumberInAllBlocks函数
select orderid, status
from (select orderid, status, rowNumberInAllBlocks() as rank
from (select orderid, status, datachange_lasttime
from table
order by orderid, datachange_lasttime desc
) a
) b LIMIT 1 BY orderid
方法4:利用arrayEnumerate函数
select orderid
from (select orderid,
groupArray(datachange_lasttime) AS arr_val,
arrayEnumerate(arr_val) as row_number
from (select orderid, datachange_lasttime
from table
order by datachange_lasttime desc
) a
GROUP BY number
) b
where row_number = 1
3.特殊场景
要求:
对于以下场景,需要按照orderid分组,按照日期倒序,取最新一条,若日期一致,则随机取一条作为结果即可
hive写法:
select orderid
from (select orderid,
status,
row_number() over(partition by orderid order by datachange_lasttime desc) as row_num
from table
where d = '${CurrentDate}'
) as b
where row_num = 1
ClickHouse写法:
通过上面的案例,我们很容易想到,把上面的结果作为一个子表,与原表进行关联,只是这样关联,随便举一个关联的写法:
select a.orderid as orderid_a, a.status as status
from olap_htlmaindb.tmp_ord_orders_status_s_pre a
inner join (select orderid, groupArray(1)(datachange_lasttime) as dates
from (select orderid, datachange_lasttime
from table
ORDER BY orderid, datachange_lasttime desc
) a
group by orderid) b
on a.orderid = b.orderid
and cast(a.datachange_lasttime as String) = cast(b.dates [ 1 ] as String)
这里我们是先把符合要求的orderid和时间取出来,再回去关联,取出需要的列,因为这些函数都有一个缺点是只能有partition by的字段和排序字段,不能有其他字段,所以要返回关联,所以上面四种方法,ininer join原表,都不能解决上面案例的问题。
这里就想到了LIMIT 1 BY这个方法,这个方法其实是最有效的,如下:
select orderid,
status,
datachange_lasttime
from table
order by orderid, datachange_lasttime desc
LIMIT 1 BY orderid
最后
以上就是要减肥康乃馨为你收集整理的【ClickHouse】row_number() over (partition by)的几种实现方法1.row_number排序2.row_number排序后取出rank=1的结果3.特殊场景的全部内容,希望文章能够帮你解决【ClickHouse】row_number() over (partition by)的几种实现方法1.row_number排序2.row_number排序后取出rank=1的结果3.特殊场景所遇到的程序开发问题。
如果觉得靠谱客网站的内容还不错,欢迎将靠谱客网站推荐给程序员好友。
发表评论 取消回复