
sql - Hive collect_set(), but removing only consecutive duplicates

Reposted. Author: 行者123. Updated: 2023-12-04 20:18:06

I want to remove consecutive duplicates from an array in Hive.
collect_list() keeps every duplicate, while collect_set() keeps only distinct entries. I need something in between.

For example, from the following table:

id  |  number
==============
fk 4
fk 4
fk 2
4f 1
4f 8
4f 8
h9 7
h9 4
h9 7

I would like to get something like this:
id | aggregate
===========================
fk Array<int>(4,2)
4f Array<int>(1,8)
h9 Array<int>(7,4,7)

Best answer

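Use the lag() analytic function to fetch the previous number and compare it with the current one to detect consecutive duplicates. This needs a column that defines the row order within each id (order_ts in the demo).

Demo: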

with your_table as ( --replace this subquery with your table
  select stack(11, --the number of tuples
    'fk',4,'2019-01-01 10:10:10.123',
    'fk',4,'2019-01-01 10:10:10.124',
    'fk',2,'2019-01-01 10:10:10.125',
    '4f',1,'2019-01-01 10:10:10.126',
    '4f',8,'2019-01-01 10:10:10.127',
    '4f',8,'2019-01-01 10:10:10.128',
    'h9',7,'2019-01-01 10:10:10.129',
    'h9',4,'2019-01-01 10:10:10.130',
    'h9',7,'2019-01-01 10:10:10.131',
    'h9',7,'2019-01-01 10:10:10.132',
    'h9',7,'2019-01-01 10:10:10.133'
  ) as (id, number, order_ts)
) --replace this subquery with your table

--collect_list() skips NULLs, so a row equal to its predecessor is dropped
select id, collect_list(case when number = lag_number then null else number end) as aggregate
from
  (select id, number, order_ts,
          --previous number within the same id; NULL for the first row of each id
          lag(number) over (partition by id order by order_ts) lag_number
   from your_table
   distribute by id sort by order_ts
  ) s
group by id;

Result:
id  aggregate   
4f [1,8]
fk [4,2]
h9 [7,4,7]
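
The lag-and-compare logic of the query above can also be modeled outside Hive to sanity-check the expected output. The following is only a minimal Python sketch, not part of the original answer: the function name `collapse_consecutive` is hypothetical, the rows are copied from the demo, and it assumes no NULL numbers and timestamps in a fixed format that sorts correctly as strings.

```python
from collections import defaultdict


def collapse_consecutive(rows):
    """rows: iterable of (id, number, order_ts) tuples.

    Returns {id: [numbers with consecutive duplicates removed]}.
    """
    result = defaultdict(list)
    previous = {}  # last number seen per id; plays the role of lag(number)
    # Sorting by (id, order_ts) mirrors "partition by id order by order_ts".
    for row_id, number, _order_ts in sorted(rows, key=lambda r: (r[0], r[2])):
        # The first row of an id has no previous value, so it is always kept.
        if row_id not in previous or previous[row_id] != number:
            result[row_id].append(number)
        previous[row_id] = number
    return dict(result)


rows = [
    ("fk", 4, "2019-01-01 10:10:10.123"),
    ("fk", 4, "2019-01-01 10:10:10.124"),
    ("fk", 2, "2019-01-01 10:10:10.125"),
    ("4f", 1, "2019-01-01 10:10:10.126"),
    ("4f", 8, "2019-01-01 10:10:10.127"),
    ("4f", 8, "2019-01-01 10:10:10.128"),
    ("h9", 7, "2019-01-01 10:10:10.129"),
    ("h9", 4, "2019-01-01 10:10:10.130"),
    ("h9", 7, "2019-01-01 10:10:10.131"),
    ("h9", 7, "2019-01-01 10:10:10.132"),
    ("h9", 7, "2019-01-01 10:10:10.133"),
]

print(collapse_consecutive(rows))
# {'4f': [1, 8], 'fk': [4, 2], 'h9': [7, 4, 7]}
```

Note that h9 keeps the second 7: it follows a 4, so it is not a consecutive duplicate, which is exactly where this differs from collect_set().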

Source: a similar question on Stack Overflow, "Hive collect_set() but remove consecutive duplicates": https://stackoverflow.com/questions/55978504/
