
hive - Sorting within collect_list() in Hive

Repost. Author: 行者123. Updated: 2023-12-04 22:52:30

Suppose I have a Hive table that looks like this:

ID    event    order_num
------------------------
A red 2
A blue 1
A yellow 3
B yellow 2
B green 1
...

I am trying to use collect_list to build a list of events for each ID, with something like the following:
SELECT ID, 
collect_list(event) as events_list
FROM table
GROUP BY ID;

However, within each ID I group by, I need the events sorted by order_num, so that my result table looks like this:
ID    events_list
------------------------
A ["blue","red","yellow"]
B ["green","yellow"]

I cannot do a global sort by ID and order_num before the collect_list() query, because the table is very large. Is there a way to sort by order_num within collect_list?

Thanks!

Best answer

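So, I found the answer. The trick is to use a subquery with DISTRIBUTE BY and SORT BY clauses: DISTRIBUTE BY sends all rows with the same ID to the same reducer, and SORT BY orders the rows within each reducer, so no global sort is needed. See below: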

WITH table1 AS (
SELECT 'A' AS ID, 'red' AS event, 2 AS order_num UNION ALL
SELECT 'A' AS ID, 'blue' AS event, 1 AS order_num UNION ALL
SELECT 'A' AS ID, 'yellow' AS event, 3 AS order_num UNION ALL
SELECT 'B' AS ID, 'yellow' AS event, 2 AS order_num UNION ALL
SELECT 'B' AS ID, 'green' AS event, 1 AS order_num
)

-- Collect it
SELECT subquery.ID,
       collect_list(subquery.event) as events_list
FROM (
    SELECT
        table1.ID,
        table1.event,
        table1.order_num
    FROM table1
    DISTRIBUTE BY
        table1.ID
    SORT BY
        table1.ID,
        table1.order_num
) subquery
GROUP BY subquery.ID;
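
The query above depends on collect_list() seeing rows in the order the subquery emitted them, which I am not aware of Hive documenting as a guarantee. As a hedged alternative, here is a minimal sketch that sorts inside the aggregation itself. It is not part of the original answer, and it assumes a Hive version where sort_array() accepts an array of structs (older versions accept only primitive elements) and where selecting a field on an array of structs returns an array of that field. The alias sorted_pairs and the subquery name t are made up for illustration.

```sql
-- Sketch only: assumes sort_array() supports array<struct<...>> in your Hive version.
WITH table1 AS (
    SELECT 'A' AS ID, 'red' AS event, 2 AS order_num UNION ALL
    SELECT 'A' AS ID, 'blue' AS event, 1 AS order_num UNION ALL
    SELECT 'A' AS ID, 'yellow' AS event, 3 AS order_num UNION ALL
    SELECT 'B' AS ID, 'yellow' AS event, 2 AS order_num UNION ALL
    SELECT 'B' AS ID, 'green' AS event, 1 AS order_num
)
SELECT t.ID,
       -- struct() names its fields col1, col2; col2 holds the event
       t.sorted_pairs.col2 AS events_list
FROM (
    SELECT ID,
           -- structs are compared field by field, so order_num (col1) drives the sort
           sort_array(collect_list(struct(order_num, event))) AS sorted_pairs
    FROM table1
    GROUP BY ID
) t;
```

If those assumptions hold, the ordering no longer depends on how rows reach the reducer. If two rows share the same order_num, ties would be broken by event, since the struct comparison moves on to the second field.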

Regarding hive - sorting within collect_list() in Hive, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50766764/
