gpt4 book ai didi

hadoop - 如何在不同列中使用 COLLECT_SET 和按条件分组

转载 作者:可可西里 更新时间:2023-11-01 16:24:39 24 4
gpt4 key购买 nike

我有这张表:

╔═════════╦═════════╦══════════════╗
║ user_id ║ item_id ║ date_visited ║
╠═════════╬═════════╬══════════════╣
║ 1 ║ 123 ║ 18/5/2017 ║
║ 1 ║ 234 ║ 11/3/2017 ║
║ 2 ║ 345 ║ 18/5/2017 ║
║ 2 ║ 456 ║ 11/3/2017 ║
╚═════════╩═════════╩══════════════╝

我试图(通过 Hive 查询)实现的是这个结果(假设今天是 18/5/2017):

╔═════════╦═══════════════════════════╦═════════════════════════════╗
║ user_id ║ items_visited_last_5_days ║ items_visited_last_100_days ║
╠═════════╬═══════════════════════════╬═════════════════════════════╣
║ 1 ║ 123 ║ 123, 234 ║
║ 2 ║ 345 ║ 345, 456 ║
╚═════════╩═══════════════════════════╩═════════════════════════════╝

基本上,我需要按 user_id 进行分组,并根据用户的访问(串联的 item_id)生成不同的列(基于时间间隔)。有可能实现吗?

提前谢谢你。

最佳答案

select      user_id
,collect_set (case when datediff(current_date,date_visited) <= 5 then item_id end) as items_visited_last_5_days
,collect_set (case when datediff(current_date,date_visited) <= 100 then item_id end) as items_visited_last_100_days

from mytable

group by user_id

+---------+---------------------------+-----------------------------+
| user_id | items_visited_last_5_days | items_visited_last_100_days |
+---------+---------------------------+-----------------------------+
| 1 | [123] | [123,234] |
| 2 | [345] | [345,456] |
+---------+---------------------------+-----------------------------+

select      user_id
,concat_ws (',',collect_set (case when datediff(current_date,date_visited) <= 5 then cast (item_id as string) end)) as items_visited_last_5_days
,concat_ws (',',collect_set (case when datediff(current_date,date_visited) <= 100 then cast (item_id as string) end)) as items_visited_last_100_days

from mytable

group by user_id

+---------+---------------------------+-----------------------------+
| user_id | items_visited_last_5_days | items_visited_last_100_days |
+---------+---------------------------+-----------------------------+
| 1 | 123 | 123,234 |
| 2 | 345 | 345,456 |
+---------+---------------------------+-----------------------------+

关于hadoop - 如何在不同列中使用 COLLECT_SET 和按条件分组,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/44054288/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com