gpt4 book ai didi

hadoop - Hive 查询为匹配条件的一系列行生成标识符

转载 作者:可可西里 更新时间:2023-11-01 15:13:24 24 4
gpt4 key购买 nike

假设我有以下配置单元表作为输入,我们称它为connections:

userid  | timestamp   
--------|-------------
1 | 1433258019
1 | 1433258020
2 | 1433258080
2 | 1433258083
2 | 1433258088
2 | 1433258170
[...] | [...]

使用以下查询:

SELECT
userid,
timestamp,
timestamp - LAG(timestamp, 1, 0) OVER w AS timediff
CASE
WHEN timediff > 60
THEN 'new_session'
ELSE 'same_session'
END AS session_state
FROM connections
WINDOW w PARTITION BY userid ORDER BY timestamp ASC;

我正在生成以下输出:

userid  | timestamp   | timediff   | session_state
--------|-------------|------------|---------------
1 | 1433258019 | 1433258019 | new_session
1 | 1433258020 | 1 | same_session
2 | 1433258080 | 1433258080 | new_session
2 | 1433258083 | 3 | same_session
2 | 1433258088 | 5 | same_session
2 | 1433258170 | 82 | new_session
[...] | [...] | [...] | [...]

我将如何生成:

userid  | timestamp   | timediff   | sessionid
--------|-------------|------------------------------
1 | 1433258019 | 1433258019 | user1-session-1
1 | 1433258020 | 1 | user1-session-1
2 | 1433258080 | 1433258080 | user2-session-1
2 | 1433258083 | 3 | user2-session-1
2 | 1433258088 | 5 | user2-session-1
2 | 1433258170 | 82 | user2-session-2
[...] | [...] | [...] | [...]

仅使用 HQL 和“著名的”UDF 是否可能(我宁愿不使用自定义 UDF 或缩减器脚本)?

最佳答案

有趣的问题。根据您对@Madhu 的评论,我在您的示例中添加了 2 1433258172 行。您需要的是每次满足 timediff > 60 时递增。执行此操作的最简单方法是对其进行标记,然后在窗口中累积求和。

查询:

select userid
, timestamp
, concat('user', userid, '-session-', s_sum) sessionid
from (
select *
, sum( counter ) over (partition by userid
order by timestamp asc
rows between unbounded preceding and current row) s_sum
from (
select *
, case when timediff > 60 then 1 else 0 end as counter
from (
select userid
, timestamp
, timestamp - lag(timestamp, 1, 0) over (partition by userid
order by timestamp asc) timediff
from connections ) x ) y ) z

输出:

1   1433258019  user1-session-1
1 1433258020 user1-session-1
2 1433258080 user2-session-1
2 1433258083 user2-session-1
2 1433258088 user2-session-1
2 1433258170 user2-session-2
2 1433258172 user2-session-2

关于hadoop - Hive 查询为匹配条件的一系列行生成标识符,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/30600543/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com