google-bigquery - BigQuery - Computing user event sessions from access history

Reposted. Author: 行者123. Last updated: 2023-12-01 19:39:54

I'm new to BQ and not sure how much it would cost to run this query.

I have a table that records the access times of all users, like this:

user_id     access_time
-------------------------------------
user_a 2015-06-15 14:12:12
user_b 2015-06-15 14:12:12
user_a 2015-06-15 14:12:13
user_a 2015-06-15 14:12:19
user_a 2015-06-15 14:12:28
user_a 2015-06-15 19:32:15
user_a 2015-06-15 19:32:19

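I want to generate an event session table that represents every activity window of each user. Each session has a start time and a duration.

A session expires if the next access does not arrive within 10 seconds.

An example of the session table: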

session_id    user_id    session_start_time    duration
------------------------------------------------------------
1 user_a 2015-06-15 14:12:12 16
2 user_b 2015-06-15 14:12:12 0
3 user_a 2015-06-15 19:32:15 4

BQ doesn't seem to support user-defined functions. How can I do this in a single query?

Thanks in advance!

Update:

Fixed the example.

Best answer

To illustrate the approach with the data from the example, here is a query that shows each new session with its start time:

-- BigQuery legacy SQL: a comma between subselects in FROM means UNION ALL.
select user, ts start_time from (
  -- A row opens a session when it has no predecessor or the gap exceeds 10 seconds.
  -- ts must be selected here so the outer query can return it.
  select user, ts, ifnull(seconds - prev_seconds > 10, true) new_session from (
    select user, ts, seconds,
      lag(seconds, 1) over(partition by user order by seconds) prev_seconds
    from
      -- ts is in microseconds, so dividing by 1000000 gives whole seconds.
      (select user, ts, integer(ts/1000000) seconds from
        (select 'user_a' user, timestamp('2015-06-15 14:12:12') ts),
        (select 'user_b' user, timestamp('2015-06-15 14:12:12') ts),
        (select 'user_a' user, timestamp('2015-06-15 14:12:13') ts),
        (select 'user_a' user, timestamp('2015-06-15 14:12:19') ts),
        (select 'user_a' user, timestamp('2015-06-15 14:12:28') ts),
        (select 'user_a' user, timestamp('2015-06-15 19:32:15') ts),
        (select 'user_a' user, timestamp('2015-06-15 19:32:19') ts))))
where new_session
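
As a rough cross-check outside BigQuery, the session-start rule can be sketched in plain Python. This is not part of the original answer: the names `rows`, `session_starts`, and `GAP_SECONDS` are invented for illustration, and the dictionary of previous timestamps only stands in for `lag(...) over(partition by user order by seconds)`.

```python
from datetime import datetime

GAP_SECONDS = 10  # assumption: a gap longer than this opens a new session

# Sample rows from the question: (user, access time).
rows = [
    ("user_a", "2015-06-15 14:12:12"),
    ("user_b", "2015-06-15 14:12:12"),
    ("user_a", "2015-06-15 14:12:13"),
    ("user_a", "2015-06-15 14:12:19"),
    ("user_a", "2015-06-15 14:12:28"),
    ("user_a", "2015-06-15 19:32:15"),
    ("user_a", "2015-06-15 19:32:19"),
]


def session_starts(rows, gap=GAP_SECONDS):
    """Return the (user, timestamp) pairs that open a new session."""
    parsed = sorted(
        (user, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")) for user, ts in rows
    )
    previous = {}  # last timestamp seen per user, the analogue of lag()
    starts = []
    for user, ts in parsed:
        last = previous.get(user)
        # No predecessor (the ifnull(..., true) case) or a gap above the limit.
        if last is None or (ts - last).total_seconds() > gap:
            starts.append((user, ts))
        previous[user] = ts
    return starts


starts = session_starts(rows)
for user, ts in starts:
    print(user, ts)
# user_a 2015-06-15 14:12:12
# user_a 2015-06-15 19:32:15
# user_b 2015-06-15 14:12:12
```

The comparison is strictly greater than 10, matching the query, so two accesses exactly 10 seconds apart stay in the same session.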

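To get the session durations, we can run another window function instead of doing a self-join. Basically, we first find the start and the end of each session, then compute the difference between them: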

-- A row that is both start and end is a single-event session (duration 0);
-- otherwise the next remaining row is the end of the same session.
select user, ts, if(last_session, 0, next_seconds - seconds) duration
from (
  select
    user, new_session, last_session, ts, seconds,
    lead(seconds, 1) over(partition by user order by seconds) next_seconds
  from (
    -- seconds must be passed through so the outer window function can use it.
    select
      user, ts, seconds,
      ifnull(seconds - prev_seconds > 10, true) new_session,
      ifnull(next_seconds - seconds > 10, true) last_session
    from (
      select
        user, ts, seconds,
        lag(seconds, 1) over(partition by user order by seconds) prev_seconds,
        lead(seconds, 1) over(partition by user order by seconds) next_seconds
      from
        (select user, ts, integer(ts/1000000) seconds from
          (select 'user_a' user, timestamp('2015-06-15 14:12:12') ts),
          (select 'user_b' user, timestamp('2015-06-15 14:12:12') ts),
          (select 'user_a' user, timestamp('2015-06-15 14:12:13') ts),
          (select 'user_a' user, timestamp('2015-06-15 14:12:19') ts),
          (select 'user_a' user, timestamp('2015-06-15 14:12:28') ts),
          (select 'user_a' user, timestamp('2015-06-15 19:32:15') ts),
          (select 'user_a' user, timestamp('2015-06-15 19:32:19') ts))))
  -- Keep only the rows that start or end a session.
  where new_session or last_session)
where new_session

This results in:

Row user    ts                       duration    
1 user_a 2015-06-15 14:12:12 UTC 16
2 user_a 2015-06-15 19:32:15 UTC 4
3 user_b 2015-06-15 14:12:12 UTC 0
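
To double-check these numbers without running anything in BigQuery, the start/end/difference idea can be sketched in plain Python. This is only an illustrative reference under the same 10-second assumption, not the answer's method: `sessionize` and `rows` are hypothetical names, and the loop tracks the start and end of the current session directly instead of using window functions.

```python
from datetime import datetime
from itertools import groupby

# Sample rows from the question: (user, access time).
rows = [
    ("user_a", "2015-06-15 14:12:12"),
    ("user_b", "2015-06-15 14:12:12"),
    ("user_a", "2015-06-15 14:12:13"),
    ("user_a", "2015-06-15 14:12:19"),
    ("user_a", "2015-06-15 14:12:28"),
    ("user_a", "2015-06-15 19:32:15"),
    ("user_a", "2015-06-15 19:32:19"),
]


def sessionize(rows, gap=10):
    """Return (user, session_start, duration_seconds) for every session."""
    parsed = sorted(
        (user, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")) for user, ts in rows
    )
    sessions = []
    # groupby needs sorted input; each group plays the role of one partition.
    for user, events in groupby(parsed, key=lambda row: row[0]):
        start = end = None
        for _, ts in events:
            if start is None:
                start = ts
            elif (ts - end).total_seconds() > gap:
                # Gap too long: close the current session and open a new one.
                sessions.append((user, start, int((end - start).total_seconds())))
                start = ts
            end = ts
        # Close the last open session of this user.
        sessions.append((user, start, int((end - start).total_seconds())))
    return sessions


sessions = sessionize(rows)
for user, start, duration in sessions:
    print(user, start, duration)
# user_a 2015-06-15 14:12:12 16
# user_a 2015-06-15 19:32:15 4
# user_b 2015-06-15 14:12:12 0
```

A case worth testing against any implementation is a user with two isolated accesses far apart: that should give two sessions of duration 0, not one duration equal to the gap between them.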

Regarding "google-bigquery - BigQuery - Computing user event sessions from access history", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30858577/
