
sql - Flatten intersecting timespans per user in PostgreSQL

Reposted. Author: 行者123. Updated: 2023-11-29 11:48:05

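I am trying to merge overlapping start/end timestamps into single timespans. A similar question is available here on SO. I would like to merge the timestamps separately for each user in the data.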

SQLFiddle

Sample data:

-- drop table if exists app_log;

create table app_log (
    user_id int,
    login_time timestamp,
    logout_time timestamp
);

insert into app_log values
(1, '2014-01-01 08:00', '2014-01-01 10:00'), /* here we start */
(1, '2014-01-01 09:10', '2014-01-01 09:59'), /* fully included in previous interval */
(1, '2014-01-01 10:00', '2014-01-01 10:48'), /* continuing first interval */
(1, '2014-01-01 10:40', '2014-01-01 10:49'), /* continuing previous interval */
(1, '2014-01-01 10:55', '2014-01-01 11:00'), /* isolated interval */
(2, '2014-01-01 09:00', '2014-01-01 11:00'), /* 2nd user is shifted by one hour */
(2, '2014-01-01 10:10', '2014-01-01 10:59'), /* to simulate overlaps with 1st user */
(2, '2014-01-01 11:00', '2014-01-01 11:48'),
(2, '2014-01-01 11:40', '2014-01-01 11:49'),
(2, '2014-01-01 11:55', '2014-01-01 12:00')
;

Required result:

user_id  login_time        logout_time
1        2014-01-01 08:00  2014-01-01 10:49  /* Merging first 4 lines */
1        2014-01-01 10:55  2014-01-01 11:00  /* 5th line is isolated */
2        2014-01-01 09:00  2014-01-01 11:49  /* Merging lines 6-9 */
2        2014-01-01 11:55  2014-01-01 12:00  /* last line is isolated */

I tried using one of the solutions provided in the mentioned question, but it does not return the correct answer even for a single user:

with recursive

in_data as (select login_time as d1, logout_time as d2 from app_log where user_id = 1)

, dateRanges (ancestorD1, parentD1, d2, iter) as
(
    --anchor is first level of collapse
    select
        d1 as ancestorD1,
        d1 as parentD1,
        d2,
        cast(0 as int) as iter
    from in_data

    --recurse as long as there is another range to fold in
    union all

    select
        tLeft.ancestorD1,
        tRight.d1 as parentD1,
        tRight.d2,
        iter + 1 as iter
    from dateRanges as tLeft join in_data as tRight
        --join condition is that the t1 row can be consumed by the recursive row
        on tLeft.d2 between tRight.d1 and tRight.d2
        --exclude identical rows
        and not (tLeft.parentD1 = tRight.d1 and tLeft.d2 = tRight.d2)
)
select
    ranges1.*
from dateRanges as ranges1
where not exists (
    select 1
    from dateRanges as ranges2
    where ranges1.ancestorD1 between ranges2.ancestorD1 and ranges2.d2
        and ranges1.d2 between ranges2.ancestorD1 and ranges2.d2
        and ranges2.iter > ranges1.iter
);

Result:

ancestord1;parentd1;d2;iter
2014-01-01 10:55:00;2014-01-01 10:55:00;2014-01-01 11:00:00;0
2014-01-01 08:00:00;2014-01-01 10:40:00;2014-01-01 10:49:00;2
2014-01-01 09:10:00;2014-01-01 10:40:00;2014-01-01 10:49:00;3

What is wrong with the query above, and how can I extend it to get results per user? Is there a better solution in PostgreSQL?

Best answer

I found this example of how to make a 'range aggregate' using window functions and a lot of nested subqueries. I just adapted it to partition and group by user_id, and it seems to do what you want:

SELECT user_id, min(login_time) AS login_time, max(logout_time) AS logout_time
FROM (
    SELECT user_id, login_time, logout_time,
           max(new_start) OVER (PARTITION BY user_id ORDER BY login_time, logout_time) AS left_edge
    FROM (
        SELECT user_id, login_time, logout_time,
               CASE
                   WHEN login_time <= max(lag_logout_time) OVER (
                       PARTITION BY user_id ORDER BY login_time, logout_time
                   ) THEN NULL
                   ELSE login_time
               END AS new_start
        FROM (
            SELECT
                user_id,
                login_time,
                logout_time,
                lag(logout_time) OVER (PARTITION BY user_id ORDER BY login_time, logout_time) AS lag_logout_time
            FROM app_log
        ) AS s1
    ) AS s2
) AS s3
GROUP BY user_id, left_edge
ORDER BY user_id, min(login_time);

Result:

 user_id |     login_time      |     logout_time
---------+---------------------+---------------------
1 | 2014-01-01 08:00:00 | 2014-01-01 10:49:00
1 | 2014-01-01 10:55:00 | 2014-01-01 11:00:00
2 | 2014-01-01 09:00:00 | 2014-01-01 11:49:00
2 | 2014-01-01 11:55:00 | 2014-01-01 12:00:00
(4 rows)

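It works by first detecting the start of each new range within each user_id partition: a row starts a new range when its login_time is later than every logout_time seen before it. That start is then carried forward to the following rows as left_edge, and the rows are grouped by it. I found I had to read that article very carefully to understand it!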
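To make the two steps visible, the inner levels of the query above can be run on their own. The sketch below is that same query with the outer GROUP BY removed and a filter on user_id = 1. The filter is added here only for readability, and the query assumes the app_log sample data from the question.

```sql
-- Inner levels of the answer's query, without the final GROUP BY.
-- The WHERE user_id = 1 filter is an addition for readability.
SELECT user_id, login_time, logout_time, new_start,
       -- carry the most recent range start forward to every later row
       max(new_start) OVER (PARTITION BY user_id ORDER BY login_time, logout_time) AS left_edge
FROM (
    SELECT user_id, login_time, logout_time,
           CASE
               -- the row overlaps (or touches) an earlier row: not a new range
               WHEN login_time <= max(lag_logout_time) OVER (
                   PARTITION BY user_id ORDER BY login_time, logout_time
               ) THEN NULL
               -- otherwise this row opens a new range
               ELSE login_time
           END AS new_start
    FROM (
        SELECT user_id, login_time, logout_time,
               lag(logout_time) OVER (PARTITION BY user_id ORDER BY login_time, logout_time) AS lag_logout_time
        FROM app_log
        WHERE user_id = 1
    ) AS s1
) AS s2
ORDER BY login_time, logout_time;
```

With the sample data, new_start should be non-NULL only for the rows starting at 08:00 and 10:55. left_edge is therefore 2014-01-01 08:00:00 for the first four rows and 2014-01-01 10:55:00 for the last one, and those are the two groups the outer GROUP BY collapses.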

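The article suggests that with PostgreSQL >= 9.0 this can be simplified by removing the innermost subquery and changing the window frame, but I could not get that to work.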
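For reference, one way that simplification might look is sketched here. This is a guess at what the article means, not its actual code. It assumes a frame of ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING, so that max(logout_time) looks only at the rows before the current one, which makes the lag() subquery unnecessary. It uses the same app_log table and the same new_start / left_edge names as the query above.

```sql
-- Sketch of the simplified variant: the lag() subquery is replaced by an
-- explicit window frame that ends one row before the current row.
SELECT user_id, min(login_time) AS login_time, max(logout_time) AS logout_time
FROM (
    SELECT user_id, login_time, logout_time,
           max(new_start) OVER (PARTITION BY user_id ORDER BY login_time, logout_time) AS left_edge
    FROM (
        SELECT user_id, login_time, logout_time,
               CASE
                   -- latest logout among all earlier rows of the same user;
                   -- NULL for the first row, which then falls through to ELSE
                   WHEN login_time <= max(logout_time) OVER (
                       PARTITION BY user_id
                       ORDER BY login_time, logout_time
                       ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                   ) THEN NULL
                   ELSE login_time
               END AS new_start
        FROM app_log
    ) AS s1
) AS s2
GROUP BY user_id, left_edge
ORDER BY user_id, min(login_time);
```

On the sample data this should give the same four rows as the result above. Intervals that only touch (a login_time equal to an earlier logout_time) are still merged, because the comparison uses <=.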

Regarding sql - Flatten intersecting timespans per user in PostgreSQL, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21928848/
