
sql - Get each day's user status from a status change history

Reposted. Author: 行者123. Updated: 2023-11-29 12:57:27

I use Postgres and have a fairly non-trivial query. I have 2 solutions; the problem is that neither is fast.

There is a table, user_status_changes, that holds the history of user status changes:

 user_id |     created_at      | from_status | to_status
---------+---------------------+-------------+-----------
       3 | 2016-03-24 04:00:00 | active      | pending
       3 | 2016-03-27 19:59:21 | pending     | banned
       6 | 2016-03-16 10:00:00 | pending     | active
       6 | 2016-03-21 15:00:00 | active      | banned
       6 | 2016-03-25 19:52:46 | banned      | pending
       6 | 2016-03-25 20:53:22 | pending     | canceled

and a users table:

 id |         created_at
----+----------------------------
  3 | 2016-03-21 19:54:09.831252
  6 | 2016-03-14 13:04:09.134358

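What I want is a list with one row per user for every day from users.created_at up to today, showing the user's status on that day and their status on the previous day.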

Example result (assuming today is 2016-03-27):

 user_id |    date    | status_at | previous_status
---------+------------+-----------+-----------------
       3 | 2016-03-21 |           |
       3 | 2016-03-22 |           |
       3 | 2016-03-23 |           |
       3 | 2016-03-24 | pending   |
       3 | 2016-03-25 | pending   | pending
       3 | 2016-03-26 | pending   | pending
       3 | 2016-03-27 | banned    | pending
       6 | 2016-03-14 |           |
       6 | 2016-03-15 |           |
       6 | 2016-03-16 | active    |
       6 | 2016-03-17 | active    | active
       6 | 2016-03-18 | active    | active
       6 | 2016-03-19 | active    | active
       6 | 2016-03-20 | active    | active
       6 | 2016-03-21 | banned    | active
       6 | 2016-03-22 | banned    | banned
       6 | 2016-03-23 | banned    | banned
       6 | 2016-03-24 | banned    | banned
       6 | 2016-03-25 | canceled  | banned
       6 | 2016-03-26 | canceled  | canceled
       6 | 2016-03-27 | canceled  | canceled
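To pin down the semantics of the expected output, here is a minimal Python reference sketch (not one of the SQL solutions) that reproduces the table above from the sample rows. It assumes status_at is the to_status of the last change made on or before that calendar day, and previous_status is simply the status_at of the day before; the function name daily_status is hypothetical.

```python
from datetime import date, datetime, timedelta

# Sample rows from the question: (user_id, created_at, from_status, to_status)
CHANGES = [
    (3, datetime(2016, 3, 24, 4, 0, 0), "active", "pending"),
    (3, datetime(2016, 3, 27, 19, 59, 21), "pending", "banned"),
    (6, datetime(2016, 3, 16, 10, 0, 0), "pending", "active"),
    (6, datetime(2016, 3, 21, 15, 0, 0), "active", "banned"),
    (6, datetime(2016, 3, 25, 19, 52, 46), "banned", "pending"),
    (6, datetime(2016, 3, 25, 20, 53, 22), "pending", "canceled"),
]
# users.id -> users.created_at
USERS = {
    3: datetime(2016, 3, 21, 19, 54, 9, 831252),
    6: datetime(2016, 3, 14, 13, 4, 9, 134358),
}


def daily_status(users, changes, today):
    """Return (user_id, day, status_at, previous_status) from created_at to today."""
    rows = []
    for user_id in sorted(users):
        # This user's (created_at, to_status) pairs in chronological order.
        history = sorted((c[1], c[3]) for c in changes if c[0] == user_id)
        status, previous, i = None, None, 0
        day = users[user_id].date()
        while day <= today:
            # Apply all changes made up to the end of this day; the latest wins.
            while i < len(history) and history[i][0].date() <= day:
                status = history[i][1]
                i += 1
            rows.append((user_id, day, status, previous))
            previous = status  # becomes previous_status for the next day
            day += timedelta(days=1)
    return rows


for row in daily_status(USERS, CHANGES, date(2016, 3, 27)):
    print(*row, sep=" | ")
```

Any SQL candidate can be checked against this output row by row before comparing timings.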

I came up with 2 solutions. The first uses subqueries (rather slow):

WITH possible_dates AS (
    SELECT date(generate_series) AS "date"
    FROM generate_series(
        (SELECT min(created_at) FROM users)::date,
        '2016-03-27'::date,
        '1 day'
    )
)
SELECT
    users.id AS user_id,  -- the users table shown above has id, not user_id
    possible_dates.date,
    (
        SELECT to_status
        FROM user_status_changes
        WHERE user_status_changes.user_id = users.id
          AND date(user_status_changes.created_at) <= possible_dates.date
        ORDER BY user_status_changes.created_at DESC
        LIMIT 1
    ) AS status_at,
    -- a scalar subquery used as a function argument needs its own parentheses
    LAG((
        SELECT to_status
        FROM user_status_changes
        WHERE user_status_changes.user_id = users.id
          AND date(user_status_changes.created_at) <= possible_dates.date
        ORDER BY user_status_changes.created_at DESC
        LIMIT 1
    )) OVER (PARTITION BY users.id ORDER BY possible_dates.date ASC) AS previous_status
FROM users
CROSS JOIN possible_dates
WHERE date(users.created_at) <= possible_dates.date;
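Each correlated subquery above runs once per (user, day) row and is a "latest change before the end of this day" lookup. One hedged observation: date(created_at) <= possible_dates.date wraps the column in a function, so it cannot serve as an index range bound, while the equivalent half-open form created_at < possible_dates.date + 1 should let an index on (user_id, created_at) seek straight to the right row. The Python sketch below models a single lookup as a binary search over one user's sorted timestamps, which is roughly what such an index provides; status_at is a hypothetical helper name.

```python
from bisect import bisect_left
from datetime import date, datetime, time, timedelta


def status_at(change_times, change_statuses, day):
    """to_status of the latest change made before midnight after `day`, or None.

    change_times must be sorted ascending (what an index on
    (user_id, created_at) provides for one user); change_statuses[i] is the
    to_status recorded at change_times[i].
    """
    # Half-open bound: created_at < day + 1 day, i.e. date(created_at) <= day.
    cutoff = datetime.combine(day + timedelta(days=1), time.min)
    i = bisect_left(change_times, cutoff)  # number of changes before cutoff
    return change_statuses[i - 1] if i > 0 else None


# User 6 from the question.
times = [
    datetime(2016, 3, 16, 10, 0, 0),
    datetime(2016, 3, 21, 15, 0, 0),
    datetime(2016, 3, 25, 19, 52, 46),
    datetime(2016, 3, 25, 20, 53, 22),
]
statuses = ["active", "banned", "pending", "canceled"]
print(status_at(times, statuses, date(2016, 3, 25)))  # canceled
```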

The second uses joins (it seems faster):

WITH status_changes AS (
    -- last change per user and calendar day
    SELECT DISTINCT ON (user_id, date)
        user_id,
        created_at::date AS date,
        to_status,
        from_status
    FROM user_status_changes
    ORDER BY user_id, date, created_at DESC
),
possible_dates AS (
    SELECT date(generate_series) AS "date"
    FROM generate_series(
        (SELECT min(created_at) FROM users)::date,
        '2016-03-27'::date,
        '1 day'
    )
)
SELECT DISTINCT ON (users.id, possible_dates.date)
    users.id AS user_id,
    possible_dates.date AS date,
    s1.to_status AS status_at,
    s2.to_status AS previous_status
FROM users
CROSS JOIN possible_dates
LEFT OUTER JOIN status_changes s1
    ON s1.date <= possible_dates.date
   AND s1.user_id = users.id
LEFT JOIN LATERAL (
    SELECT
        status_changes.to_status,
        status_changes.date
    FROM status_changes
    WHERE status_changes.date < possible_dates.date
      AND status_changes.user_id = users.id
) s2 ON true
WHERE date(users.created_at) <= possible_dates.date
ORDER BY users.id, possible_dates.date DESC, s1.date DESC, s2.date DESC;

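We currently have about 20k users, each with about 10 payments and 2 status changes per month. The first user was created a year ago.

I think the problem with the join approach is that we join all previous status changes for every date and only afterwards remove the redundant rows with DISTINCT ON.

Any better solution would be appreciated; index suggestions are welcome too.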
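To put numbers on this: for one user and one day, s1 returns every deduplicated change on or before that day and the LATERAL s2 returns every change strictly before it, so the join produces |s1| × |s2| rows (at least 1 × 1, since both are outer joins) before DISTINCT ON keeps one. The sketch below counts those intermediate rows for a given change history; joined_rows is an illustrative helper, not part of either query.

```python
from datetime import date, timedelta


def joined_rows(change_days, first_day, last_day):
    """Rows produced by the s1/s2 joins for one user before DISTINCT ON.

    change_days: the user's change dates after the per-day DISTINCT ON in
    the status_changes CTE, so at most one entry per day.
    """
    total = 0
    day = first_day
    while day <= last_day:
        s1 = sum(1 for d in change_days if d <= day)  # s1.date <= possible_dates.date
        s2 = sum(1 for d in change_days if d < day)   # status_changes.date < possible_dates.date
        # LEFT JOINs still emit one NULL-extended row when nothing matches.
        total += max(s1, 1) * max(s2, 1)
        day += timedelta(days=1)
    return total


# Hypothetical: one user, a change every 15 days over a year (~2 per month).
start = date(2015, 3, 28)
changes = [start + timedelta(days=k) for k in range(0, 365, 15)]
print(joined_rows(changes, start, start + timedelta(days=364)), "joined rows for 365 output rows")
```

Because the per-day product grows with the square of the number of earlier changes, the waste keeps increasing as the history gets longer.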

Best answer

My query doesn't use LATERAL, which (as in your query and @Mike's) has to be evaluated for every row, so this should be much faster.

Explanation

First, generate the set of dates as you did before. CTE: generate_dates

Then restrict each user's output to the dates from their creation date onward, and get the statuses that were set on those dates. CTE: basic_status
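When bucketing changes into calendar days, two details matter. An inclusive range such as BETWEEN g.date AND g.date + interval '1 day' matches a change stamped exactly at midnight on two days, and a day with several changes (user 6 on 2016-03-25) produces one row per change unless only the latest is kept. A minimal Python sketch of the intended rule, assuming "one row per day, latest change wins" with half-open day intervals; in_day and latest_change_per_day are hypothetical names.

```python
from datetime import date, datetime, timedelta


def in_day(created_at, day_start):
    """Half-open day test: day_start <= created_at < day_start + 1 day."""
    return day_start <= created_at < day_start + timedelta(days=1)


def latest_change_per_day(changes):
    """Map each calendar day to the to_status of the last change on that day.

    changes: iterable of (created_at, to_status). created_at.date() is exactly
    the half-open bucket tested by in_day, so each change lands on one day.
    """
    latest = {}
    for created_at, to_status in sorted(changes):
        latest[created_at.date()] = to_status  # later changes overwrite earlier ones
    return latest


midnight = datetime(2016, 3, 26, 0, 0, 0)
# An inclusive BETWEEN would match both days; the half-open test matches one.
print(in_day(midnight, datetime(2016, 3, 25)), in_day(midnight, datetime(2016, 3, 26)))  # False True

user6 = [
    (datetime(2016, 3, 16, 10, 0, 0), "active"),
    (datetime(2016, 3, 21, 15, 0, 0), "banned"),
    (datetime(2016, 3, 25, 19, 52, 46), "pending"),
    (datetime(2016, 3, 25, 20, 53, 22), "canceled"),
]
print(latest_change_per_day(user6)[date(2016, 3, 25)])  # canceled
```

In SQL the same effect comes from a DISTINCT ON (user_id, created_at::date) ... ORDER BY ..., created_at DESC step, like the status_changes CTE in the question.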

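In the inner SELECT, use a LEFT JOIN and COALESCE() to fill the NULLs between status changes, and use DISTINCT ON so that each date without a change keeps only the closest earlier status.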
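The b1/b2 self-join amounts to: for each day with no change, take the status of the nearest earlier day that has one. The Python sketch below mirrors that rule over one user's rows in rownum order. It also shows the cost: each NULL b1 row is paired with every earlier non-NULL b2 row, so the intermediate result is roughly (days without a change) × (earlier days with one) per user. fill_forward, a single running pass, is included only for comparison; it is not what the answer's SQL does. Both function names are hypothetical.

```python
def fill_from_nearest_earlier(statuses):
    """Mirror of the b1/b2 self-join for one user's rows in rownum order.

    A None at position i takes the non-None value with the largest j < i
    (b1.rownum > b2.rownum, then DISTINCT ON ... b2.rownum DESC); leading
    Nones stay None because no earlier b2 row exists.
    """
    filled = []
    for i, status in enumerate(statuses):
        if status is not None:
            filled.append(status)  # COALESCE keeps b1.status_at
            continue
        candidates = [j for j in range(i) if statuses[j] is not None]
        filled.append(statuses[max(candidates)] if candidates else None)
    return filled


def fill_forward(statuses):
    """Same result in a single linear pass carrying the last non-None value."""
    filled, last = [], None
    for status in statuses:
        if status is not None:
            last = status
        filled.append(last)
    return filled


days = [None, None, "active", None, None, "banned", None]
filled = fill_from_nearest_earlier(days)
previous = [None] + filled[:-1]  # what LAG(status_at) OVER (... ORDER BY date) yields
print(filled)
print(previous)
```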

The outer SELECT only computes the previous status with the LAG() window function.

Query

WITH generate_dates AS (
    SELECT date(generate_series) AS date
    FROM generate_series(
        (SELECT min(created_at) FROM users)::date,
        '2016-03-27'::date,
        '1 day'
    )
)
, daily_changes AS (
    -- keep only the last change per user and calendar day, so a day with
    -- several changes (user 6 on 2016-03-25) yields exactly one row
    SELECT DISTINCT ON (user_id, created_at::date)
        user_id,
        created_at::date AS date,
        to_status
    FROM user_status_changes
    ORDER BY user_id, created_at::date, created_at DESC
)
, basic_status AS (
    SELECT
        u.id AS user_id,
        g.date,
        s.to_status AS status_at,
        row_number() OVER (PARTITION BY u.id ORDER BY g.date) AS rownum
    FROM users u
    JOIN generate_dates g ON
        g.date >= u.created_at::date
    -- equality on the day avoids BETWEEN matching a midnight change twice
    LEFT JOIN daily_changes s ON
        u.id = s.user_id
        AND s.date = g.date
)
SELECT
    *,
    LAG(status_at) OVER (PARTITION BY user_id ORDER BY date) AS previous_status
FROM (
    SELECT
        DISTINCT ON (b1.user_id, b1.date)
        b1.user_id,
        b1.date,
        COALESCE(b1.status_at, b2.status_at) AS status_at
    FROM basic_status b1
    LEFT JOIN basic_status b2 ON
        b1.user_id = b2.user_id
        AND b1.status_at IS NULL
        AND b2.status_at IS NOT NULL
        AND b1.rownum > b2.rownum
    ORDER BY b1.user_id, b1.date DESC, b2.rownum DESC
) foo;

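Indexes

You can create the following indexes to speed things up:

  • users(id) (not needed if id is already the primary key, since PostgreSQL indexes primary keys automatically)
  • user_status_changes(user_id, created_at)
  • users(created_at) - this one probably matters less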

Notes

Remember to run ANALYZE on the tables to update their statistics so the planner's cost estimates are more accurate.

Regarding sql - Get each day's user status from a status change history, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39512688/

Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号