sql - Count with date restrictions over several weeks, queried for each day

Reposted. Author: 行者123. Updated: 2023-11-29 11:40:32

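I am trying to find the number of active users for each day.
A user is active when they have made more than 10 requests per week for 4 consecutive weeks.
I.e., on Oct 31, 2014, a user is active if they made more than 10 requests in total in each of these weeks:
Oct 24 - Oct 30, 2014
Oct 17 - Oct 23, 2014
Oct 10 - Oct 16, 2014
Oct 3 - Oct 9, 2014
I have a table requests: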

CREATE TABLE requests (
id text PRIMARY KEY, -- id of the request
amount bigint, -- sum of requests made by accounts_id to recipient_id,
-- aggregated on a daily basis based on "date"
accounts_id text, -- id of the user
recipient_id text, -- id of the recipient
date timestamp -- date that the request was made in YYYY-MM-DD
);

Sample values:
INSERT INTO requests
VALUES
('1', 19, 'a1', 'b1', '2014-10-05 00:00:00'),
('2', 19, 'a2', 'b2', '2014-10-06 00:00:00'),
('3', 85, 'a3', 'b3', '2014-10-07 00:00:00'),
('4', 11, 'a1', 'b4', '2014-10-13 00:00:00'),
('5', 2, 'a2', 'b5', '2014-10-14 00:00:00'),
('6', 50, 'a3', 'b5', '2014-10-15 00:00:00'),
('7', 787323, 'a1', 'b6', '2014-10-17 00:00:00'),
('8', 33, 'a2', 'b8', '2014-10-18 00:00:00'),
('9', 14, 'a3', 'b9', '2014-10-19 00:00:00'),
('10', 11, 'a4', 'b10', '2014-10-19 00:00:00'),
('11', 1628, 'a1', 'b11', '2014-10-25 00:00:00'),
('13', 101, 'a2', 'b11', '2014-10-25 00:00:00');

Example output:
Date       | # Active users
-----------+---------------
10-01-2014 | 600
10-02-2014 | 703
10-03-2014 | 891

Here is my attempt at finding the number of active users for a single date (e.g. 10-01-2014):
SELECT count(*)
FROM
(SELECT accounts_id
FROM requests
WHERE "date" BETWEEN '2014-10-01'::date - interval '2 weeks' AND '2014-10-01'::date - interval '1 week'
GROUP BY accounts_id HAVING sum(amount) > 10) week_1
JOIN
(SELECT accounts_id
FROM requests
WHERE "date" BETWEEN '2014-10-01'::date - interval '3 weeks' AND '2014-10-01'::date - interval '2 week'
GROUP BY accounts_id HAVING sum(amount) > 10) week_2 ON week_1.accounts_id = week_2.accounts_id
JOIN
(SELECT accounts_id
FROM requests
WHERE "date" BETWEEN '2014-10-01'::date - interval '4 weeks' AND '2014-10-01'::date - interval '3 week'
GROUP BY accounts_id HAVING sum(amount) > 10) week_3 ON week_2.accounts_id = week_3.accounts_id
JOIN
(SELECT accounts_id
FROM requests
WHERE "date" BETWEEN '2014-10-01'::date - interval '5 weeks' AND '2014-10-01'::date - interval '4 week'
GROUP BY accounts_id HAVING sum(amount) > 10) week_4 ON week_3.accounts_id = week_4.accounts_id

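Since this query only returns the number for a single day, and I need that number for every day over time, I figured the idea is to join on the date. So I tried something like this: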
SELECT week_1."Date_series",
count(*)
FROM
(SELECT to_char(DAY::date, 'YYYY-MM-DD') AS "Date_series",
accounts_id
FROM generate_series('2014-10-01'::date, CURRENT_DATE, '1 day') DAY, requests
WHERE to_char(DAY::date, 'YYYY-MM-DD')::date BETWEEN requests.date::date - interval '2 weeks' AND requests.date::date - interval '1 week'
GROUP BY "Date_series",
accounts_id HAVING sum(amount) > 10) week_1
JOIN
(SELECT to_char(DAY::date, 'YYYY-MM-DD') AS "Date_series",
accounts_id
FROM generate_series('2014-10-01'::date, CURRENT_DATE, '1 day') DAY, requests
WHERE to_char(DAY::date, 'YYYY-MM-DD')::date BETWEEN requests.date::date - interval '3 weeks' AND requests.date::date - interval '2 week'
GROUP BY "Date_series",
accounts_id HAVING sum(amount) > 10) week_2 ON week_1.accounts_id = week_2.accounts_id
AND week_1."Date_series" = week_2."Date_series"
JOIN
(SELECT to_char(DAY::date, 'YYYY-MM-DD') AS "Date_series",
accounts_id
FROM generate_series('2014-10-01'::date, CURRENT_DATE, '1 day') DAY, requests
WHERE to_char(DAY::date, 'YYYY-MM-DD')::date BETWEEN requests.date::date - interval '4 weeks' AND requests.date::date - interval '3 week'
GROUP BY "Date_series",
accounts_id HAVING sum(amount) > 10) week_3 ON week_2.accounts_id = week_3.accounts_id
AND week_2."Date_series" = week_3."Date_series"
JOIN
(SELECT to_char(DAY::date, 'YYYY-MM-DD') AS "Date_series",
accounts_id
FROM generate_series('2014-10-01'::date, CURRENT_DATE, '1 day') DAY, requests
WHERE to_char(DAY::date, 'YYYY-MM-DD')::date BETWEEN requests.date::date - interval '5 weeks' AND requests.date::date - interval '4 week'
GROUP BY "Date_series",
accounts_id HAVING sum(amount) > 10) week_4 ON week_3.accounts_id = week_4.accounts_id
AND week_3."Date_series" = week_4."Date_series"
GROUP BY week_1."Date_series"

However, I don't think I am getting the right answer, and I am not sure why. Any tips / guidance / pointers are much appreciated! :)
P.S. I am using Postgres 9.3.

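Best answer

Here is a long answer on how to make your queries short. :)
Table
Building on my table (before you provided a table definition with different (odd!) data types):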

CREATE TABLE requests (
id int
, accounts_id int -- (id of the user)
, recipient_id int -- (id of the recipient)
, date date -- (date that the request was made in YYYY-MM-DD)
, amount int -- (# of requests by accounts_id for the day)
);

Active users for one day
The list of "active users" for one given day:
SELECT accounts_id
FROM  (
   SELECT w.w, r.accounts_id
   FROM  (
      SELECT w
           , day - 6 - 7 * w AS w_start
           , day     - 7 * w AS w_end
      FROM  (SELECT '2014-10-31'::date - 1 AS day) d  -- effective date here
           , generate_series(0,3) w
      ) w
   JOIN   requests r ON r."date" BETWEEN w_start AND w_end
   GROUP  BY w.w, r.accounts_id
   HAVING sum(r.amount) > 10
   ) sub
GROUP  BY 1
HAVING count(*) = 4;

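Step 1
In the innermost subquery w (for "week"), build the bounds of the 4 weeks of interest from a CROSS JOIN of the given day - 1 with the output of generate_series(0,3).
To add / subtract days to / from a date (not a timestamp!) just add / subtract integer numbers. The expression day - 7 * w subtracts 0-3 times 7 days from the given date, arriving at the end date of each week (w_end).
Subtract another 6 days (not 7!) from each to compute the respective start (w_start).
Additionally, keep the week number w (0-3) for the later aggregation.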
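As a quick check of step 1, the boundary subquery can be run on its own. This is the innermost subquery from the query above, with only an ORDER BY added so the rows come back in a predictable order. The ranges in the trailing comment follow from the date arithmetic and match the four weeks listed in the question.

```sql
-- Innermost subquery alone: one row per week of interest.
-- date - integer yields a date, so no interval arithmetic is needed.
SELECT w
     , day - 6 - 7 * w AS w_start   -- 6 days before the end: 7 days inclusive
     , day     - 7 * w AS w_end
FROM  (SELECT '2014-10-31'::date - 1 AS day) d  -- effective date here
     , generate_series(0,3) w
ORDER  BY w;

--  w |  w_start   |   w_end
-- ---+------------+------------
--  0 | 2014-10-24 | 2014-10-30
--  1 | 2014-10-17 | 2014-10-23
--  2 | 2014-10-10 | 2014-10-16
--  3 | 2014-10-03 | 2014-10-09
```

Because BETWEEN includes both bounds, subtracting 6 rather than 7 is what keeps adjacent weeks from sharing a day.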
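Step 2
In the subquery sub, join rows from requests to the set of 4 weeks where the date lies between start and end date. GROUP BY the week number w and the accounts_id.
Only weeks with more than 10 requests in total qualify.
Step 3
In the outer SELECT, count the number of weeks in which each user (accounts_id) qualified. It must be 4 to qualify as "active user".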
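To see the three steps work together, here is a sketch that runs the same query against the sample rows from the question. The CTE named requests is only a stand-in for the real table (an assumption for illustration; inside this statement it shadows any table of the same name), and accounts_id is text here, as in the question's data.

```sql
WITH requests (accounts_id, "date", amount) AS (
   VALUES
     ('a1', '2014-10-05'::date,     19)
   , ('a2', '2014-10-06'      ,     19)
   , ('a3', '2014-10-07'      ,     85)
   , ('a1', '2014-10-13'      ,     11)
   , ('a2', '2014-10-14'      ,      2)  -- a2 misses the threshold in this week
   , ('a3', '2014-10-15'      ,     50)
   , ('a1', '2014-10-17'      , 787323)
   , ('a2', '2014-10-18'      ,     33)
   , ('a3', '2014-10-19'      ,     14)
   , ('a4', '2014-10-19'      ,     11)
   , ('a1', '2014-10-25'      ,   1628)
   , ('a2', '2014-10-25'      ,    101)
   )
SELECT accounts_id
FROM  (
   SELECT w.w, r.accounts_id
   FROM  (
      SELECT w
           , day - 6 - 7 * w AS w_start
           , day     - 7 * w AS w_end
      FROM  (SELECT '2014-10-31'::date - 1 AS day) d  -- effective date here
           , generate_series(0,3) w
      ) w
   JOIN   requests r ON r."date" BETWEEN w_start AND w_end
   GROUP  BY w.w, r.accounts_id
   HAVING sum(r.amount) > 10          -- step 2: only qualifying weeks survive
   ) sub
GROUP  BY 1
HAVING count(*) = 4;                  -- step 3: all 4 weeks must qualify

-- Returns a single row: a1
```

Only a1 exceeds 10 requests in all four weeks. a2 fails in the week of Oct 10 - Oct 16 (2 requests), a3 has no rows in the most recent week, and a4 appears in one week only.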
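Count of active users per day
This is dynamite.
Wrapped in a simple SQL function to simplify general use, but the query can just as well be used on its own: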
CREATE FUNCTION f_active_users (_now date = now()::date, _days int = 3)
  RETURNS TABLE (day date, users int) AS
$func$
WITH r AS (
   SELECT accounts_id, date, sum(amount)::int AS amount
   FROM   requests
   WHERE  date BETWEEN _now - (27 + _days) AND _now - 1
   GROUP  BY accounts_id, date
   )
SELECT date + 1, count(w_ct = 4 OR NULL)::int
FROM  (
   SELECT accounts_id, date
        , count(w_amount > 10 OR NULL)
                         OVER (PARTITION BY accounts_id, dow ORDER BY date DESC
                         ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING) AS w_ct
   FROM  (
      SELECT accounts_id, date, dow
           , sum(amount) OVER (PARTITION BY accounts_id ORDER BY date DESC
                         ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING) AS w_amount
      FROM  (SELECT _now - i AS date, i%7 AS dow
             FROM   generate_series(1, 27 + _days) i) d -- period of interest
      CROSS  JOIN (
             SELECT accounts_id FROM r
             GROUP  BY 1
             HAVING count(*) > 3 AND sum(amount) > 39 -- enough rows & requests
             AND    max(date) > min(date) + 15) a     -- can cover 4 weeks
      LEFT   JOIN r USING (accounts_id, date)
      ) sub1
   WHERE  date > _now - (22 + _days)  -- cut off 6 trailing days now - useful?
   ) sub2
GROUP  BY date
ORDER  BY date DESC
LIMIT  _days
$func$ LANGUAGE sql STABLE;

The function takes any day (_now), "today" by default, and the number of days in the result (_days), 3 by default. Call:
SELECT * FROM f_active_users('2014-10-31', 5);

Or without parameters to use the defaults:
SELECT * FROM f_active_users();

The approach is different from the first query.
SQL Fiddle with both queries and variants for your table definition.
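Step 0
In the CTE r, pre-aggregate amounts per (accounts_id, date) for the period of interest only, for better performance. The table is scanned only once, and the suggested index (see below) will kick in here.
Step 1
In the inner subquery d, generate the necessary list of days: 27 + _days rows, where _days is the desired number of rows in the output, so effectively 28 days or more.
While at it, compute the day of the week (dow) to be used for aggregating in step 3. i%7 coincides with weekly intervals, but the query would work for any interval.
In the inner subquery a, generate a unique list of users (accounts_id) that exist in the CTE r and pass some first superficial tests (enough rows, spanning enough time, with enough total requests).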
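The list of days produced by subquery d can be inspected on its own. In this sketch the function parameters are replaced with literals (_now = '2014-10-31' and _days = 3); that substitution and the added ORDER BY are assumptions for illustration only.

```sql
-- Subquery d alone, with literals in place of _now and _days.
-- Yields 27 + 3 = 30 rows, from 2014-10-30 back to 2014-10-01.
SELECT date '2014-10-31' - i AS date
     , i%7                   AS dow   -- residue class, not extract(dow ...)
FROM   generate_series(1, 27 + 3) i
ORDER  BY 1 DESC;
```

Note that dow here is just i modulo 7, not the calendar weekday number. That is enough for step 3: rows with the same dow are exactly 7 days apart (for example i = 1, 8, 15, 22 give 2014-10-30, 10-23, 10-16 and 10-09), which is all the later partitioning needs.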
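Step 2
Generate a Cartesian product of d and a with a CROSS JOIN to have one row for every relevant day for every relevant user. LEFT JOIN to r to append the amount of requests (if any). There is no WHERE condition: we want every day in the result, even if there are no active users at all.
Compute the total amount for the past week (w_amount) in the same step, using a window function with a custom frame. Example:
How to use a ring data structure in window functions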
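The custom frame is easier to follow on a tiny input. The sketch below uses made-up data (10 consecutive days with 1 request each, no per-user partition) and the same frame definition as the function, so each row sums its own day plus the 6 days before it.

```sql
-- Moving 7-day sum with a custom window frame.
-- With ORDER BY day DESC, "6 FOLLOWING" means the 6 earlier days.
SELECT day
     , sum(amount) OVER (ORDER BY day DESC
                         ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING) AS w_amount
FROM  (
   SELECT date '2014-10-30' - i AS day
        , 1                     AS amount  -- made-up data: 1 request per day
   FROM   generate_series(0, 9) i
   ) t
ORDER  BY day DESC;

-- The 4 most recent days (2014-10-30 .. 2014-10-27) get w_amount = 7.
-- The 6 oldest days get 6, 5, 4, 3, 2, 1: their frames run off the end
-- of the data, so they do not cover a full week.
```

This is why the function reads 6 more days than it reports on: the oldest 6 rows only exist to feed the frames of later days, and their own sums are incomplete.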
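Step 3
Cut off the last 6 days now. This is optional and may or may not help performance. Test it: WHERE date >= _now - (21 + _days)
Count the weeks in which the minimum amount is met (w_ct) in a similar window function, this time additionally partitioned by dow, so that the frame only holds the same weekday for the past 4 weeks (each row carrying the sum of its respective past week).
The expression count(w_amount > 10 OR NULL) only counts rows with more than 10 requests. Detailed explanation:
Compute percents from SUM() in the same SELECT sql query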
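The count(... OR NULL) idiom relies on count() ignoring NULL arguments: a false condition turns into NULL and is skipped, while a true condition stays true and is counted. A minimal sketch with made-up values (the table alias t and column x are hypothetical):

```sql
SELECT count(*)               AS all_rows   -- 4
     , count(x > 10 OR NULL)  AS over_ten   -- 2
FROM  (VALUES (5), (11), (19), (NULL)) t(x);

-- x =  5   : false OR NULL -> NULL  (not counted)
-- x = 11   : true  OR NULL -> true  (counted)
-- x = 19   : true  OR NULL -> true  (counted)
-- x = NULL : NULL  OR NULL -> NULL  (not counted)

-- Equivalent on Postgres 9.4 or later (not available on 9.3):
SELECT count(*) FILTER (WHERE x > 10) AS over_ten
FROM  (VALUES (5), (11), (19), (NULL)) t(x);
```

Without the OR NULL, count(x > 10) would also count the rows where the condition is false, because false is not NULL.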
Step 4
In the outer SELECT, group by date and count the users that passed all 4 weeks (count(w_ct = 4 OR NULL)). Add 1 to the date to compensate for the off-by-one, then ORDER and LIMIT to the requested number of days.
Performance and outlook
The perfect index for both queries would be:
CREATE INDEX foo ON requests (date, accounts_id, amount);

Performance should be good, but it will get better with the upcoming Postgres 9.4, due to the new moving-aggregate support:
Moving-aggregate support in the Postgres Wiki.
Moving aggregates in the 9.4 manual
Aside: don't call a timestamp column "date". It's a timestamp, not a date. Better yet, never use basic type names like date or timestamp as identifiers. Ever.

This question and answer are based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/26899491/
