SQL: How do I query the average of monthly sums when some months have no records?

Repost. Author: 行者123. Updated: 2023-12-04 02:30:01

TL;DR: How do I query the average of monthly sums when some months have no records (and should therefore count as 0)?


Background

My children report the time they spend on chores every day (in a PostgreSQL database). My dataset looks like this:

date,user,duration

2020-01-01,Alice,120
2020-01-02,Bob,30
2020-01-03,Charlie,10
2020-01-23,Charlie,10

2020-02-03,Charlie,10
2020-02-23,Charlie,10

2020-03-02,Bob,30
2020-03-03,Charlie,10
2020-03-23,Charlie,10

I want to know how much they do per month on average. Specifically, the result I want is:

  • Alice: 40 = (120+0+0)÷3
  • Bob: 20 = (30+0+30)÷3
  • Charlie: 20 = ([10+10]+[10+10]+[10+10])÷3

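Problem

In some months I have no records for some users (for example, Alice in February and March). As a result, the following nested query does not return what I want: because those months have no rows for Alice, her February and March contributions are never counted as 0, and her average is wrongly computed as 120.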

-- this does not work
SELECT
    "user",
    round(avg(monthly_duration)) as avg_monthly_sum
FROM (
    SELECT
        date_trunc('month', date),
        "user",
        sum(duration) as monthly_duration
    FROM
        public.chores_record
    GROUP BY
        date_trunc('month', date),
        "user"
) AS monthly_sum
GROUP BY
    "user"
;
-- Doesn't return what I want:
--
-- "user","avg_monthly_sum"
-- "Alice",120
-- "Bob",30
-- "Charlie",20
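
To see why the plain avg() falls short, it can help to count how many months each user actually appears in. The query below is a small diagnostic sketch against the same table; the aliases months_with_records and months_in_table are names chosen here for illustration only.

```sql
-- For each user, compare the number of months that have at least one row
-- with the number of distinct months present anywhere in the table.
SELECT
    "user",
    count(DISTINCT date_trunc('month', date)) AS months_with_records,
    (
        SELECT count(DISTINCT date_trunc('month', date))
        FROM public.chores_record
    ) AS months_in_table
FROM
    public.chores_record
GROUP BY
    "user"
ORDER BY
    "user"
;
```

With the sample data, Alice has rows in 1 of 3 months, Bob in 2 of 3, and Charlie in 3 of 3. The nested query divides by the first number, whereas the desired result divides by the second.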

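So I built the following rather cumbersome query, which:

  1. lists the unique months,
  2. lists the unique users,
  3. generates the month×user combinations,
  4. adds the monthly sums from the raw data,
  5. takes the average of the monthly sums (treating 'null' as 0).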

SELECT
    unique_user,
    round(avg(COALESCE(monthly_duration, 0))) -- COALESCE transforms 'null' into 0
FROM (
    -- monthly duration with 'null' if no record for that user×month
    SELECT
        month_user_combinations.month,
        month_user_combinations.unique_user,
        monthly_duration.monthly_duration
    FROM
        (
            (
                -- all months×users combinations
                SELECT
                    month,
                    unique_user
                FROM (
                    (
                        -- list of unique months
                        SELECT DISTINCT
                            date_trunc('month', date) as month
                        FROM
                            public.chores_record
                    ) AS unique_months
                    CROSS JOIN
                    (
                        -- list of unique users
                        SELECT DISTINCT
                            "user" as "unique_user"
                        FROM
                            public.chores_record
                    ) AS unique_users
                )
            ) AS month_user_combinations
            LEFT OUTER JOIN
            (
                -- monthly duration for existing month×user combination only
                SELECT
                    date_trunc('month', date) as month,
                    "user",
                    sum(duration) as monthly_duration
                FROM
                    public.chores_record
                GROUP BY
                    date_trunc('month', date),
                    "user"
            ) AS monthly_duration
            ON (
                month_user_combinations.month = monthly_duration.month
                AND
                month_user_combinations.unique_user = monthly_duration.user
            )
        )
) AS monthly_duration_for_all_combinations
GROUP BY
    unique_user
;

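This query works, but it is very bulky.

Question

How can I query the average of monthly sums more elegantly than above, while still accounting for "no record ⇒ monthly sum = 0"?

Note: it is safe to assume that I only want the average over months that have at least one record (i.e. it is fine not to consider December or April here).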


MWE

CREATE TABLE public.chores_record
(
date date NOT NULL,
"user" text NOT NULL,
duration integer NOT NULL,
PRIMARY KEY (date, "user")
);

INSERT INTO
public.chores_record(date, "user", duration)
VALUES
('2020-01-01','Alice',120),
('2020-01-02','Bob',30),
('2020-01-03','Charlie',10),
('2020-01-23','Charlie',10),
('2020-02-03','Charlie',10),
('2020-02-23','Charlie',10),
('2020-03-02','Bob',30),
('2020-03-03','Charlie',10),
('2020-03-23','Charlie',10)
;

Best answer

You can build a calendar table using a CTE:


-- EXPLAIN
WITH cal AS ( -- the unique months
    SELECT DISTINCT date_trunc('month', date) AS tick
    FROM chores_record
)
, cnt AS ( -- the number of months (a scalar)
    SELECT COUNT(*) AS nmonth
    FROM cal
)
SELECT
    x."user"
    , SUM(x.duration) AS tot_duration
    -- nmonth is identical on every joined row, so take it once with MAX;
    -- summing it would multiply the divisor by the user's row count
    , round(SUM(x.duration)::numeric / MAX(c.nmonth)) AS average_month -- this is ugly ...
FROM cal t
JOIN cnt c ON true -- This is ugly
LEFT JOIN chores_record x ON date_trunc('month', x.date) = t.tick
GROUP BY x."user"
;
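
Because a month without rows contributes 0 to a user's total, the average of the monthly sums equals that user's overall total divided by the number of distinct months in the table. The following is a minimal sketch of that idea against the MWE table, not the answer author's query; it assumes, as the question's note allows, that only months with at least one record (from any user) count.

```sql
-- Each user's total duration divided by the number of distinct months
-- that appear anywhere in the table (months with no rows at all are ignored).
SELECT
    "user",
    round(
        sum(duration)::numeric
        / (
            SELECT count(DISTINCT date_trunc('month', date))
            FROM public.chores_record
        )
    ) AS avg_monthly_sum
FROM
    public.chores_record
GROUP BY
    "user"
ORDER BY
    "user"
;
```

With the sample data this gives Alice 40, Bob 20 and Charlie 20, matching the desired result. The cast to numeric avoids integer division before rounding. If months with no rows for any user should also count as 0, the divisor would have to come from a generated calendar rather than from the table itself.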

A similar question to "SQL: How do I query the average of monthly sums when some months have no records?" can be found on Stack Overflow: https://stackoverflow.com/questions/64843225/
