I'm working on an exercise from "Murach's MySQL 3rd Edition" which gives the following prompt:
"11. Write a SELECT statement that uses an aggregate window function to calculate a moving average of the sum of invoice totals. Return these columns:
- The month of the invoice date from the Invoices table
- The sum of the invoice totals from the Invoices table
- The moving average of the invoice totals sorted by invoice month
The result set should be grouped by invoice month and the frame for the
moving average should include the current row plus three rows before the
current row."
The invoices table being referenced looks like this (114 total rows):
| invoice_id | vendor_id | invoice_number | invoice_date | invoice_total | payment_total | credit_total | terms_id | invoice_due_date | payment_date |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 122 | 989319-457 | 2018-04-08 | 3813.33 | 3813.33 | 0 | 3 | 2018-05-08 | 2018-05-07 |
The solution I came up with initially was this:
SELECT
    EXTRACT(MONTH FROM invoice_date) AS months,
    SUM(invoice_total) AS invoice_total_sum,
    AVG(invoice_total) OVER(
        ORDER BY EXTRACT(MONTH FROM invoice_date)
        ROWS 3 PRECEDING
    ) AS rolling_avg
FROM
    invoices
GROUP BY
    months;
When I run this, I get the following error:
Expression #3 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'ap.invoices.invoice_total' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
After looking at the author's provided solution and re-reading the question, I realized I was supposed to take the average of the monthly sums, not just the average of the invoices themselves. So the real solution is this:
SELECT
    EXTRACT(MONTH FROM invoice_date) AS months,
    SUM(invoice_total) AS invoice_total_sum,
    AVG(SUM(invoice_total)) OVER( # why does this need the SUM() function?
        ORDER BY EXTRACT(MONTH FROM invoice_date)
        ROWS 3 PRECEDING
    ) AS rolling_avg
FROM
    invoices
GROUP BY
    months;
This produces an output table like so (with 5 rows):
| months | invoice_total_sum | rolling_avg |
|---|---|---|
| 4 | 5828.18 | 5828.180000 |
What I don't understand, however, is why the first solution doesn't run. I know why it isn't what the exercise is looking for, but I don't see what's causing it to error out.
My understanding is that the reason aggregate functions like sum() and avg() give errors about 'functional dependence' is that otherwise you'd get an inconsistent number of output rows. If I were trying to use just sum(invoice_total) and invoice_date, it would produce one row for the sum and many for the dates, and wouldn't be able to resolve them into an output table (without changing the only_full_group_by mode). But the average function is also an aggregate function, so why doesn't it just average all the invoice totals for each month, as specified by the group by clause?
"why the first solution doesn't run" - the answer is in the description of the error. When you apply aggregation, every column you use should either be used to partition, or be aggregated. There are no other options. During the aggregation, which strictly happens before the window function is applied, the DBMS doesn't know how to fit n "invoice_total" values in a single row, and returns that error. On the other hand, in the second solution, you tell the DBMS that those values should be aggregated with the sum.
The error isn't about an "inconsistent number of output rows". It's because selecting columns that aren't in GROUP BY would take those columns from an arbitrary row in the group. E.g. if you SELECT date and then GROUP BY month, the date will be some random day of the month.
Your title asks why AVG() is needed. But you have AVG() in both versions. Isn't your real question why SUM() is needed (as the comment in the code says)?
And the reason seems to be in the problem statement: "A moving average of the sum of invoice totals".
Gotcha. I think the general idea of the over clause being associated with the outer of the two functions is the step I was missing. IDK if there's a way to mark comments as the solution, but that's where I was confused.
(1) AVG is an aggregate function, but here it is used with an OVER clause, which makes it a window function. With a window function, the result is displayed on each row, whereas a regular aggregate function combines multiple rows and displays the result on one row.
In your query you could remove the SUM (and the GROUP BY) as shown below and see that AVG is computed only over the current row and the preceding 3 rows; the result is displayed on each row:
SELECT
    EXTRACT(MONTH FROM invoice_date) AS months,
    invoice_date,
    AVG(invoice_total) OVER(
        ORDER BY EXTRACT(MONTH FROM invoice_date)
        ROWS 3 PRECEDING
    ) AS rolling_avg
FROM
    invoices
ORDER BY EXTRACT(MONTH FROM invoice_date);
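For contrast with point (1), here is a minimal sketch of AVG used as a plain aggregate, with no OVER clause. This is probably what the question expected the first attempt to do. It is not the exercise's solution, and the alias invoice_total_avg is a name made up here for illustration.

```sql
-- Plain aggregates: GROUP BY collapses each month's invoices into one row,
-- so SUM and AVG each return a single value per month.
SELECT
    EXTRACT(MONTH FROM invoice_date) AS months,
    SUM(invoice_total) AS invoice_total_sum,
    AVG(invoice_total) AS invoice_total_avg  -- hypothetical alias; no OVER, so this is an ordinary aggregate
FROM
    invoices
GROUP BY
    months;
```

Because AVG here has no OVER clause, it is evaluated as part of the grouping step, so invoice_total is aggregated and the only_full_group_by check should be satisfied.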
(2) Your next question was why SUM(invoice_total) is inside AVG.
Here the AVG function works over the SUMs of rows that have already been grouped by EXTRACT(MONTH FROM invoice_date). Inside the window function, the grouped rows are ordered by month, so the current row looks back over the previous 3 summed rows (the previous 3 months) and averages them together with its own sum to get the AVG.
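One way to see the order of evaluation described in (2) is to split the query into two steps. The sketch below should return the same rows as the book's solution, assuming MySQL 8.0 or later (where window functions are available). The derived-table alias monthly_totals is a name chosen here for illustration, not something from the book.

```sql
-- Step 1 (inner query): GROUP BY collapses the invoices into one row per month.
-- Step 2 (outer query): the window function averages those monthly sums.
SELECT
    months,
    invoice_total_sum,
    AVG(invoice_total_sum) OVER (
        ORDER BY months
        ROWS BETWEEN 3 PRECEDING AND CURRENT ROW  -- long form of ROWS 3 PRECEDING
    ) AS rolling_avg
FROM (
    SELECT
        EXTRACT(MONTH FROM invoice_date) AS months,
        SUM(invoice_total) AS invoice_total_sum
    FROM
        invoices
    GROUP BY
        months
) AS monthly_totals  -- hypothetical alias for the grouped result
ORDER BY
    months;
```

Written this way, the outer AVG never sees the raw invoice_total column, only the per-month sums, which is the role SUM(invoice_total) plays inside AVG(...) OVER(...) in the single-query version.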
Hope this helps
The comment from lemon was what helped me figure it out. The avg() function is associated with the over clause, which makes it a window function, so it isn't acting as an aggregate when the group by clause takes effect.