SQL Server - Conditional aggregation with correlation

Reposted. Author: 行者123. Updated: 2023-11-29 11:10:51

Background:

The original case is simple: calculate a running total for each user, from the highest revenue to the lowest:

CREATE TABLE t(Customer INTEGER  NOT NULL PRIMARY KEY 
,"User" VARCHAR(5) NOT NULL
,Revenue INTEGER NOT NULL);

INSERT INTO t(Customer,"User",Revenue) VALUES
(001,'James',500),(002,'James',750),(003,'James',450),
(004,'Sarah',100),(005,'Sarah',500),(006,'Sarah',150),
(007,'Sarah',600),(008,'James',150),(009,'James',100);

Query:

SELECT *,
1.0 * Revenue/SUM(Revenue) OVER(PARTITION BY "User") AS percentage,
1.0 * SUM(Revenue) OVER(PARTITION BY "User" ORDER BY Revenue DESC)
/SUM(Revenue) OVER(PARTITION BY "User") AS running_percentage
FROM t
ORDER BY "User", Revenue DESC;

Output:

╔════╦═══════╦═════════╦════════════╦════════════════════╗
║ ID ║ User ║ Revenue ║ percentage ║ running_percentage ║
╠════╬═══════╬═════════╬════════════╬════════════════════╣
║ 2 ║ James ║ 750 ║ 0.38 ║ 0.38 ║
║ 1 ║ James ║ 500 ║ 0.26 ║ 0.64 ║
║ 3 ║ James ║ 450 ║ 0.23 ║ 0.87 ║
║ 8 ║ James ║ 150 ║ 0.08 ║ 0.95 ║
║ 9 ║ James ║ 100 ║ 0.05 ║ 1 ║
║ 7 ║ Sarah ║ 600 ║ 0.44 ║ 0.44 ║
║ 5 ║ Sarah ║ 500 ║ 0.37 ║ 0.81 ║
║ 6 ║ Sarah ║ 150 ║ 0.11 ║ 0.93 ║
║ 4 ║ Sarah ║ 100 ║ 0.07 ║ 1 ║
╚════╩═══════╩═════════╩════════════╩════════════════════╝
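The numbers in the table can be reproduced outside the database. The following Python sketch is an illustration added here, not part of the question; `ROWS` and `window_percentages` are hypothetical names. It emulates what the two windowed SUMs compute: partition by user, order by Revenue descending, accumulate, and divide by the partition total. It assumes there are no revenue ties within a user; with ties, the default RANGE frame would give tied rows the same running total, which this row-by-row loop does not.

```python
from collections import defaultdict

# Sample data copied from the INSERT statement: (Customer, User, Revenue)
ROWS = [
    (1, "James", 500), (2, "James", 750), (3, "James", 450),
    (4, "Sarah", 100), (5, "Sarah", 500), (6, "Sarah", 150),
    (7, "Sarah", 600), (8, "James", 150), (9, "James", 100),
]


def window_percentages(rows):
    """Emulate SUM(...) OVER (PARTITION BY User [ORDER BY Revenue DESC]).

    Assumes no revenue ties within a user; with ties, SQL's default
    RANGE frame would assign tied rows the same running total.
    """
    by_user = defaultdict(list)
    for customer, user, revenue in rows:
        by_user[user].append((customer, revenue))

    result = []
    for user in sorted(by_user):
        # ORDER BY Revenue DESC inside the partition
        partition = sorted(by_user[user], key=lambda r: r[1], reverse=True)
        total = sum(revenue for _, revenue in partition)  # partition total
        running = 0
        for customer, revenue in partition:
            running += revenue  # the frame grows by one row at a time
            result.append((customer, user, revenue,
                           revenue / total, running / total))
    return result


for customer, user, revenue, pct, running_pct in window_percentages(ROWS):
    print(f"{customer:>2} {user:<5} {revenue:>4} {pct:.2f} {running_pct:.2f}")
```

Rounded to two decimals, the printed values match the Output table.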

It could also be calculated in other ways using specific window functions.


Now let's assume we cannot use the windowed SUM, and rewrite the query:

SELECT c.Customer, c."User", c."Revenue"
,1.0 * Revenue / NULLIF(c3.s,0) AS percentage
,1.0 * c2.s / NULLIF(c3.s,0) AS running_percentage
FROM t c
CROSS APPLY
(SELECT SUM(Revenue) AS s
FROM t c2
WHERE c."User" = c2."User"
AND c2.Revenue >= c.Revenue) AS c2
CROSS APPLY
(SELECT SUM(Revenue) AS s
FROM t c2
WHERE c."User" = c2."User") AS c3
ORDER BY "User", Revenue DESC;
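Logically, CROSS APPLY evaluates its right-hand subquery once for each row of `c`, with `c`'s columns acting as constants. A minimal Python sketch of that row-by-row semantics follows. It is a nested loop over the same sample data; `apply_percentages` is a hypothetical name, and SQL Server's optimizer is free to choose a completely different physical plan.

```python
# Sample data copied from the INSERT statement: (Customer, User, Revenue)
ROWS = [
    (1, "James", 500), (2, "James", 750), (3, "James", 450),
    (4, "Sarah", 100), (5, "Sarah", 500), (6, "Sarah", 150),
    (7, "Sarah", 600), (8, "James", 150), (9, "James", 100),
]


def apply_percentages(rows):
    """Emulate the two CROSS APPLY subqueries (c2 and c3) as a nested loop."""
    result = []
    for customer, user, revenue in rows:  # outer row "c"
        # c2: SUM(Revenue) WHERE same user AND c2.Revenue >= c.Revenue
        running = sum(r for _, u, r in rows if u == user and r >= revenue)
        # c3: SUM(Revenue) WHERE same user
        total = sum(r for _, u, r in rows if u == user)
        # NULLIF(c3.s, 0): a zero total yields NULL (None) instead of an error
        pct = revenue / total if total else None
        running_pct = running / total if total else None
        result.append((customer, user, revenue, pct, running_pct))
    # ORDER BY "User", Revenue DESC
    result.sort(key=lambda row: (row[1], -row[2]))
    return result


for row in apply_percentages(ROWS):
    print(row)
```

Because `>=` includes ties, this per-row definition and the windowed RANGE frame agree even when revenues repeat within a user.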

I used CROSS APPLY because I don't like correlated subqueries in the SELECT column list, and because c3 is used twice.

Everything works. But on closer inspection, c2 and c3 are very similar. So why not combine them and use simple conditional aggregation:

SELECT c.Customer, c."User", c."Revenue"
,1.0 * Revenue / NULLIF(c2.sum_total,0) AS percentage
,1.0 * c2.sum_running / NULLIF(c2.sum_total,0) AS running_percentage
FROM t c
CROSS APPLY
(SELECT SUM(Revenue) AS sum_total,
SUM(CASE WHEN c2.Revenue >= c.Revenue THEN Revenue ELSE 0 END)
AS sum_running
FROM t c2
WHERE c."User" = c2."User") AS c2
ORDER BY "User", Revenue DESC;

Unfortunately, this is not possible. SQL Server raises:

Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression.

Of course I can work around it with another subquery, but it gets a bit "ugly":

SELECT c.Customer, c."User", c."Revenue"
,1.0 * Revenue / NULLIF(c2.sum_total,0) AS percentage
,1.0 * c2.sum_running / NULLIF(c2.sum_total,0) AS running_percentage
FROM t c
CROSS APPLY
( SELECT SUM(Revenue) AS sum_total,
SUM(running_revenue) AS sum_running
FROM (SELECT Revenue,
CASE WHEN c2.Revenue >= c.Revenue THEN Revenue ELSE 0 END
AS running_revenue
FROM t c2
WHERE c."User" = c2."User") AS sub
) AS c2
ORDER BY "User", Revenue DESC;

PostgreSQL version. The only difference is LATERAL instead of CROSS APPLY:

SELECT c.Customer, c."User", c.Revenue
,1.0 * Revenue / NULLIF(c2.sum_total,0) AS percentage
,1.0 * c2.running_sum / NULLIF(c2.sum_total,0) AS running_percentage
FROM t c
,LATERAL (SELECT SUM(Revenue) AS sum_total,
SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END)
AS running_sum
FROM t c2
WHERE c."User" = c2."User") c2
ORDER BY "User", Revenue DESC;

It works fine.


SQLite/MySQL version (this is why I prefer LATERAL/CROSS APPLY):

SELECT c.Customer, c."User", c.Revenue,
1.0 * Revenue / (SELECT SUM(Revenue)
FROM t c2
WHERE c."User" = c2."User") AS percentage,
1.0 * (SELECT SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END)
FROM t c2
WHERE c."User" = c2."User") /
(SELECT SUM(c2.Revenue)
FROM t c2
WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;
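The correlated-subquery form can be checked with Python's built-in `sqlite3` module. The sketch below was added for illustration: the in-memory database and the `SETUP`/`QUERY` names are assumptions, and the leading zeros in the Customer literals are dropped. It runs the SQLite/MySQL query above unchanged. SQLite accepts the `CASE` inside `SUM` that mixes the inner `c2.Revenue` with the outer `c.Revenue`, which is exactly what SQL Server rejects. This is a check against SQLite only and says nothing about SQL Server's behavior.

```python
import sqlite3

# Table and sample data from the question (leading zeros dropped from Customer).
SETUP = """
CREATE TABLE t(Customer INTEGER NOT NULL PRIMARY KEY
,"User" VARCHAR(5) NOT NULL
,Revenue INTEGER NOT NULL);

INSERT INTO t(Customer,"User",Revenue) VALUES
(1,'James',500),(2,'James',750),(3,'James',450),
(4,'Sarah',100),(5,'Sarah',500),(6,'Sarah',150),
(7,'Sarah',600),(8,'James',150),(9,'James',100);
"""

# The SQLite/MySQL version from the question, unchanged.
QUERY = """
SELECT c.Customer, c."User", c.Revenue,
1.0 * Revenue / (SELECT SUM(Revenue)
                 FROM t c2
                 WHERE c."User" = c2."User") AS percentage,
1.0 * (SELECT SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END)
       FROM t c2
       WHERE c."User" = c2."User") /
      (SELECT SUM(c2.Revenue)
       FROM t c2
       WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(QUERY).fetchall()
for customer, user, revenue, pct, running_pct in rows:
    print(customer, user, revenue, round(pct, 2), round(running_pct, 2))
```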


I have read Aggregates with an Outer Reference:

The source for the restriction is in the SQL-92 standard, and SQL Server inherited it from the Sybase codebase. The problem is that SQL Server needs to figure out which query will compute the aggregate.

I am not looking for answers that show how to work around it.

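The questions are:

  1. Which part of the standard disallows or interferes with it?
  2. Why don't other RDBMSs have a problem with this kind of outer dependency?
  3. Do they extend the SQL Standard, or does SQL Server not implement it fully (correctly?)?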

I would greatly appreciate references.

Edit:

I know that SQL-92 has no concept of LATERAL. But the version with subqueries (as in SQLite/MySQL) does not work in SQL Server either.

Edit 2:

To simplify a bit, let's examine only the correlated subquery:

SELECT c.Customer, c."User", c.Revenue,
1.0*(SELECT SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END)
FROM t c2
WHERE c."User" = c2."User")
/ (SELECT SUM(c2.Revenue)
FROM t c2
WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;

The version above works fine in MySQL/SQLite/PostgreSQL.

In SQL Server we get the error. After "flattening" it to one level with a subquery, it works:

SELECT c.Customer, c."User", c.Revenue,
1.0 * (
SELECT SUM(CASE WHEN r1 >= r2 THEN r1 ELSE 0 END)
FROM (SELECT c2.Revenue AS r1, c.Revenue r2
FROM t c2
WHERE c."User" = c2."User") AS S) /
(SELECT SUM(c2.Revenue)
FROM t c2
WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;
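Since SQLite is not subject to this restriction, Python's `sqlite3` module can be used to confirm that the flattening is a purely syntactic workaround: the nested and flattened forms from Edit 2 return identical rows. This sketch runs against SQLite, not SQL Server, so it says nothing about the standard's wording. `NESTED`, `FLATTENED` and the in-memory database are assumptions made for the example.

```python
import sqlite3

# Table and sample data from the question (leading zeros dropped from Customer).
SETUP = """
CREATE TABLE t(Customer INTEGER NOT NULL PRIMARY KEY
,"User" VARCHAR(5) NOT NULL
,Revenue INTEGER NOT NULL);

INSERT INTO t(Customer,"User",Revenue) VALUES
(1,'James',500),(2,'James',750),(3,'James',450),
(4,'Sarah',100),(5,'Sarah',500),(6,'Sarah',150),
(7,'Sarah',600),(8,'James',150),(9,'James',100);
"""

# Nested form: the CASE inside SUM mixes c2.Revenue with the outer c.Revenue.
NESTED = """
SELECT c.Customer, c."User", c.Revenue,
1.0*(SELECT SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END)
     FROM t c2
     WHERE c."User" = c2."User")
/ (SELECT SUM(c2.Revenue)
   FROM t c2
   WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;
"""

# Flattened form: the outer value is first projected as the inner column r2.
FLATTENED = """
SELECT c.Customer, c."User", c.Revenue,
1.0 * (
SELECT SUM(CASE WHEN r1 >= r2 THEN r1 ELSE 0 END)
FROM (SELECT c2.Revenue AS r1, c.Revenue r2
      FROM t c2
      WHERE c."User" = c2."User") AS S) /
(SELECT SUM(c2.Revenue)
 FROM t c2
 WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
nested = conn.execute(NESTED).fetchall()
flattened = conn.execute(FLATTENED).fetchall()
print(nested == flattened)
```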

The focus of this question is how the SQL standard specifies this.

Best answer

There is a simpler solution:

SELECT c.Customer, c."User", c."Revenue",
1.0 * Revenue/ NULLIF(c2.sum_total, 0) AS percentage,
1.0 * c2.sum_running / NULLIF(c2.sum_total, 0) AS running_percentage
FROM t c CROSS APPLY
(SELECT SUM(c2.Revenue) AS sum_total,
SUM(CASE WHEN c2.Revenue >= x.Revenue THEN c2.Revenue ELSE 0 END)
as sum_running
FROM t c2 CROSS JOIN
-- expose the outer value as a column of the inner FROM clause
(SELECT c.Revenue AS Revenue) x
WHERE c."User" = c2."User"
) c2
ORDER BY "User", Revenue DESC;

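I'm not sure why, or even whether, this restriction exists in the SQL '92 standard. I had the standard pretty well memorized about 20 years ago, but I don't recall that particular restriction.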

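I should note:

  • At the time of the SQL 92 standard, lateral joins were not really on anyone's radar. Sybase definitely had no such concept.
  • Other databases do have problems with outer references. In particular, they often limit scoping to one level deep.
  • The SQL standard itself tends to be highly political (that is, vendor-driven) rather than driven by the needs of actual database users. Still, over time it does move in the right direction.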

A similar question, "SQL Server - Conditional aggregation with correlation", is on Stack Overflow: https://stackoverflow.com/questions/36481732/
