gpt4 book ai didi

sql - 使用 T-SQL 聚合仅相邻的记录

转载 作者:行者123 更新时间:2023-12-05 00:44:18 26 4
gpt4 key购买 nike

我有(为示例简化)一个包含以下数据的表

Row Start       Finish       ID  Amount
--- --------- ---------- -- ------
1 2008-10-01 2008-10-02 01 10
2 2008-10-02 2008-10-03 02 20
3 2008-10-03 2008-10-04 01 38
4 2008-10-04 2008-10-05 01 23
5 2008-10-05 2008-10-06 03 14
6 2008-10-06 2008-10-07 02 3
7 2008-10-07 2008-10-08 02 8
8 2008-10-08 2008-11-08 03 19

日期代表一段时间,ID 是系统在该期间所处的状态,金额是与该状态相关的值。

我想要做的是聚合具有相同 ID 号的相邻行的 Amounts,但保持相同的整体顺序,以便可以组合连续运行。因此,我想最终得到如下数据:
Row Start       Finish       ID  Amount
--- --------- ---------- -- ------
1 2008-10-01 2008-10-02 01 10
2 2008-10-02 2008-10-03 02 20
3 2008-10-03 2008-10-05 01 61
4 2008-10-05 2008-10-06 03 14
5 2008-10-06 2008-10-08 02 11
6 2008-10-08 2008-11-08 03 19

我正在寻找可以放入 SP 的 T-SQL 解决方案,但是我看不到如何通过简单的查询来做到这一点。我怀疑它可能需要某种迭代,但我不想走这条路。

我想要进行此聚合的原因是该过程的下一步是执行按序列中出现的唯一 ID 分组的 SUM() 和 Count(),以便我的最终数据如下所示:
ID  Counts Total
-- ------ -----
01 2 71
02 2 31
03 2 33

但是,如果我做一个简单的
SELECT COUNT(ID), SUM(Amount) FROM data GROUP BY ID

在原来的 table 上,我得到了类似的东西
ID  Counts Total
-- ------ -----
01 3 71
02 3 31
03 2 33

这不是我想要的。

最佳答案

如果您阅读了 R T Snodgrass 的“Developing Time-Oriented Database Applications in SQL”一书(其 pdf 可从他的网站上的出版物中获得),直到 p165-166 上的图 6.25,您将找到非平凡的 SQL,它可以在当前示例中用于将各种行与相同的 ID 值和连续的时间间隔。

下面的查询开发接近正确,但在最后发现了一个问题,它的来源在第一个 SELECT 语句中。我还没有追查为什么给出了错误的答案。 [如果有人可以在他们的 DBMS 上测试 SQL 并告诉我第一个查询是否在那里正常工作,那将是一个很大的帮助!]

它看起来像:

-- Derived from Figure 6.25 from Snodgrass "Developing Time-Oriented
-- Database Applications in SQL"
CREATE TABLE Data
(
Start DATE,
Finish DATE,
ID CHAR(2),
Amount INT
);

INSERT INTO Data VALUES('2008-10-01', '2008-10-02', '01', 10);
INSERT INTO Data VALUES('2008-10-02', '2008-10-03', '02', 20);
INSERT INTO Data VALUES('2008-10-03', '2008-10-04', '01', 38);
INSERT INTO Data VALUES('2008-10-04', '2008-10-05', '01', 23);
INSERT INTO Data VALUES('2008-10-05', '2008-10-06', '03', 14);
INSERT INTO Data VALUES('2008-10-06', '2008-10-07', '02', 3);
INSERT INTO Data VALUES('2008-10-07', '2008-10-08', '02', 8);
INSERT INTO Data VALUES('2008-10-08', '2008-11-08', '03', 19);

SELECT DISTINCT F.ID, F.Start, L.Finish
FROM Data AS F, Data AS L
WHERE F.Start < L.Finish
AND F.ID = L.ID
-- There are no gaps between F.Finish and L.Start
AND NOT EXISTS (SELECT *
FROM Data AS M
WHERE M.ID = F.ID
AND F.Finish < M.Start
AND M.Start < L.Start
AND NOT EXISTS (SELECT *
FROM Data AS T1
WHERE T1.ID = F.ID
AND T1.Start < M.Start
AND M.Start <= T1.Finish))
-- Cannot be extended further
AND NOT EXISTS (SELECT *
FROM Data AS T2
WHERE T2.ID = F.ID
AND ((T2.Start < F.Start AND F.Start <= T2.Finish)
OR (T2.Start <= L.Finish AND L.Finish < T2.Finish)));

该查询的输出是:
01  2008-10-01      2008-10-02
01 2008-10-03 2008-10-05
02 2008-10-02 2008-10-03
02 2008-10-06 2008-10-08
03 2008-10-05 2008-10-06
03 2008-10-05 2008-11-08
03 2008-10-08 2008-11-08

已编辑 :倒数第二行有问题 - 它不应该在那里。我还不清楚(还)它来自哪里。

现在,我们需要将该复杂表达式视为另一个 SELECT 语句的 FROM 子句中的查询表达式,它将对与上述最大范围重叠的条目的给定 ID 的金额值求和。
SELECT M.ID, M.Start, M.Finish, SUM(D.Amount)
FROM Data AS D,
(SELECT DISTINCT F.ID, F.Start, L.Finish
FROM Data AS F, Data AS L
WHERE F.Start < L.Finish
AND F.ID = L.ID
-- There are no gaps between F.Finish and L.Start
AND NOT EXISTS (SELECT *
FROM Data AS M
WHERE M.ID = F.ID
AND F.Finish < M.Start
AND M.Start < L.Start
AND NOT EXISTS (SELECT *
FROM Data AS T1
WHERE T1.ID = F.ID
AND T1.Start < M.Start
AND M.Start <= T1.Finish))
-- Cannot be extended further
AND NOT EXISTS (SELECT *
FROM Data AS T2
WHERE T2.ID = F.ID
AND ((T2.Start < F.Start AND F.Start <= T2.Finish)
OR (T2.Start <= L.Finish AND L.Finish < T2.Finish)))) AS M
WHERE D.ID = M.ID
AND M.Start <= D.Start
AND M.Finish >= D.Finish
GROUP BY M.ID, M.Start, M.Finish
ORDER BY M.ID, M.Start;

这给出:
ID  Start        Finish       Amount
01 2008-10-01 2008-10-02 10
01 2008-10-03 2008-10-05 61
02 2008-10-02 2008-10-03 20
02 2008-10-06 2008-10-08 11
03 2008-10-05 2008-10-06 14
03 2008-10-05 2008-11-08 33 -- Here be trouble!
03 2008-10-08 2008-11-08 19

已编辑 :这几乎是原始问题要求的 COUNT 和 SUM 聚合的正确数据集,因此最终答案是:
SELECT I.ID, COUNT(*) AS Number, SUM(I.Amount) AS Amount
FROM (SELECT M.ID, M.Start, M.Finish, SUM(D.Amount) AS Amount
FROM Data AS D,
(SELECT DISTINCT F.ID, F.Start, L.Finish
FROM Data AS F, Data AS L
WHERE F.Start < L.Finish
AND F.ID = L.ID
-- There are no gaps between F.Finish and L.Start
AND NOT EXISTS
(SELECT *
FROM Data AS M
WHERE M.ID = F.ID
AND F.Finish < M.Start
AND M.Start < L.Start
AND NOT EXISTS
(SELECT *
FROM Data AS T1
WHERE T1.ID = F.ID
AND T1.Start < M.Start
AND M.Start <= T1.Finish))
-- Cannot be extended further
AND NOT EXISTS
(SELECT *
FROM Data AS T2
WHERE T2.ID = F.ID
AND ((T2.Start < F.Start AND F.Start <= T2.Finish) OR
(T2.Start <= L.Finish AND L.Finish < T2.Finish)))
) AS M
WHERE D.ID = M.ID
AND M.Start <= D.Start
AND M.Finish >= D.Finish
GROUP BY M.ID, M.Start, M.Finish
) AS I
GROUP BY I.ID
ORDER BY I.ID;

id number amount
01 2 71
02 2 31
03 3 66

评论 :
哦! Drat... 3 的条目是它应该具有的“数量”的两倍。以前的“编辑”部分表明事情开始出错的地方。看起来好像第一个查询有点错误(也许它是针对不同的问题),或者我正在使用的优化器行为不端。尽管如此,应该有一个与此密切相关的答案,该答案将给出正确的值。

记录:在 Solaris 10 上的 IBM Informix Dynamic Server 11.50 上进行了测试。但是,应该可以在任何其他中等标准的 SQL DBMS 上正常工作。

关于sql - 使用 T-SQL 聚合仅相邻的记录,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/235418/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com