
sql - How to group similar rows in SQL Server

Reposted. Author: 行者123. Updated: 2023-12-03 02:53:48

I have a table like this:

Date        ConfigID  ItemID  ClientName  Metric1  Metric2
==========  ========  ======  ==========  =======  =======
2017-01-01  1         1       A           2.0      2.0
2017-01-01  3         1       A           2.0      2.0
2017-01-01  4         2       B           5.0      5.0
2017-01-02  4         3       A           6.0      6.0
2017-01-01  2         1       A           2.0      2.0
....
(20 million rows here)

I want to detect duplicates here based on Date, ItemID, ClientName, Metric1, and Metric2, so I wrote:

CREATE TABLE MyTable (
    [Date]     date,
    ConfigID   int,
    ItemID     int,
    ClientName char(1),
    Metric1    decimal(3,1),
    Metric2    decimal(3,1)
);

INSERT INTO MyTable
VALUES ('2017-01-01', 1, 1, 'A', 2.0, 2.0),
       ('2017-01-01', 3, 1, 'A', 2.0, 2.0),
       ('2017-01-01', 4, 2, 'B', 5.0, 5.0),
       ('2017-01-02', 4, 3, 'A', 6.0, 6.0),
       ('2017-01-01', 2, 1, 'A', 2.0, 2.0);

WITH Dupes AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY [Date], [ItemID], [ClientName], [Metric1], [Metric2]
               ORDER BY [Date] DESC
           ) AS RowNum
    FROM MyTable
)
SELECT *
FROM Dupes

But the result that comes back looks like this:

Date        ConfigID  ItemID  ClientName  Metric1  Metric2  RowNum
==========  ========  ======  ==========  =======  =======  ======
2017-01-01  1         1       A           2.0      2.0      1
2017-01-01  3         1       A           2.0      2.0      2
2017-01-01  4         2       B           5.0      5.0      1
2017-01-02  4         3       A           6.0      6.0      1
2017-01-01  2         1       A           2.0      2.0      3
....
(20 million rows here)

I want to group similar items according to the PARTITION BY clause. In other words, I would like to see something like this (I don't really need RowNum):

Date        ConfigID  ItemID  ClientName  Metric1  Metric2  RowNum
==========  ========  ======  ==========  =======  =======  ======
2017-01-01  1         1       A           2.0      2.0      1
2017-01-01  3         1       A           2.0      2.0      2
2017-01-01  2         1       A           2.0      2.0      3
2017-01-01  4         2       B           5.0      5.0      1
2017-01-02  4         3       A           6.0      6.0      1
....
(20 million rows here)

What SQL query would help me group the duplicate/similar rows in the table? Thanks in advance for any suggestions and answers!

Best answer

Would using DENSE_RANK instead of ROW_NUMBER help?

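;WITH Dupes AS (
    SELECT *,
           -- Rows with equal values in all five columns tie and get the same GroupID.
           DENSE_RANK() OVER (
               ORDER BY [Date],
                        [ItemID],
                        [ClientName],
                        [Metric1],
                        [Metric2] DESC  -- DESC applies only to [Metric2]; direction does not change which rows tie
           ) AS GroupID
    FROM MyTable
)
SELECT *
FROM Dupes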
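
If the goal is only to see matching rows next to each other, one option is to sort the outer query by the computed group number, since SQL Server does not guarantee any row order without an outer ORDER BY. The following is a minimal sketch, assuming the MyTable definition from the question; the tiebreak on ConfigID is an assumption added here so that rows inside a group come back in a stable order:

```sql
WITH Dupes AS (
    SELECT *,
           -- Same five columns as above; rows that tie share a GroupID.
           DENSE_RANK() OVER (
               ORDER BY [Date], [ItemID], [ClientName], [Metric1], [Metric2]
           ) AS GroupID
    FROM MyTable
)
SELECT *
FROM Dupes
-- Sort by group first so similar rows are adjacent, then by ConfigID (assumed tiebreak).
ORDER BY GroupID, ConfigID;
```

On the five sample rows this places the three (2017-01-01, 1, A, 2.0, 2.0) rows first, followed by the B row and then the 2017-01-02 row.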

The solution proposed here, which keeps only the groups that contain more than one row:

;WITH D1 AS (
    SELECT *,
           DENSE_RANK() OVER (
               ORDER BY [Date],
                        [ItemID],
                        [ClientName],
                        [Metric1],
                        [Metric2] DESC
           ) AS GroupID
    FROM MyTable
),
Dupes AS (
    SELECT *,
           COUNT(*) OVER (PARTITION BY GroupID) AS GroupItemsCount
    FROM D1
)
SELECT *
FROM Dupes
WHERE GroupItemsCount <> 1

But a better approach might be:

;WITH Dupes AS (
    SELECT *,
           COUNT(*) OVER (
               PARTITION BY [Date],
                            [ItemID],
                            [ClientName],
                            [Metric1],
                            [Metric2]
           ) AS GroupItemsCount
    FROM MyTable
)
SELECT *
FROM Dupes
WHERE GroupItemsCount > 1
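
When only the duplicated key combinations are needed rather than every underlying row, a plain GROUP BY with HAVING is a common alternative to the windowed COUNT. This is a sketch under the same assumptions (the MyTable definition from the question), not part of the original answer; note that it does not return ConfigID, because that column is not part of the grouping key:

```sql
-- One output row per duplicated combination, with the number of rows that share it.
SELECT [Date], [ItemID], [ClientName], [Metric1], [Metric2],
       COUNT(*) AS GroupItemsCount
FROM MyTable
GROUP BY [Date], [ItemID], [ClientName], [Metric1], [Metric2]
HAVING COUNT(*) > 1
ORDER BY [Date], [ItemID], [ClientName], [Metric1], [Metric2];
```

On the five sample rows this returns a single row, (2017-01-01, 1, A, 2.0, 2.0) with GroupItemsCount = 3.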

Regarding sql - How to group similar rows in SQL Server, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47797937/
