gpt4 book ai didi

sql - 删除重复项 - 仅删除除最近日期行之外的所有行

转载 作者:行者123 更新时间:2023-12-03 03:03:04 25 4
gpt4 key购买 nike

Possible Duplicate:
How can I find duplicate entries and delete the oldest ones in SQL?

由于更新工具有问题,我的数据库有几千个重复项。我能够识别具有重复项的项目集合,但只需要删除最旧的条目,而不一定是最低的 ID。测试数据如下所示,正确的行有一个 *

除了最近创建的行之外,应该删除没有重复规则 ID 的重复标题的文章。 (实际 id 列是 GUID,因此我不能假设自动增量)

Id           Article id          Rule Id         Title          Opened Date
-- ---------- ------- ----- -----------
1* 111 5 T1 2013-01-20
2 112 5 T1 2013-07-01
3* 113 6 T2 2013-07-01
4* 114 7 T2 2013-07-02
5 115 8 T3 2012-07-01
6 116 8 T3 2013-01-20
7* 117 8 T3 2013-01-21

表架构:

CREATE TABLE [dbo].[test_ai](
[id] [int] NOT NULL,
[ArticleId] [varchar](50) NOT NULL,
[ruleid] [varchar](50) NULL,
[Title] [nvarchar](max) NULL,
[AuditData_WhenCreated] [datetime] NULL,
PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)

测试数据插入

insert into test_ai (id, articleid, ruleid, title, auditdata_whencreated) values (1, 111, 5, 'test 1', '2013-01-20')
insert into test_ai (id, articleid, ruleid, title, auditdata_whencreated) values (2, 112, 5, 'test 1', '2012-07-01')
insert into test_ai (id, articleid, ruleid, title, auditdata_whencreated) values (3, 113, 6, 'test 2', '2012-07-01')
insert into test_ai (id, articleid, ruleid, title, auditdata_whencreated) values (4, 114, 7, 'test 2', '2012-07-02')
insert into test_ai (id, articleid, ruleid, title, auditdata_whencreated) values (5, 115, 8, 'test 3', '2012-07-01')
insert into test_ai (id, articleid, ruleid, title, auditdata_whencreated) values (6, 116, 8, 'test 3', '2013-01-20')
insert into test_ai (id, articleid, ruleid, title, auditdata_whencreated) values (7, 117, 8, 'test 3', '2013-01-21')

我当前的查询如下所示

select * from test_ai
where test_ai.id in

-- set 1 - all rows with duplicates
(select f.id
from test_ai as F
WHERE exists (select ruleid, title, count(id)
FROM test_ai
WHERE test_ai.title = F.title
AND test_ai.ruleid = F.ruleid
GROUP BY test_ai.title, test_ai.ruleid
having count(test_ai.id) > 1))
and test_ai.id not in

-- set 2 - includes one row from each set of duplicates
(select min(id)
from test_ai as F
WHERE EXISTS (select ruleid, title, count(id)
from test_ai
WHERE test_ai.title = F.title
AND test_ai.ruleid = F.ruleid
group by test_ai.title, test_ai.ruleid
HAVING count(test_ai.id) > 1)
GROUP BY title, ruleid
)

此 SQL 标识了一些应删除的行(第 2、6、7 行),但它确实按照“打开日期”选择了最旧的文章。 (应该删除第 2,5,6 行)我意识到我没有在语句中指定这一点,但正在努力解决如何添加最后一 block 。如果它导致我需要运行多次才能在存在多个重复项时删除重复项的脚本,那么这不是问题。

实际问题要复杂得多,但如果我能克服这个障碍,我就能再次前进。感谢您的浏览!

最佳答案

在 SQL Server 2005+ 中从集合(或集合中的每个组)删除一行的典型模型是:

;WITH cte AS 
(
SELECT col, rn = ROW_NUMBER() OVER
(PARTITION BY something ORDER BY something)
FROM dbo.base_table
WHERE ...
)
DELETE x WHERE rn = 1;

在您的情况下,这将是:

;WITH cte AS 
(
SELECT id, ruleid, Title, rn = ROW_NUMBER() OVER
(
PARTITION BY ruleid, Title
ORDER BY auditdata_whencreated DESC
)
FROM dbo.test_ai
)
DELETE cte
OUTPUT deleted.id
WHERE rn > 1;

结果:

id
----
2
6
5

关于sql - 删除重复项 - 仅删除除最近日期行之外的所有行,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/14448929/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com