
sql - Grouping all related records in a many-to-many relationship, SQL graph connected components

Reposted. Author: 行者123. Updated: 2023-12-02 06:35:35

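Hopefully I'm just missing a simple solution.

I have two tables. One contains a list of companies. The second contains a list of publishers. The mapping between the two is many-to-many. What I would like to do is bundle or group all of the companies in table A that have any relationship to a publisher in table B, and vice versa.

The final result would look something like this (GROUPID is the key field). Rows 1 and 2 are in the same group because they share the same company. Row 3 is in the same group because publisher Y was already mapped to company A. Row 4 is in the group because company B was already mapped to group 1 through publisher Y.

Put simply, any time there is any kind of shared relationship between a company and a publisher, that pair should be assigned to the same group.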

ROW   GROUPID   Company   Publisher
1     1         A         Y
2     1         A         X
3     1         B         Y
4     1         B         Z
5     2         C         W
6     2         C         P
7     2         D         W
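To make the expected grouping concrete, here is a minimal sketch outside SQL that treats every company and publisher as a graph node and every row as an edge, then labels connected components with a union-find structure. The function name `assign_group_ids` and the `SAMPLE` list are made up for illustration; the data is just the seven rows from the table above.

```python
from typing import Dict, Hashable, List, Tuple

# The seven (Company, Publisher) rows from the expected-result table.
SAMPLE: List[Tuple[str, str]] = [
    ("A", "Y"), ("A", "X"), ("B", "Y"), ("B", "Z"),
    ("C", "W"), ("C", "P"), ("D", "W"),
]


def assign_group_ids(pairs: List[Tuple[str, str]]) -> List[int]:
    """Return one group id per row; rows in the same component share an id."""
    parent: Dict[Hashable, Hashable] = {}

    def find(node: Hashable) -> Hashable:
        # Register unseen nodes as their own root, then walk up with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a: Hashable, b: Hashable) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Tag nodes so a company and a publisher with the same name stay distinct.
    for company, publisher in pairs:
        union(("C", company), ("P", publisher))

    # Number the components in order of first appearance, starting at 1.
    ids: Dict[Hashable, int] = {}
    result: List[int] = []
    for company, _ in pairs:
        root = find(("C", company))
        result.append(ids.setdefault(root, len(ids) + 1))
    return result


if __name__ == "__main__":
    for (company, publisher), gid in zip(SAMPLE, assign_group_ids(SAMPLE)):
        print(gid, company, publisher)
```

This is only a reference for what GROUPID should contain; it says nothing about how to get there efficiently inside SQL Server.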

Fiddle

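Update:
My bounty version: given the table in the fiddle above, which holds plain Company/Publisher pairs, populate the GROUPID field shown above. Think of it as creating a Family ID that covers all related parents and children.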

SQL Server 2012

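Best answer

I thought about using a recursive CTE, but, as far as I know, SQL Server does not allow UNION to connect the anchor member and the recursive member of a recursive CTE (I think PostgreSQL does allow it), so there is no way to eliminate duplicates.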
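For comparison, here is a rough sketch of what the recursive approach looks like on an engine that does accept UNION between the anchor and recursive members. It uses SQLite through Python's standard `sqlite3` module as a stand-in, not SQL Server or PostgreSQL, and the CTE name `reach`, its column names, and the helper `group_with_recursive_cte` are assumptions made for this example. Because UNION discards rows that were already produced, the recursion stops once no new company is reachable.

```python
import sqlite3
from typing import List, Tuple

SAMPLE: List[Tuple[str, str]] = [
    ("A", "Y"), ("A", "X"), ("B", "Y"), ("B", "Z"),
    ("C", "W"), ("C", "P"), ("D", "W"),
]

# For every company, collect all companies reachable through shared publishers.
# UNION (not UNION ALL) removes duplicates, which is what ends the recursion.
QUERY = """
WITH RECURSIVE reach(start_company, reached_company) AS (
    SELECT Company, Company FROM Table1
    UNION
    SELECT r.start_company, t2.Company
    FROM reach AS r
    JOIN Table1 AS t1 ON t1.Company = r.reached_company
    JOIN Table1 AS t2 ON t2.Publisher = t1.Publisher
)
SELECT t.Company, t.Publisher, MIN(r.reached_company) AS label
FROM Table1 AS t
JOIN reach AS r ON r.start_company = t.Company
GROUP BY t.Company, t.Publisher
ORDER BY t.Company, t.Publisher
"""


def group_with_recursive_cte(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str, int]]:
    """Return (Company, Publisher, GroupID) rows computed by the recursive CTE."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Table1 (Company TEXT, Publisher TEXT)")
        conn.executemany("INSERT INTO Table1 VALUES (?, ?)", pairs)
        rows = conn.execute(QUERY).fetchall()
    finally:
        conn.close()
    # Turn the smallest reachable company name into a dense 1-based group id.
    labels = sorted({label for _, _, label in rows})
    rank = {label: i + 1 for i, label in enumerate(labels)}
    return [(c, p, rank[label]) for c, p, label in rows]


if __name__ == "__main__":
    for row in group_with_recursive_cte(SAMPLE):
        print(row)
```

The intermediate `reach` set can grow quadratically with the size of a component, so this is meant to illustrate the deduplication point rather than to be fast.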

declare @i int

-- give every row its own starting GroupID
-- (leading semicolon: a CTE must follow a terminated statement)
;with cte as (
    select
        GroupID,
        row_number() over(order by Company) as rn
    from Table1
)
update cte set GroupID = rn

select @i = @@rowcount

-- while some rows updated
while @i > 0
begin
    -- pull every row down to the smallest GroupID of its company
    update T1 set
        GroupID = T2.GroupID
    from Table1 as T1
        inner join (
            select T2.Company, min(T2.GroupID) as GroupID
            from Table1 as T2
            group by T2.Company
        ) as T2 on T2.Company = T1.Company
    where T1.GroupID > T2.GroupID

    select @i = @@rowcount

    -- pull every row down to the smallest GroupID of its publisher
    update T1 set
        GroupID = T2.GroupID
    from Table1 as T1
        inner join (
            select T2.Publisher, min(T2.GroupID) as GroupID
            from Table1 as T2
            group by T2.Publisher
        ) as T2 on T2.Publisher = T1.Publisher
    where T1.GroupID > T2.GroupID

    -- will be > 0 if any rows updated
    select @i = @i + @@rowcount
end

-- renumber the surviving GroupID values as 1, 2, 3, ...
;with cte as (
    select
        GroupID,
        dense_rank() over(order by GroupID) as rn
    from Table1
)
update cte set GroupID = rn

sql fiddle demo
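The loop above is a minimum-label propagation: every row starts with its own label, and each pass pulls rows down to the smallest label seen for their company and then for their publisher, until a full pass changes nothing. A small Python sketch of the same idea, with the hypothetical name `propagate_min_labels` and the sample rows from the question, may make the fixed-point behaviour easier to follow:

```python
from typing import Dict, List, Tuple

SAMPLE: List[Tuple[str, str]] = [
    ("A", "Y"), ("A", "X"), ("B", "Y"), ("B", "Z"),
    ("C", "W"), ("C", "P"), ("D", "W"),
]


def propagate_min_labels(pairs: List[Tuple[str, str]]) -> List[int]:
    """Return one group id per row using repeated min-label updates."""
    # Start with a distinct label per row, numbered in company order
    # (the counterpart of row_number() over(order by Company)).
    order = sorted(range(len(pairs)), key=lambda i: pairs[i][0])
    label = [0] * len(pairs)
    for rn, i in enumerate(order, start=1):
        label[i] = rn

    changed = True
    while changed:
        changed = False
        # Column 0 is Company, column 1 is Publisher.
        for col in (0, 1):
            # Smallest label currently held by any row sharing this key.
            best: Dict[str, int] = {}
            for i, pair in enumerate(pairs):
                key = pair[col]
                best[key] = min(best.get(key, label[i]), label[i])
            # Lower every row that sits above its key's minimum.
            for i, pair in enumerate(pairs):
                if label[i] > best[pair[col]]:
                    label[i] = best[pair[col]]
                    changed = True

    # Renumber the surviving labels densely, like dense_rank().
    rank = {value: r for r, value in enumerate(sorted(set(label)), start=1)}
    return [rank[value] for value in label]


if __name__ == "__main__":
    print(propagate_min_labels(SAMPLE))
```

On the sample rows the labels settle after two passes, and a third pass confirms that nothing changes.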

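I've also tried a breadth-first search algorithm. I thought it could be faster (it is better in terms of complexity), so I'll provide that solution here. It turned out not to be faster than the SQL approach, though: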

declare @Company nvarchar(2), @Publisher nvarchar(2), @GroupID int

declare @Queue table (
    Company nvarchar(2), Publisher nvarchar(2), ID int identity(1, 1),
    primary key(Company, Publisher)
)

select @GroupID = 0

while 1 = 1
begin
    select top 1 @Company = Company, @Publisher = Publisher
    from Table1
    where GroupID is null

    if @@rowcount = 0 break

    select @GroupID = @GroupID + 1

    insert into @Queue(Company, Publisher)
    select @Company, @Publisher

    while 1 = 1
    begin
        select top 1 @Company = Company, @Publisher = Publisher
        from @Queue
        order by ID asc

        if @@rowcount = 0 break

        update Table1 set
            GroupID = @GroupID
        where Company = @Company and Publisher = @Publisher

        delete from @Queue where Company = @Company and Publisher = @Publisher

        ;with cte as (
            select Company, Publisher from Table1 where Company = @Company and GroupID is null
            union all
            select Company, Publisher from Table1 where Publisher = @Publisher and GroupID is null
        )
        insert into @Queue(Company, Publisher)
        select distinct c.Company, c.Publisher
        from cte as c
        where not exists (select * from @Queue as q where q.Company = c.Company and q.Publisher = c.Publisher)
    end
end

sql fiddle demo
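For readers who find the row-by-row T-SQL hard to follow, the same breadth-first idea can be sketched in a few lines of Python. The name `bfs_group_ids` and the two lookup dictionaries are illustrative assumptions; they play the role that filtering Table1 by Company or Publisher plays in the procedure above.

```python
from collections import defaultdict, deque
from typing import DefaultDict, List, Optional, Tuple

SAMPLE: List[Tuple[str, str]] = [
    ("A", "Y"), ("A", "X"), ("B", "Y"), ("B", "Z"),
    ("C", "W"), ("C", "P"), ("D", "W"),
]


def bfs_group_ids(pairs: List[Tuple[str, str]]) -> List[int]:
    """Return one group id per row, found by breadth-first search over rows."""
    # Index rows by company and by publisher so neighbours are cheap to find.
    by_company: DefaultDict[str, List[int]] = defaultdict(list)
    by_publisher: DefaultDict[str, List[int]] = defaultdict(list)
    for i, (company, publisher) in enumerate(pairs):
        by_company[company].append(i)
        by_publisher[publisher].append(i)

    group: List[Optional[int]] = [None] * len(pairs)
    next_id = 0
    for start in range(len(pairs)):
        if group[start] is not None:
            continue  # already reached from an earlier row
        next_id += 1
        group[start] = next_id
        queue = deque([start])
        while queue:
            i = queue.popleft()
            company, publisher = pairs[i]
            # Neighbours are rows sharing this row's company or publisher.
            for j in by_company[company] + by_publisher[publisher]:
                if group[j] is None:
                    group[j] = next_id
                    queue.append(j)
    return [g for g in group if g is not None]


if __name__ == "__main__":
    print(bfs_group_ids(SAMPLE))
```

Rows are marked when they are enqueued rather than when they are dequeued, so no row enters the queue twice; the T-SQL version gets the same effect with its `not exists` check against `@Queue`.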

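I've tested my version and Gordon Linoff's version to compare their performance. It looks like the CTE is much worse; I couldn't wait for it to finish on more than 1000 rows.

Here's a sql fiddle demo with random data. My results were:
128 rows:
    my RBAR solution: 190ms
    my SQL solution: 27ms
    Gordon Linoff's solution: 958ms
256 rows:
    my RBAR solution: 560ms
    my SQL solution: 1226ms
    Gordon Linoff's solution: 45371ms

This is random data, so the results may not be very consistent. I think indexes could change the timings, but I don't think they would change the overall picture.

A version that uses a temporary table and only calculates GroupID, without touching the initial table: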

declare @i int

-- creating table to gather all possible GroupID for each row
create table #Temp
(
    Company varchar(1), Publisher varchar(1), GroupID varchar(1),
    primary key (Company, Publisher, GroupID)
)

-- initializing it with data
insert into #Temp (Company, Publisher, GroupID)
select Company, Publisher, Company
from Table1

select @i = @@rowcount

-- while some rows inserted into #Temp
while @i > 0
begin
    -- expand #Temp in both directions
    ;with cte as (
        select
            T2.Company, T1.Publisher,
            T1.GroupID as GroupID1, T2.GroupID as GroupID2
        from #Temp as T1
            inner join #Temp as T2 on T2.Company = T1.Company
        union
        select
            T1.Company, T2.Publisher,
            T1.GroupID as GroupID1, T2.GroupID as GroupID2
        from #Temp as T1
            inner join #Temp as T2 on T2.Publisher = T1.Publisher
    ), cte2 as (
        select
            Company, Publisher,
            case when GroupID1 < GroupID2 then GroupID1 else GroupID2 end as GroupID
        from cte
    )
    insert into #Temp
    select Company, Publisher, GroupID
    from cte2
    -- don't insert duplicates
    except
    select Company, Publisher, GroupID
    from #Temp

    -- will be > 0 if any row inserted
    select @i = @@rowcount
end

select
    Company, Publisher,
    dense_rank() over(order by min(GroupID)) as GroupID
from #Temp
group by Company, Publisher

=> sql fiddle example

Regarding "sql - Grouping all related records in a many-to-many relationship, SQL graph connected components", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18618999/
