gpt4 book ai didi

sql - 查找生成林(带递归,PostgreSQL 9.5)

转载 作者:行者123 更新时间:2023-11-29 11:36:51 24 4
gpt4 key购买 nike

我有一个表,其中包含任意数量的人的 identities(即别名)。每行都有一个以前的名称和一个新名称。在生产中,大约有 100 万行。例如:

id, old, new
---
1, 'Albert', 'Bob'
2, 'Bob', 'Charles'
3, 'Mary', 'Nancy'
4, 'Charles', 'Albert'
5, 'Lydia', 'Nancy'
6, 'Zoe', 'Zoe'

我想要的是生成用户 列表并引用他们各自的所有身份。这类似于在每个连接身份图中查找所有节点,或查找生成森林:

User 1: Albert, Bob, Charles (identities: 1,2,4)
User 2: Mary, Nancy, Lydia (identities: 3,5)
User 3: Zoe (identities: 6)

我一直在修补 PostgreSQL 的 WITH RECURSIVE,但它会为每个生成集合和子集。例如:

1,2,4 <-- spanning tree: good
2 <-- subset: discard
3,5 <-- spanning tree: good
4 <-- subset: discard
5 <-- subset: discard
6 <-- spanning tree: good

我需要做什么才能只为每个用户生成完整的身份集(即生成树)?

SQLFiddle:http://sqlfiddle.com/#!15/9eaed/4我最近的尝试。这是代码:

WITH RECURSIVE search_graph AS (
SELECT id
, id AS min_id
, ARRAY[id] AS path
, ARRAY[old,new] AS emails
FROM identities

UNION

SELECT identities.id
, LEAST(identities.id, sg.min_id)
, (sg.path || identities.id)
, (sg.emails || identities.old || identities.new)

FROM search_graph sg
JOIN identities ON (identities.old = ANY(sg.emails) OR identities.new = ANY(sg.emails))
WHERE identities.id <> ALL(sg.path)
)
SELECT array_agg(DISTINCT(p)) from search_graph, unnest(path) p GROUP BY min_id;

结果:

1,2,4
2
3,5
4
5
6

最佳答案

我前段时间写过类似问题的答案:How to find all connected subgraphs of an undirected graph .在那个问题中,我使用了 SQL Server。有关中间 CTE 的详细说明,请参阅该答案。我将该查询改编为 Postgres。

使用 Postgres 数组功能可以更有效地编写它,而不是将路径连接到 text 列中。

WITH RECURSIVE
CTE_Idents
AS
(
SELECT old AS Ident
FROM identities

UNION

SELECT new AS Ident
FROM identities
)
,CTE_Pairs
AS
(
SELECT old AS Ident1, new AS Ident2
FROM identities
WHERE old <> new

UNION

SELECT new AS Ident1, old AS Ident2
FROM identities
WHERE old <> new
)
,CTE_Recursive
AS
(
SELECT
CTE_Idents.Ident AS AnchorIdent
, Ident1
, Ident2
, ',' || Ident1 || ',' || Ident2 || ',' AS IdentPath
, 1 AS Lvl
FROM
CTE_Pairs
INNER JOIN CTE_Idents ON CTE_Idents.Ident = CTE_Pairs.Ident1

UNION ALL

SELECT
CTE_Recursive.AnchorIdent
, CTE_Pairs.Ident1
, CTE_Pairs.Ident2
, CTE_Recursive.IdentPath || CTE_Pairs.Ident2 || ',' AS IdentPath
, CTE_Recursive.Lvl + 1 AS Lvl
FROM
CTE_Pairs
INNER JOIN CTE_Recursive ON CTE_Recursive.Ident2 = CTE_Pairs.Ident1
WHERE
CTE_Recursive.IdentPath NOT LIKE ('%,' || CTE_Pairs.Ident2 || ',%')
)
,CTE_RecursionResult
AS
(
SELECT AnchorIdent, Ident1, Ident2
FROM CTE_Recursive
)
,CTE_CleanResult
AS
(
SELECT AnchorIdent, Ident1 AS Ident
FROM CTE_RecursionResult

UNION

SELECT AnchorIdent, Ident2 AS Ident
FROM CTE_RecursionResult
)
,CTE_Groups
AS
(
SELECT
CTE_Idents.Ident
,array_agg(COALESCE(CTE_CleanResult.Ident, CTE_Idents.Ident)
ORDER BY COALESCE(CTE_CleanResult.Ident, CTE_Idents.Ident)) AS AllIdents
FROM
CTE_Idents
LEFT JOIN CTE_CleanResult ON CTE_CleanResult.AnchorIdent = CTE_Idents.Ident
GROUP BY CTE_Idents.Ident
)
SELECT AllIdents
FROM CTE_Groups
GROUP BY AllIdents
;

我向您的示例数据添加了一行 (7,X,Y)

SQL Fiddle

结果

|          allidents |
|--------------------|
| Lydia,Mary,Nancy |
| Albert,Bob,Charles |
| X,Y |
| Zoe |

关于sql - 查找生成林(带递归,PostgreSQL 9.5),我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/36685015/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com