gpt4 book ai didi

sql - 跨多个可能的 ID 匹配记录

转载 作者:行者123 更新时间:2023-12-02 16:42:44 27 4
gpt4 key购买 nike

我有多个记录,且标识符稀疏(我将称这些记录为 ID 号)。我每条记录最多可以有两个不同的 ID 号,并且希望能够一起遍历所有相关记录,以便我可以创建一个共享标识符。我想在 T-SQL 查询中实现这一点。

本质上,这里是一些示例数据:

+-------+-------+--------+-----+------+
| RowId | ID1 | ID2 | ID3 | ID4 |
+-------+-------+--------+-----+------+
| 1 | 11111 | | | |
| 2 | 11111 | | | |
| 3 | 11111 | AAAAA | | |
| 4 | | BBBBBB | BC1 | |
| 5 | | | BC1 | O111 |
| 6 | | GGGGG | BC1 | |
| 7 | | AAAAA | | O111 |
| 8 | | CCCCCC | | |
| 9 | 99999 | | | |
| 10 | 99999 | DDDDDD | | |
| 11 | | | | O222 |
| 12 | | EEEEEE | | O222 |
| 13 | | EEEEEE | | O333 |
+-------+-------+--------+-----+------+

例如,11111 链接到 RowId3 中的 AAAAA,AAAAA 也链接到 rowId 7 中的 O111。O111 链接到 RowId 5 中的 BC1。BC1 链接到 RowId 4 中的 BBBBBB,ETC。还,一旦所有这些行都链接起来,我想创建一个新的单个标识符。

这是我想要为上述所有数据实现的输出:

Denormalised:
+---------+-------+--------+-----+------+
| GroupId | ID1 | ID2 | ID3 | ID4 |
+---------+-------+--------+-----+------+
| 1 | 11111 | AAAAA | BC1 | O111 |
| 1 | 11111 | BBBBBB | BC1 | O111 |
| 1 | 11111 | GGGGG | BC1 | O111 |
| 2 | | CCCCCC | | |
| 3 | 99999 | DDDDDD | | |
| 4 | | EEEEEE | | O222 |
| 4 | | EEEEEE | | O333 |
+---------+-------+--------+-----+------+


Normalized (probably better to work with):

+--------+----------+---------+
| IDType | IDNumber | GroupId |
+--------+----------+---------+
| ID1 | 11111 | 1 |
| ID2 | AAAAA | 1 |
| ID2 | BBBBBB | 1 |
| ID2 | GGGGG | 1 |
| ID3 | BC1 | 1 |
| ID4 | O111 | 1 |
| ID2 | CCCCCC | 2 |
| ID1 | 99999 | 3 |
| ID2 | DDDDDD | 3 |
| ID2 | EEEEEE | 4 |
| ID4 | O222 | 4 |
| ID4 | O333 | 4 |
+--------+----------+---------+


我正在寻找 SQL 代码来生成上面的输出或类似的标准化结构。谢谢。

编辑:以下是一些用于创建与上表中的示例数据相匹配的数据的代码。

DROP TABLE IF EXISTS #ID
CREATE TABLE #ID
(
RowId INT,
ID1 VARCHAR(100),
ID2 VARCHAR(100),
ID3 VARCHAR(100),
ID4 VARCHAR(100)
)

INSERT INTO #ID VALUES
(1,'11111',NULL,NULL,NULL),
(2,'11111',NULL,NULL,NULL),
(3,'11111','AAAAA',NULL,NULL),
(4,NULL,'BBBBBB','BC1',NULL),
(5,NULL,NULL,'BC1','O111'),
(6,NULL,'GGGGG','BC1',NULL),
(7,NULL,'AAAAA',NULL,'O111'),
(8,NULL,'CCCCCC',NULL,NULL),
(9,'99999',NULL,NULL,NULL),
(10,'99999','DDDDDD',NULL,NULL),
(11,NULL,NULL,NULL,'O222'),
(12,NULL,'EEEEEE',NULL,'O222'),
(13,NULL,'EEEEEE',NULL,'O333')

最佳答案

很容易获得标准化输出。

我正在使用来自 How to find all connected subgraphs of an undirected graph 的查询只需稍加修改即可将数据转换为定义图形边缘的对。该查询将数据视为图中的边,并递归遍历图的所有边,并在检测到循环时停止。然后它将所有找到的循环分组并为每个组指定一个编号。

你的源表有四个ID,但是每一行只能有两个ID,所以我们知道每一行都有一对ID。我的查询需要这种数据(ID 对)。将四个 ID 转换为一对很容易 - 使用 COALESCE

有关其工作原理的详细说明,请参阅 How to find all connected subgraphs of an undirected graph .

查询

WITH
CTE_Idents
AS
(
SELECT ID1 AS Ident, 'ID1' AS IDType
FROM @T

UNION

SELECT ID2 AS Ident, 'ID2' AS IDType
FROM @T

UNION

SELECT ID3 AS Ident, 'ID3' AS IDType
FROM @T

UNION

SELECT ID4 AS Ident, 'ID4' AS IDType
FROM @T
)
,CTE_Pairs
AS
(
SELECT COALESCE(ID1, ID2, ID3, ID4) AS Ident1, COALESCE(ID4, ID3, ID2, ID1) AS Ident2
FROM @T

UNION

SELECT COALESCE(ID4, ID3, ID2, ID1) AS Ident1, COALESCE(ID1, ID2, ID3, ID4) AS Ident2
FROM @T
)
,CTE_Recursive
AS
(
SELECT
CAST(CTE_Idents.Ident AS varchar(8000)) AS AnchorIdent
, Ident1
, Ident2
, CAST(',' + Ident1 + ',' + Ident2 + ',' AS varchar(8000)) AS IdentPath
, 1 AS Lvl
FROM
CTE_Pairs
INNER JOIN CTE_Idents ON CTE_Idents.Ident = CTE_Pairs.Ident1

UNION ALL

SELECT
CTE_Recursive.AnchorIdent
, CTE_Pairs.Ident1
, CTE_Pairs.Ident2
, CAST(CTE_Recursive.IdentPath + CTE_Pairs.Ident2 + ',' AS varchar(8000)) AS IdentPath
, CTE_Recursive.Lvl + 1 AS Lvl
FROM
CTE_Pairs
INNER JOIN CTE_Recursive ON CTE_Recursive.Ident2 = CTE_Pairs.Ident1
WHERE
CTE_Recursive.IdentPath NOT LIKE CAST('%,' + CTE_Pairs.Ident2 + ',%' AS varchar(8000))
)
,CTE_RecursionResult
AS
(
SELECT AnchorIdent, Ident1, Ident2
FROM CTE_Recursive
)
,CTE_CleanResult
AS
(
SELECT AnchorIdent, Ident1 AS Ident
FROM CTE_RecursionResult

UNION

SELECT AnchorIdent, Ident2 AS Ident
FROM CTE_RecursionResult
)
SELECT
CTE_Idents.IDType
,CTE_Idents.Ident
,CASE WHEN CA_Data.XML_Value IS NULL
THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END AS GroupMembers
,DENSE_RANK() OVER(ORDER BY
CASE WHEN CA_Data.XML_Value IS NULL
THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END
) AS GroupID
FROM
CTE_Idents
CROSS APPLY
(
SELECT CTE_CleanResult.Ident+','
FROM CTE_CleanResult
WHERE CTE_CleanResult.AnchorIdent = CTE_Idents.Ident
ORDER BY CTE_CleanResult.Ident FOR XML PATH(''), TYPE
) AS CA_XML(XML_Value)
CROSS APPLY
(
SELECT CA_XML.XML_Value.value('.', 'NVARCHAR(MAX)')
) AS CA_Data(XML_Value)
WHERE
CTE_Idents.Ident IS NOT NULL
ORDER BY GroupID, IDType, Ident;

结果

+--------+--------+------------------------------------+---------+
| IDType | Ident | GroupMembers | GroupID |
+--------+--------+------------------------------------+---------+
| ID1 | 11111 | 11111,AAAAA,BBBBBB,BC1,GGGGG,O111, | 1 |
| ID2 | AAAAA | 11111,AAAAA,BBBBBB,BC1,GGGGG,O111, | 1 |
| ID2 | BBBBBB | 11111,AAAAA,BBBBBB,BC1,GGGGG,O111, | 1 |
| ID2 | GGGGG | 11111,AAAAA,BBBBBB,BC1,GGGGG,O111, | 1 |
| ID3 | BC1 | 11111,AAAAA,BBBBBB,BC1,GGGGG,O111, | 1 |
| ID4 | O111 | 11111,AAAAA,BBBBBB,BC1,GGGGG,O111, | 1 |
| ID1 | 99999 | 99999,DDDDDD, | 2 |
| ID2 | DDDDDD | 99999,DDDDDD, | 2 |
| ID2 | CCCCCC | CCCCCC, | 3 |
| ID2 | EEEEEE | EEEEEE,O222,O333, | 4 |
| ID4 | O222 | EEEEEE,O222,O333, | 4 |
| ID4 | O333 | EEEEEE,O222,O333, | 4 |
+--------+--------+------------------------------------+---------+

这就是您的数据的图表:

graph

我使用 https://www.graphviz.org/ 中的 DOT 渲染了此图像。

<小时/>

如何将此标准化输出转换为非标准化输出?一种方法是使用 IDType 的帮助对其进行逆透视,但如果图形可以有多个循环,则可能会变得很棘手。您最好专门问另一个有关将规范化数据集转换为非规范化数据集的问题。

关于sql - 跨多个可能的 ID 匹配记录,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57338886/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com