
postgresql - How to optimize a SQL query that checks column values for consistency across tables

Reposted. Author: 行者123. Updated: 2023-11-29 12:11:08

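I want to check across several tables whether each table contains the same keys, or at least the same number of keys.

So far my solution counts the distinct keys in each individual table, counts the keys that remain when all the tables are joined together, and then compares the results.

This works, but I would like to know whether there is a more efficient solution.

Current example solution: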

SELECT COUNT(DISTINCT variable) AS num_ids FROM table_a;

SELECT COUNT(DISTINCT variable) AS num_ids FROM table_b;

SELECT COUNT(DISTINCT variable) AS num_ids FROM table_c;

SELECT COUNT(DISTINCT a.variable) AS num_ids
FROM (SELECT DISTINCT VARIABLE FROM table_a) a
INNER JOIN (SELECT DISTINCT VARIABLE FROM table_b) b ON a.variable = b.variable
INNER JOIN (SELECT DISTINCT VARIABLE FROM table_c) c ON a.variable = c.variable;

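Update:

The difficulty with doing this in a single query is that any of the tables may be non-unique on the VARIABLE I am checking, so I have to apply DISTINCT before joining to keep the join from fanning out.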
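The fan-out described in the update can be seen on a tiny data set. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Postgres, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (variable INTEGER);
    CREATE TABLE table_b (variable INTEGER);
    -- hypothetical data: key 1 appears three times in each table, key 2 once
    INSERT INTO table_a VALUES (1), (1), (1), (2);
    INSERT INTO table_b VALUES (1), (1), (1), (2);
""")

# Joining the raw tables: key 1 matches 3 x 3 = 9 row pairs, key 2 matches once.
raw_rows = conn.execute("""
    SELECT COUNT(*)
    FROM table_a a
    INNER JOIN table_b b ON a.variable = b.variable
""").fetchone()[0]

# De-duplicating first keeps the join at one row per shared key.
dedup_rows = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT DISTINCT variable FROM table_a) a
    INNER JOIN (SELECT DISTINCT variable FROM table_b) b
        ON a.variable = b.variable
""").fetchone()[0]

print(raw_rows, dedup_rows)
```

With these rows the raw join produces 10 rows while the de-duplicated join produces 2. COUNT(DISTINCT a.variable) would return 2 either way, so the DISTINCT subqueries affect the amount of work rather than the final answer.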

Best answer

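Since we are only counting, I don't think there is any need to join the tables on the variable column. A UNION should be enough. We still have to use DISTINCT to ignore/suppress duplicates, which usually means an extra sort. An index on variable should help with the counts for the individual tables, but it will not help with the count for the combined table.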

Here is an example that compares two tables:

WITH
CTE_A AS
(
    SELECT COUNT(DISTINCT variable) AS CountA
    FROM TableA
)
,CTE_B AS
(
    SELECT COUNT(DISTINCT variable) AS CountB
    FROM TableB
)
,CTE_AB AS
(
    SELECT COUNT(DISTINCT variable) AS CountAB
    FROM
    (
        SELECT variable
        FROM TableA

        UNION ALL
        -- sic! use ALL here to avoid sort when merging two tables
        -- there should be only one distinct sort for the outer `COUNT`

        SELECT variable
        FROM TableB
    ) AS AB
)
SELECT
    CASE WHEN CountA = CountAB AND CountB = CountAB
    THEN 'same' ELSE 'different' END AS ResultAB
FROM
    CTE_A
    CROSS JOIN CTE_B
    CROSS JOIN CTE_AB
;
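To see why the union count is needed and not just the two per-table counts, here is a small runnable sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for Postgres; the `compare` helper and the sample values are hypothetical and exist only for this illustration.

```python
import sqlite3

# Same shape as the two-table query above, written compactly.
CHECK_SQL = """
WITH
cte_a AS (SELECT COUNT(DISTINCT variable) AS count_a FROM TableA),
cte_b AS (SELECT COUNT(DISTINCT variable) AS count_b FROM TableB),
cte_ab AS (
    SELECT COUNT(DISTINCT variable) AS count_ab
    FROM (
        SELECT variable FROM TableA
        UNION ALL
        SELECT variable FROM TableB
    ) AS ab
)
SELECT CASE WHEN count_a = count_ab AND count_b = count_ab
            THEN 'same' ELSE 'different' END
FROM cte_a CROSS JOIN cte_b CROSS JOIN cte_ab
"""


def compare(rows_a, rows_b):
    """Load two value lists into fresh in-memory tables and run the check."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableA (variable INTEGER)")
    conn.execute("CREATE TABLE TableB (variable INTEGER)")
    conn.executemany("INSERT INTO TableA VALUES (?)", [(v,) for v in rows_a])
    conn.executemany("INSERT INTO TableB VALUES (?)", [(v,) for v in rows_b])
    result = conn.execute(CHECK_SQL).fetchone()[0]
    conn.close()
    return result


print(compare([1, 1, 2], [2, 1]))  # duplicates and order do not matter
print(compare([1, 2], [1, 3]))     # equal counts, but the keys differ
```

The first call reports 'same' and the second 'different'. In the second case both tables have two distinct keys, but the union has three, so comparing the individual counts alone would miss the mismatch. If each table's count equals the union's count, neither table can contain a key the other lacks.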

For three tables:

WITH
CTE_A AS
(
    SELECT COUNT(DISTINCT variable) AS CountA
    FROM TableA
)
,CTE_B AS
(
    SELECT COUNT(DISTINCT variable) AS CountB
    FROM TableB
)
,CTE_C AS
(
    SELECT COUNT(DISTINCT variable) AS CountC
    FROM TableC
)
,CTE_ABC AS
(
    SELECT COUNT(DISTINCT variable) AS CountABC
    FROM
    (
        SELECT variable
        FROM TableA

        UNION ALL
        -- sic! use ALL here to avoid sort when merging the tables
        -- there should be only one distinct sort for the outer `COUNT`

        SELECT variable
        FROM TableB

        UNION ALL
        -- same reasoning as above: ALL, not a de-duplicating UNION

        SELECT variable
        FROM TableC
    ) AS ABC
)
SELECT
    CASE WHEN CountA = CountABC AND CountB = CountABC AND CountC = CountABC
    THEN 'same' ELSE 'different' END AS ResultABC
FROM
    CTE_A
    CROSS JOIN CTE_B
    CROSS JOIN CTE_C
    CROSS JOIN CTE_ABC
;
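The two-table and three-table queries follow the same pattern, so the query text can be generated for any number of tables. The sketch below is one possible way to do that, not part of the original answer: `build_check_sql` is a hypothetical helper, the CTE names `c0`, `c1`, ... and `c_all` are made up, and SQLite again stands in for Postgres.

```python
import sqlite3


def build_check_sql(tables):
    """Build the count-vs-union-count check for a list of table names.

    Table names are interpolated into the SQL text, so they must come
    from trusted code, never from user input.
    """
    # One single-row CTE per table holding its distinct key count.
    ctes = [
        f'c{i} AS (SELECT COUNT(DISTINCT variable) AS n FROM "{t}")'
        for i, t in enumerate(tables)
    ]
    # One more CTE with the distinct count over the UNION ALL of all tables.
    union = " UNION ALL ".join(f'SELECT variable FROM "{t}"' for t in tables)
    ctes.append(
        f"c_all AS (SELECT COUNT(DISTINCT variable) AS n FROM ({union}) AS u)"
    )
    conditions = " AND ".join(f"c{i}.n = c_all.n" for i in range(len(tables)))
    joins = " CROSS JOIN ".join(
        [f"c{i}" for i in range(len(tables))] + ["c_all"]
    )
    return (
        "WITH " + ", ".join(ctes)
        + f" SELECT CASE WHEN {conditions} THEN 'same' ELSE 'different' END"
        + f" FROM {joins}"
    )


conn = sqlite3.connect(":memory:")
data = {"TableA": [1, 2, 2], "TableB": [2, 1], "TableC": [1, 2, 1]}
for name, values in data.items():
    conn.execute(f'CREATE TABLE "{name}" (variable INTEGER)')
    conn.executemany(f'INSERT INTO "{name}" VALUES (?)', [(v,) for v in values])

sql = build_check_sql(list(data))
before = conn.execute(sql).fetchone()[0]
conn.execute('INSERT INTO "TableC" VALUES (3)')  # add a key only TableC has
after = conn.execute(sql).fetchone()[0]
print(before, after)
```

With the sample rows the check reports 'same' first and 'different' once TableC gains a key that the other tables lack.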

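I deliberately chose CTEs because, as far as I know, Postgres materializes CTEs, and in our case each CTE produces only a single row. (This held unconditionally before PostgreSQL 12; from version 12 on, a non-recursive, side-effect-free CTE that is referenced only once may be inlined unless it is declared AS MATERIALIZED.)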


A better variant is array_agg with ORDER BY, if that is available on Redshift. You still need DISTINCT, but you do not have to merge all the tables together.

WITH
CTE_A AS
(
    SELECT array_agg(DISTINCT variable ORDER BY variable) AS A
    FROM TableA
)
,CTE_B AS
(
    SELECT array_agg(DISTINCT variable ORDER BY variable) AS B
    FROM TableB
)
,CTE_C AS
(
    SELECT array_agg(DISTINCT variable ORDER BY variable) AS C
    FROM TableC
)
SELECT
    CASE WHEN A = B AND B = C
    THEN 'same' ELSE 'different' END AS ResultABC
FROM
    CTE_A
    CROSS JOIN CTE_B
    CROSS JOIN CTE_C
;
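The idea behind the array variant is to reduce each table to a sorted list of its distinct keys and compare the lists directly. SQLite has no array_agg, so the sketch below swaps in a client-side equivalent: it fetches `SELECT DISTINCT ... ORDER BY` into a Python list. The `distinct_sorted` helper and the sample rows are assumptions for illustration only.

```python
import sqlite3


def distinct_sorted(conn, table):
    """Client-side counterpart of array_agg(DISTINCT variable ORDER BY variable).

    The table name is interpolated into the SQL text and must be trusted.
    """
    rows = conn.execute(
        f'SELECT DISTINCT variable FROM "{table}" ORDER BY variable'
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
data = {"TableA": [3, 1, 1], "TableB": [1, 3], "TableC": [1, 2, 3]}
for name, values in data.items():
    conn.execute(f'CREATE TABLE "{name}" (variable INTEGER)')
    conn.executemany(f'INSERT INTO "{name}" VALUES (?)', [(v,) for v in values])

keys = {name: distinct_sorted(conn, name) for name in data}
a_equals_b = keys["TableA"] == keys["TableB"]
b_equals_c = keys["TableB"] == keys["TableC"]
# Unlike the count-based check, the lists also show which keys differ.
extra_in_c = sorted(set(keys["TableC"]) - set(keys["TableB"]))
print(a_equals_b, b_equals_c, extra_in_c)
```

Here TableA and TableB match, TableC does not, and the extra key in TableC is 2. Pulling every key to the client only makes sense for modest key counts; on the server, the array_agg query above keeps the comparison inside the database.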

Regarding "postgresql - How to optimize a SQL query that checks column values for consistency across tables", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34452274/
