gpt4 book ai didi

sql - SELECT ID 存在于第二个表中的行的最有效方法

转载 作者:行者123 更新时间:2023-12-03 02:08:14 26 4
gpt4 key购买 nike

我希望从一个表中选择 ID 存在于第二个表中的所有记录。

以下两个查询返回正确的结果:

查询 1:

SELECT *
FROM Table1 t1
WHERE EXISTS (SELECT 1 FROM Table2 t2 WHERE t1.ID = t2.ID)

查询 2:

SELECT *
FROM Table1 t1
WHERE t1.ID IN (SELECT t2.ID FROM Table2 t2)

这些查询之一是否比另一个更有效?我应该使用其中一种而不是另一种吗?还有第三种我没有想到的更有效的方法吗?

最佳答案

摘要:

IN and EXISTS performed similarly in all scenarios.. Below are the parameters used to validate..

Execution cost,Time:
Same for both and optimizer produced same plan.
Memory Grant:
Same for both queries
Cpu Time,Logical reads :
Exists seems to outperform IN by little bit margin in terms of CPU Time,though reads are same..

我使用以下测试数据集运行每个查询 10 次。

  1. 非常大的子查询结果集(100000 行)
  2. 重复行
  3. 空行

对于上述所有场景,INEXISTS 都以相同的方式执行。

有关 Performance V3 database 的一些信息用于测试。20000 个客户有 1000000 个订单,因此每个客户在订单表中随机重复(范围为 10 到 100)。

执行成本、时间:
下面是两个查询运行的屏幕截图。观察每个查询的相对成本。

enter image description here

内存消耗:
两个查询的内存授予也是相同的..我强制 MDOP 1 以免将它们溢出到 TEMPDB..

enter image description here

CPU 时间,读取:

对于存在:

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customers'. Scan count 1, logical reads 109, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 1, logical reads 3855, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 469 ms, elapsed time = 595 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

对于 IN:

(20000 row(s) affected)
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customers'. Scan count 1, logical reads 109, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 1, logical reads 3855, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 547 ms, elapsed time = 669 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

在每种情况下,优化器都足够聪明,可以重新排列查询。

我倾向于仅使用EXISTS(我的意见)。使用 EXISTS 的一个用例是当您不想返回第二个表结果集时。

根据 Martin Smith 的查询进行更新:

我运行了以下查询,以找到从第一个表中获取第二个表中存在引用的行的最有效方法。

SELECT DISTINCT c.*
FROM Customers c
JOIN Orders o ON o.custid = c.custid

SELECT c.*
FROM Customers c
INNER JOIN (SELECT DISTINCT custid FROM Orders) AS o ON o.custid = c.custid

SELECT *
FROM Customers C
WHERE EXISTS(SELECT 1 FROM Orders o WHERE o.custid = c.custid)

SELECT *
FROM Customers c
WHERE custid IN (SELECT custid FROM Orders)

除了第二个 INNER JOIN 之外,上述所有查询都具有相同的成本,其余查询的计划相同。

enter image description here

内存授予:
此查询

SELECT DISTINCT c.*
FROM Customers c
JOIN Orders o ON o.custid = c.custid

所需的内存授予

enter image description here

此查询

SELECT c.*
FROM Customers c
INNER JOIN (SELECT DISTINCT custid FROM Orders) AS o ON o.custid = c.custid

所需的内存授予..

enter image description here

CPU 时间,读取:
查询:

SELECT DISTINCT c.*
FROM Customers c
JOIN Orders o ON o.custid = c.custid

(20000 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 48, logical reads 1344, physical reads 96, read-ahead reads 1248, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 5, logical reads 3929, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customers'. Scan count 5, logical reads 322, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 1453 ms, elapsed time = 781 ms.

查询:

SELECT c.*
FROM Customers c
INNER JOIN (SELECT DISTINCT custid FROM Orders) AS o ON o.custid = c.custid

(20000 row(s) affected)
Table 'Customers'. Scan count 5, logical reads 322, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 5, logical reads 3929, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 1499 ms, elapsed time = 403 ms.

关于sql - SELECT ID 存在于第二个表中的行的最有效方法,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/38444729/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com