gpt4 book ai didi

sql-server - 使用 EXCEPT 组合数据集与在 LEFT JOIN 中检查 IS NULL

转载 作者:行者123 更新时间:2023-12-02 20:29:58 25 4
gpt4 key购买 nike

我目前正在努力通过Microsoft SQL Server 2008 - 数据库开发(MCTS 考试 70-433) 认证。在前面关于组合数据集的章节中,我遇到了EXCEPT(和INTERSECT)命令。一个示例展示了如何使用 EXCEPT 从一个表中获取在第二个表中没有相关值的所有值,如下所示:

SELECT EmployeeKey FROM DimEmployee
EXCEPT
SELECT EmployeeKey FROM FactResellerSales

EXCEPT 命令对我来说是新的,但根据我今天之前所知道的,我仍然可以使用 LEFT JOIN 轻松解决问题并检查 IS通过以下方式对连接约束设置 NULL:

SELECT DISTINCT DimEmployee.EmployeeKey FROM DimEmployee
LEFT JOIN FactResellerSales ON FactResellerSales.EmployeeKey = DimEmployee.EmployeeKey
WHERE FactResellerSales.EmployeeKey IS NULL

现在我开始想知道其中哪一个具有最好的性能。我尝试研究查询执行计划,但我不太擅长阅读这些计划,所以它并没有让我变得更聪明。对于使用 EXCEPT 的查询,计划如下所示:

|--Nested Loops(Left Anti Semi Join, OUTER REFERENCES:([AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey], [Expr1006]) WITH UNORDERED PREFETCH)
|--Index Scan(OBJECT:([AdventureWorksDW2008].[dbo].[DimEmployee].[IX_DimEmployee_SalesTerritoryKey]))
|--Top(TOP EXPRESSION:((1)))
|--Index Seek(OBJECT:([AdventureWorksDW2008].[dbo].[FactResellerSales].[IX_FactResellerSales_EmployeeKey]), SEEK:([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey]=[AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey]) ORDERED FORWARD)

对于使用LEFT JOIN的人来说,它看起来像这样:

|--Stream Aggregate(GROUP BY:([AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey]))
|--Filter(WHERE:([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey] IS NULL))
|--Merge Join(Left Outer Join, MERGE:([AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey])=([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey]), RESIDUAL:([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey]=[AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey]))
|--Clustered Index Scan(OBJECT:([AdventureWorksDW2008].[dbo].[DimEmployee].[PK_DimEmployee_EmployeeKey]), ORDERED FORWARD)
|--Index Scan(OBJECT:([AdventureWorksDW2008].[dbo].[FactResellerSales].[IX_FactResellerSales_EmployeeKey]), ORDERED FORWARD)

查询中使用的表来自 AdventureWorksDW2008 示例数据库,因此下面我还包含了这两个表的创建脚本,以防需要正确回答问题:

USE [AdventureWorksDW2008]
GO

/****** Object: Table [dbo].[DimEmployee] Script Date: 11/22/2010 20:30:20 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[DimEmployee](
[EmployeeKey] [int] IDENTITY(1,1) NOT NULL,
[ParentEmployeeKey] [int] NULL,
[EmployeeNationalIDAlternateKey] [nvarchar](15) NULL,
[ParentEmployeeNationalIDAlternateKey] [nvarchar](15) NULL,
[SalesTerritoryKey] [int] NULL,
[FirstName] [nvarchar](50) NOT NULL,
[LastName] [nvarchar](50) NOT NULL,
[MiddleName] [nvarchar](50) NULL,
[NameStyle] [bit] NOT NULL,
[Title] [nvarchar](50) NULL,
[HireDate] [date] NULL,
[BirthDate] [date] NULL,
[LoginID] [nvarchar](256) NULL,
[EmailAddress] [nvarchar](50) NULL,
[Phone] [nvarchar](25) NULL,
[MaritalStatus] [nchar](1) NULL,
[EmergencyContactName] [nvarchar](50) NULL,
[EmergencyContactPhone] [nvarchar](25) NULL,
[SalariedFlag] [bit] NULL,
[Gender] [nchar](1) NULL,
[PayFrequency] [tinyint] NULL,
[BaseRate] [money] NULL,
[VacationHours] [smallint] NULL,
[SickLeaveHours] [smallint] NULL,
[CurrentFlag] [bit] NOT NULL,
[SalesPersonFlag] [bit] NOT NULL,
[DepartmentName] [nvarchar](50) NULL,
[StartDate] [date] NULL,
[EndDate] [date] NULL,
[Status] [nvarchar](50) NULL,
CONSTRAINT [PK_DimEmployee_EmployeeKey] PRIMARY KEY CLUSTERED
(
[EmployeeKey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[DimEmployee] WITH CHECK ADD CONSTRAINT [FK_DimEmployee_DimEmployee] FOREIGN KEY([ParentEmployeeKey])
REFERENCES [dbo].[DimEmployee] ([EmployeeKey])
GO

ALTER TABLE [dbo].[DimEmployee] CHECK CONSTRAINT [FK_DimEmployee_DimEmployee]
GO

ALTER TABLE [dbo].[DimEmployee] WITH CHECK ADD CONSTRAINT [FK_DimEmployee_DimSalesTerritory] FOREIGN KEY([SalesTerritoryKey])
REFERENCES [dbo].[DimSalesTerritory] ([SalesTerritoryKey])
GO

ALTER TABLE [dbo].[DimEmployee] CHECK CONSTRAINT [FK_DimEmployee_DimSalesTerritory]
GO

第二个表:

USE [AdventureWorksDW2008]
GO

/****** Object: Table [dbo].[FactResellerSales] Script Date: 11/22/2010 20:30:47 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[FactResellerSales](
[ProductKey] [int] NOT NULL,
[OrderDateKey] [int] NOT NULL,
[DueDateKey] [int] NOT NULL,
[ShipDateKey] [int] NOT NULL,
[ResellerKey] [int] NOT NULL,
[EmployeeKey] [int] NOT NULL,
[PromotionKey] [int] NOT NULL,
[CurrencyKey] [int] NOT NULL,
[SalesTerritoryKey] [int] NOT NULL,
[SalesOrderNumber] [nvarchar](20) NOT NULL,
[SalesOrderLineNumber] [tinyint] NOT NULL,
[RevisionNumber] [tinyint] NULL,
[OrderQuantity] [smallint] NULL,
[UnitPrice] [money] NULL,
[ExtendedAmount] [money] NULL,
[UnitPriceDiscountPct] [float] NULL,
[DiscountAmount] [float] NULL,
[ProductStandardCost] [money] NULL,
[TotalProductCost] [money] NULL,
[SalesAmount] [money] NULL,
[TaxAmt] [money] NULL,
[Freight] [money] NULL,
[CarrierTrackingNumber] [nvarchar](25) NULL,
[CustomerPONumber] [nvarchar](25) NULL,
CONSTRAINT [PK_FactResellerSales_SalesOrderNumber_SalesOrderLineNumber] PRIMARY KEY CLUSTERED
(
[SalesOrderNumber] ASC,
[SalesOrderLineNumber] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimCurrency] FOREIGN KEY([CurrencyKey])
REFERENCES [dbo].[DimCurrency] ([CurrencyKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimCurrency]
GO

ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimDate] FOREIGN KEY([OrderDateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimDate]
GO

ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimDate1] FOREIGN KEY([DueDateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimDate1]
GO

ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimDate2] FOREIGN KEY([ShipDateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimDate2]
GO

ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimEmployee] FOREIGN KEY([EmployeeKey])
REFERENCES [dbo].[DimEmployee] ([EmployeeKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimEmployee]
GO

ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimProduct] FOREIGN KEY([ProductKey])
REFERENCES [dbo].[DimProduct] ([ProductKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimProduct]
GO

ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimPromotion] FOREIGN KEY([PromotionKey])
REFERENCES [dbo].[DimPromotion] ([PromotionKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimPromotion]
GO

ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimReseller] FOREIGN KEY([ResellerKey])
REFERENCES [dbo].[DimReseller] ([ResellerKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimReseller]
GO

ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimSalesTerritory] FOREIGN KEY([SalesTerritoryKey])
REFERENCES [dbo].[DimSalesTerritory] ([SalesTerritoryKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimSalesTerritory]
GO

这个问题使用了一个非常具体的示例,但我也对有关何时/是否适合使用 EXCEPT 而不是 LEFT JOIN 的一般信息感兴趣> 并检查IS NULL

我还注意到,如果我尝试使用 INTERCEPT 运行第一个查询,则第二个查询中的等效项将使用标准 JOINSELECT DISTINCT DimEmployee.EmployeeKey (并且根本没有 WHERE 子句)。然而,在本例中,两种情况的执行计划完全相同。

更新
对第二个查询进行了小幅更新(请参阅修订历史记录),这导致该查询计划变得更加复杂。我猜想更大的查询计划表明查询不太理想,但我仍然希望回答这个问题。

最佳答案

在 LEFT JOIN 给出多行的情况下,这 2 个查询将会不同。也就是说,FactResellerSales 是 DimEmployee 的子级,DimEmployee 中每行有许多行。因此,正如您在 JOIN 示例中指出的那样,您需要 DISTINCT。

如果您将查询更改为使用 NOT EXISTS,您会得到相同的计划(左反半连接是 NOT EXISTS 的典型情况)

SELECT EmployeeKey
FROM DimEmployee DE
WHERE
NOT EXISTS (SELECT * FROM
FactResellerSales FRS
WHERE FRS.EmployeeKey = DE.EmployeeKey)

此外,出于同样的原因,INTERSECT/EXISTS 很可能会给出相同的计划。

这是JOIN/EXISTS/INOUTER JOIN/NOT EXISTS/NOT IN 争论的另一个方面。 INTERSECT/EXCEPT 是 (NOT) EXISTS 的稍微优雅的构造,您喜欢吗

编辑:

没有明显的问题...

就我个人而言,我不使用 OUTER JOIN 进行“存在”检查:我使用 EXISTS 或 NOT EXISTS(或者 INTERSECT/EXCEPT,如果我记得的话),因为它更明显你想要做什么。也就是说,如果我不需要“外部”表中的行,我不会使用它来避免出现 DISTINCT。

在这种情况下,没有使用 OUTER JOIN/IS NULL 的情况if。当然,我在需要时使用 OUTER JOIN:这个答案仅针对一种特定情况。

关于sql-server - 使用 EXCEPT 组合数据集与在 LEFT JOIN 中检查 IS NULL,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/4249891/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com