gpt4 book ai didi

sql - 为什么窗口聚合函数的逻辑读取如此之高?

转载 作者:行者123 更新时间:2023-12-02 00:53:15 25 4
gpt4 key购买 nike

我发现,在使用公共(public)子表达式假脱机的执行计划中,对于大型表,报告的逻辑读取会变得相当高。

经过一番尝试和错误,我发现了一个似乎适用于下面的测试脚本和执行计划的公式。 工作表逻辑读取 = 1 + NumberOfRows * 2 + NumberOfGroups * 4

我不明白这个公式成立的原因。这比我认为有必要查看该计划。谁能详细说明一下到底发生了什么?

或者如果失败了,有什么方法可以跟踪每次逻辑读取中读取的页面,以便我可以自己解决?

SET STATISTICS IO OFF; SET NOCOUNT ON;

IF Object_id('tempdb..#Orders') IS NOT NULL
DROP TABLE #Orders;

CREATE TABLE #Orders
(
OrderID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
CustomerID NCHAR(5) NULL,
Freight MONEY NULL,
);

CREATE NONCLUSTERED INDEX ix
ON #Orders (CustomerID)
INCLUDE (Freight);

INSERT INTO #Orders
VALUES (N'ALFKI', 29.46),
(N'ALFKI', 61.02),
(N'ALFKI', 23.94),
(N'ANATR', 39.92),
(N'ANTON', 22.00);

SELECT PredictedWorktableLogicalReads =
1 + 2 * Count(*) + 4 * Count(DISTINCT CustomerID)
FROM #Orders;

SET STATISTICS IO ON;

SELECT OrderID,
Freight,
Avg(Freight) OVER (PARTITION BY CustomerID) AS Avg_Freight
FROM #Orders;

输出

PredictedWorktableLogicalReads
------------------------------
23
<小时/>
Table 'Worktable'. Scan count 3, logical reads 23, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Orders___________000000000002'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Execution Plan

其他信息:

Query Tuning and Optimization 的第 3 章对这些线轴有很好的解释。预订并 this blog post by Paul White .

总之,计划顶部的段迭代器向它发送的行添加一个标志,指示何时开始新分区。主段假脱机一次从段迭代器获取一行并将其插入到 tempdb 中的工作表中。一旦它获得表明新组已启动的标志,它就会向嵌套循环运算符的顶部输入返回一行。这会导致在工作表中的行上调用流聚合,计算平均值,然后在工作表被截断以准备新组之前将该值与工作表中的行连接回来。段假脱机发出一个虚拟行,以便处理最终组。

据我了解,工作表是一个堆(或者在计划中将其表示为索引假脱机)。然而,当我尝试复制相同的进程时,它只需要 11 次逻辑读取。

CREATE TABLE #WorkTable
(
OrderID INT,
CustomerID NCHAR(5) NULL,
Freight MONEY NULL,
)

DECLARE @Average MONEY

PRINT 'Insert 3 Rows'

INSERT INTO #WorkTable
VALUES (1, N'ALFKI', 29.46) /*Scan count 0, logical reads 1*/

INSERT INTO #WorkTable
VALUES (2, N'ALFKI', 61.02) /*Scan count 0, logical reads 1*/

INSERT INTO #WorkTable
VALUES (3, N'ALFKI', 23.94) /*Scan count 0, logical reads 1*/
PRINT 'Calculate AVG'

SELECT @Average = Avg(Freight)
FROM #WorkTable /*Scan count 1, logical reads 1*/
PRINT 'Return Rows - With the average column included'

/*This convoluted query is just to force a nested loops plan*/
SELECT *
FROM (SELECT @Average AS Avg_Freight) T /*Scan count 1, logical reads 1*/
OUTER APPLY #WorkTable
WHERE COALESCE(Freight, OrderID) IS NOT NULL
AND @Average IS NOT NULL

PRINT 'Clear out work table'

TRUNCATE TABLE #WorkTable

PRINT 'Insert 1 Row'

INSERT INTO #WorkTable
VALUES (4, N'ANATR', 39.92) /*Scan count 0, logical reads 1*/
PRINT 'Calculate AVG'

SELECT @Average = Avg(Freight)
FROM #WorkTable /*Scan count 1, logical reads 1*/
PRINT 'Return Rows - With the average column included'

SELECT *
FROM (SELECT @Average AS Avg_Freight) T /*Scan count 1, logical reads 1*/
OUTER APPLY #WorkTable
WHERE COALESCE(Freight, OrderID) IS NOT NULL
AND @Average IS NOT NULL

PRINT 'Clear out work table'

TRUNCATE TABLE #WorkTable

PRINT 'Insert 1 Row'

INSERT INTO #WorkTable
VALUES (5, N'ANTON', 22.00) /*Scan count 0, logical reads 1*/
PRINT 'Calculate AVG'

SELECT @Average = Avg(Freight)
FROM #WorkTable /*Scan count 1, logical reads 1*/
PRINT 'Return Rows - With the average column included'

SELECT *
FROM (SELECT @Average AS Avg_Freight) T /*Scan count 1, logical reads 1*/
OUTER APPLY #WorkTable
WHERE COALESCE(Freight, OrderID) IS NOT NULL
AND @Average IS NOT NULL

PRINT 'Clear out work table'

TRUNCATE TABLE #WorkTable

PRINT 'Calculate AVG'

SELECT @Average = Avg(Freight)
FROM #WorkTable /*Scan count 1, logical reads 0*/
PRINT 'Return Rows - With the average column included'

SELECT *
FROM (SELECT @Average AS Avg_Freight) T
OUTER APPLY #WorkTable
WHERE COALESCE(Freight, OrderID) IS NOT NULL
AND @Average IS NOT NULL

DROP TABLE #WorkTable

最佳答案

工作表的逻辑读取计数不同:每读取有一次“逻辑读取”。这并不意味着工作表在某种程度上比“真正的”假脱机表效率低(恰恰相反);逻辑读取只是在不同的单元中。

我认为,对工作表逻辑读取的哈希页进行计数并不是很有用,因为这些结构是服务器内部的。报告逻辑读取计数器中假脱机的行使该数字对于分析目的更有意义。

这种见解应该可以让你的公式发挥作用的原因变得清晰。两个辅助线轴被完全读取两次 (2 * COUNT(*)),并且主线轴发出 (组值数 + 1) 行,如我的博客文章中所述,给出 (COUNT(DISTINCT CustomerID) + 1) 组件。加一是主线轴发出的额外行,表示最后一组已结束。

保罗

关于sql - 为什么窗口聚合函数的逻辑读取如此之高?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/4230838/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com