
SQL ROW_NUMBER() performance problem

Reposted. Author: 行者123. Updated: 2023-12-04 20:07:15

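I have this SQL, and it works fine.

I want my filter to return the latest unique SessionGuids, each with its highest UserSessionSequenceID.

The problem is that performance is poor, even though I have good indexes.
How can I rewrite this so that it omits the ROW_NUMBER line?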

SELECT TOP(@resultCount) * FROM 
(
SELECT
[UserSessionSequenceID]
,[SessionGuid]
,[IP]
,[Url]
,[UrlTitle]
,[SiteID]
,[BrowserWidth]
,[BrowserHeight]
,[Browser]
,[BrowserVersion]
,[Referer]
,[Timestamp]
,ROW_NUMBER() over (PARTITION BY [SessionGuid]
ORDER BY UserSessionSequenceID DESC) AS sort
FROM [tblSequence]
) AS t
WHERE ([Timestamp] > DATEADD(mi, -@minutes, GETDATE()))
AND (SiteID = @siteID)
AND sort = 1
ORDER BY [UserSessionSequenceID] DESC

Thanks a lot :-)

Best answer

"even though I have good indexes"

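No offense, but let us be the judge of that. When asking a SQL Server performance question, always post the exact schema of your tables, including all indexes and cardinalities.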

For instance, let's consider the following table structure:
create table tblSequence (
[UserSessionSequenceID] int not null
,[SessionGuid] uniqueidentifier not null
,[SiteID] int not null
,[Timestamp] datetime not null
, filler varchar(512));
go

create clustered index cdxSequence on tblSequence (SiteID, [Timestamp]);
go

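This is just like yours, but all the fields irrelevant to the performance problem are collapsed into a generic filler column. Let's see how bad the performance is on, say, 1M rows across about 50k sessions. Let's fill the table with random data, but in a way that simulates "user activity":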
set nocount on;
declare @i int = 0, @sc int = 1;
declare @SessionGuid uniqueidentifier = newid()
, @siteID int = 1
, @Timestamp datetime = dateadd(day, rand()*1000, '20070101')
, @UserSessionSequenceID int = 0;
begin tran;
while @i<1000000
begin
insert into tblSequence (
[UserSessionSequenceID]
,[SessionGuid]
,[SiteID]
,[Timestamp]
, filler)
values (
@UserSessionSequenceID
, @SessionGuid
, @siteID
, @timestamp
, replicate('X', rand()*512));

if rand()*100 < 5
begin
set @SessionGuid = newid();
set @siteID = rand() * 10;
set @Timestamp = dateadd(day, rand()*1000, '20070101');
set @UserSessionSequenceID = 0;
set @sc += 1;
end
else
begin
set @timestamp = dateadd(second, rand()*300, @timestamp);
set @UserSessionSequenceID += 1;
end

set @i += 1;
if (@i % 1000) = 0
begin
raiserror(N'Inserted %i rows, %i sessions', 0, 1, @i, @sc);
commit;
begin tran;
end
end
commit;

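This takes about 1 minute to fill. Now let's run the query you asked about: what is the last action of every user session on site X in the last Y minutes? I have to use a specific date for @now instead of GETDATE() because my data is simulated, not real, so I am using whatever maximum timestamp was randomly generated for SiteID 1: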
set statistics time on;
set statistics io on;

declare @resultCount int = 30;
declare @minutes int = 60*24;
declare @siteID int = 1;
declare @now datetime = '2009-09-26 02:08:27.000';

SELECT TOP(@resultCount) * FROM
(
SELECT
[UserSessionSequenceID]
,[SessionGuid]
, SiteID
, Filler
,[Timestamp]
,ROW_NUMBER() over (PARTITION BY [SessionGuid]
ORDER BY UserSessionSequenceID DESC) AS sort
FROM [tblSequence]
where SiteID = @siteID
and [Timestamp] > DATEADD(mi, -@minutes, @now)
) AS t
WHERE sort = 1
ORDER BY [UserSessionSequenceID] DESC ;

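This is the same query as yours, but with the restrictive filters moved inside the subquery that computes ROW_NUMBER(). The results come back as follows: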
Table 'tblSequence'. Scan count 1, logical reads 12, physical reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 31 ms.

That is a 31 ms response time on a warm cache, reading 12 pages out of the table's nearly 60k pages.
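
The figures quoted above (row and session counts, the table's page count, and the @now value) can be checked with a few queries. This is a minimal sketch, assuming the tblSequence table and its clustered index as created earlier; the column aliases are only illustrative.

```sql
-- How many rows and distinct sessions did the load script generate?
SELECT COUNT(*)                    AS TotalRows,
       COUNT(DISTINCT SessionGuid) AS TotalSessions
FROM tblSequence;

-- How many pages does the clustered index (index_id = 1) occupy?
SELECT SUM(used_page_count) AS UsedPages
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID(N'tblSequence')
  AND index_id = 1;

-- Latest simulated timestamp for site 1; a value like this
-- can stand in for GETDATE() when the data is not real.
SELECT MAX([Timestamp]) AS MaxTimestamp
FROM tblSequence
WHERE SiteID = 1;
```

Because the data is random, the numbers returned will differ from run to run.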

UPDATE

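After reading the original query again, I realize that my modified query is different: you need only new sessions. I still believe that filtering by SiteID and Timestamp is the only way to get the necessary performance, so the solution is to validate the candidate results with a NOT EXISTS condition: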
SELECT TOP(@resultCount) * FROM  
(
SELECT
[UserSessionSequenceID]
,[SessionGuid]
, SiteID
, Filler
,[Timestamp]
,ROW_NUMBER() over (
PARTITION BY [SessionGuid]
ORDER BY UserSessionSequenceID DESC)
AS sort
FROM [tblSequence]
where SiteID = @siteID
and [Timestamp] > DATEADD(mi, -@minutes, @now)
) AS new
WHERE sort = 1
and not exists (
select SessionGuid
from tblSequence
where SiteID = @siteID
and SessionGuid = new.SessionGuid
and [TimeStamp] < DATEADD(mi, -@minutes, @now)
)
ORDER BY [UserSessionSequenceID] DESC

On my laptop, for 1M rows over 400k sessions, this returns in 40 ms from a warm cache:
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0
Table 'tblSequence'. Scan count 2, logical reads 709, physical reads 0

SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 40 ms.
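
The NOT EXISTS probe looks for older rows by SiteID and SessionGuid, while the clustered index is keyed on (SiteID, [Timestamp]), so the probe likely has to range over the site's older rows. If the logical reads of that probe ever become a concern, a nonclustered index keyed on the probe's columns is one thing to try. The index below is a hypothetical addition, not part of the original answer; the name ixSequenceSession is made up, and its effect should be confirmed with SET STATISTICS IO on real data.

```sql
-- Hypothetical supporting index for the NOT EXISTS probe
-- (an assumption, not taken from the original answer).
-- Key order follows the probe's predicates: equality on SiteID
-- and SessionGuid first, then the range on [Timestamp].
CREATE NONCLUSTERED INDEX ixSequenceSession
    ON tblSequence (SiteID, SessionGuid, [Timestamp]);
GO
```

The trade-off is that every insert into tblSequence now also maintains this index, so it is worth adding only if the measured read savings justify the extra write cost.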

Regarding the SQL ROW_NUMBER() performance problem, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/3485680/
