gpt4 book ai didi

sql - 优化更新查询

转载 作者:行者123 更新时间:2023-12-04 14:17:59 24 4
gpt4 key购买 nike

我正在寻找优化此查询的建议,该查询已经运行了一个多小时,其中表中包含约300,000行。我们使用的报告工具要求数据在被拉出时必须处于这种形状,因此重新设计表结构不是一种选择。该表如下所示:

CREATE TABLE [datatable](
[RowID] [int] IDENTITY(1,1) NOT NULL,
[CampaignID] [int] NOT NULL,
[CampaignName] [nvarchar](255) NULL,
[Category] [nvarchar](255) NOT NULL,
[PostID] [int] NOT NULL,
[TopicName] [nvarchar](4000) NULL,
[TopicFrequency] [int] NULL
)


数据不断添加到表中,因此我必须定期更新主题频率。这是我当前的查询:

UPDATE  datatable
SET TopicFrequency = b.TopicFrequency
FROM datatable INNER JOIN
(SELECT CampaignID, Category, TopicName, COUNT(DISTINCT PostID) AS TopicFrequency
FROM datatable GROUP BY CampaignID, Category, TopicName) AS b
ON datatable.CampaignID = b.CampaignID
AND datatable.Category = b.Category
AND datatable.TopicName = b.TopicName


主题名称为nvarchar 4000,我无法在该字段上创建索引。寻找想法。谢谢。

最佳答案

一般决策-将表拆分为两个或多个表-即-标准化数据结构。我认为可以再引入两个表格-用于广告系列和主题



对于您当前的数据结构,您可以创建一个uniqueidentifierbigint计算列作为TopicName字段的哈希,对其进行索引并查找哈希而不是字符串字段。我将为您提供bigint的示例:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[HashString64SVF](@input NVARCHAR(4000))
RETURNS BIGINT
WITH SCHEMABINDING, RETURNS NULL ON NULL INPUT
AS
BEGIN
RETURN
CAST(SUBSTRING(HASHBYTES('SHA1', UPPER(@Input)), 1, 8) AS BIGINT)
^ CAST(SUBSTRING(HASHBYTES('SHA1', UPPER(@Input)), 9, 8) AS BIGINT)
^ CAST(SUBSTRING(HASHBYTES('SHA1', UPPER(@Input)), 17, 4) AS BIGINT)
END
GO
ALTER TABLE datatable ADD TopicNameHash AS dbo.HashString64SVF(TopicName)
GO
CREATE INDEX NewIndexName ON DataTable(TopicNameHash, CampaignID, Category) INCLUDE(PostId)
GO
UPDATE datatable
SET TopicFrequency = b.TopicFrequency
FROM datatable
JOIN
(SELECT CampaignID, Category, TopicNameHash, COUNT(DISTINCT PostID) AS TopicFrequency
FROM datatable GROUP BY CampaignID, Category, TopicNameHash) AS b
ON datatable.CampaignID = b.CampaignID
AND datatable.Category = b.Category
AND datatable.TopicNameHash = b.TopicNameHash




在RowId列上创建主键



像这样重新创建表:

CREATE TABLE [datatable](
[RowID] [int] IDENTITY(1,1) PRIMARY KEY,
[CampaignID] [int] NOT NULL,
[Category] [nvarchar](255) NOT NULL,
[PostID] [int] NOT NULL,
--uncomment if needed [TopicNameHash] AS dbo.HashString64SVF(TopicName),
[TopicFrequency] [int] NULL,
[CampaignName] [nvarchar](255) NULL,
[TopicName] [nvarchar](4000) NULL
)


主要原因-如果您的可为空的变量列位于列列表的末尾并且这些列中有许多 NULL值-sql server可以在行中节省一些空间,因此-在IO中

关于sql - 优化更新查询,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/8736287/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com