gpt4 book ai didi

SQL Server Full Text Search Condition for FORMSOF for phrase 排除停用词

转载 作者:行者123 更新时间:2023-12-04 14:39:39 26 4
gpt4 key购买 nike

我想搜索一些带有停用词的短语,例如“Line Through Crack”。 “通过”是停止词。我想得到与查询相同的结果

CONTAINS(*, 'FORMSOF(INFLECTIONAL, "Line") AND FORMSOF(INFLECTIONAL, "Crack")')

因此所有行包含除停用词之外的所有词的所有形式。如果客户不知道停用词列表,我可以这样做吗?

最佳答案

您使用的是什么版本的 SQL Server?如果是 2008 年或更高版本,则您可以在查询运行时以编程方式检索停用词列表。然后,您可以检查是否有任何搜索词在停用词列表中,并将它们从“CONTAINS”查询字符串中排除。

以下查询将返回停用词列表(对于美国英语,语言 ID 为 1033):

-- Run the following to get a list of languages and their IDs
select lcid, name from sys.syslanguages order by 1

-- Then use that ID to get a list of stop words
select * from sys.fulltext_stopwords where language_id = 1033

根据这些信息,您可以编写一个搜索过程来执行类似的操作(这是一个非常基本的示例,但您应该明白了):

USE [AdventureWorks]
GO
-- Make sure you have a full-text catalogue to test against
/*
IF EXISTS(SELECT * FROM sys.fulltext_indexes WHERE [object_id] = OBJECT_ID('Production.ProductDescription'))
DROP FULLTEXT INDEX ON Production.ProductDescription;
IF EXISTS(SELECT * FROM sys.fulltext_catalogs WHERE name = 'FTC_product_description')
DROP FULLTEXT CATALOG FTC_product_description;
CREATE FULLTEXT CATALOG [FTC_product_description]
WITH ACCENT_SENSITIVITY = OFF
AS DEFAULT AUTHORIZATION [dbo]
CREATE FULLTEXT INDEX ON [Production].[ProductDescription]([Description] LANGUAGE [English])
KEY INDEX [PK_ProductDescription_ProductDescriptionID] ON ([FTC_product_description], FILEGROUP [PRIMARY])
WITH (CHANGE_TRACKING = AUTO, STOPLIST = SYSTEM);
*/
GO
IF OBJECT_ID('dbo.my_search_proc') IS NULL EXEC ('CREATE PROC dbo.my_search_proc AS ');
GO
-- My Search Proc
ALTER PROC dbo.my_search_proc (
@query_string NVARCHAR(1000),
@language_id INT = 1033 -- change this to whatever your default language ID is
) AS
BEGIN
SET NOCOUNT ON;

------------------------------------------------------
-- Split the string into 1 row per word
------------------------------------------------------
-- I've done this in-line here for simplicity, but I
-- would recommend creating a CLR function instead
-- for performance reasons.
DECLARE @words TABLE (id INT IDENTITY(1,1), word NVARCHAR(100));
DECLARE @cnt INT, @split_on CHAR(1)
SELECT @cnt = 1, @split_on = ' ';
WHILE (CHARINDEX(@split_on, @query_string) > 0)
BEGIN
INSERT INTO @words (word)
SELECT word = LEFT(LTRIM(RTRIM(SUBSTRING(@query_string,1,CHARINDEX(@split_on,@query_string)-1))), 100);
SET @query_string = SUBSTRING(@query_string,CHARINDEX(@split_on,@query_string)+1,LEN(@query_string));
SET @cnt = @cnt + 1;
END
INSERT INTO @words (word)
SELECT word = LEFT(LTRIM(RTRIM(@query_string)), 100);

------------------------------------------------------
-- Now build your "FORMSOF" string, excluding stop words.
------------------------------------------------------
DECLARE @formsof NVARCHAR(4000);

SELECT @formsof = ISNULL(@formsof, '')
+ 'FORMSOF(INFLECTIONAL, "' + w.word + '") AND '
FROM @words AS w
LEFT JOIN sys.fulltext_system_stopwords AS sw -- use sys.fulltext_stopwords instead if you're using a user-defined stop-word list (or use both)
ON w.word = sw.stopword COLLATE database_default
AND sw.language_id = @language_id
WHERE sw.stopword IS NULL
ORDER BY w.id; -- retain original order in case you do any weighting based on position, etc.

-- If nothing was returned, then the whole query string was made up of stop-words,
-- so just return an empty result set to the application.
IF @@ROWCOUNT = 0
SELECT TOP(0) * FROM Production.ProductDescription;

SET @formsof = LEFT(@formsof, LEN(@formsof)-4); -- Remove the last "AND"
PRINT 'Query String: ' + @formsof

------------------------------------------------------
-- Now perform the actual Full-Text search
------------------------------------------------------
SELECT *
FROM Production.ProductDescription
WHERE CONTAINS(*, @formsof);
END
GO

EXEC dbo.my_search_proc 'bars for downhill';

因此,如果您搜索“bars for downhill”,那么“for”将被删除(因为它是一个停用词),您应该得到 FORMSOF(INFLECTIONAL, "bars") AND FORMSOF(INFLECTIONAL,“下坡”)。

不幸的是,如果您使用的是 SQL 2005 并且不知道干扰词文件中有什么,那么您无能为力(据我所知)。

干杯,戴夫

关于SQL Server Full Text Search Condition for FORMSOF for phrase 排除停用词,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/8984639/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com