sql-server - Noise words in SQL Server 2005 full-text search

Reposted. Author: 行者123. Updated: 2023-12-03 03:11:59

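I am trying to run a full-text search over a set of names in my database. This is my first attempt at using full-text search. Currently, I take the search string that was entered and put a NEAR condition between each term (so the entered phrase "Kings of Leon" becomes "Kings NEAR of NEAR Leon").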

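Unfortunately, I have found that this strategy produces false negatives, because SQL Server drops the word "of" when it builds the index, since "of" is a noise word. As a result, "Kings Leon" matches correctly, but "Kings of Leon" does not.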

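My colleague suggested putting all the noise words defined in MSSQL\FTData\noiseENG.txt into the .NET code, so that they can be stripped from the search string before the full-text search runs.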

Is this the best solution? Is there no auto-magic setting I can change in SQL Server to do this for me? Or perhaps just a better solution that doesn't feel so hacky?

Best answer

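Full-text search works with whatever search criteria you give it. You can remove noise words from the noise-word file, but doing so risks bloating the index. Robert Cain has a lot of good information on this on his blog: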

http://arcanecode.com/2008/05/29/creating-and-customizing-noise-words-in-sql-server-2005-full-text-search/

To save some time, you can look at how this method strips them out, and copy the code and the word list:

    // Requires: using System; using System.Collections.Generic;
    public string PrepSearchString(string sOriginalQuery)
    {
        string strNoiseWords = @" 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | $ | ! | @ | # | $ | % | ^ | & | * | ( | ) | - | _ | + | = | [ | ] | { | } | about | after | all | also | an | and | another | any | are | as | at | be | because | been | before | being | between | both | but | by | came | can | come | could | did | do | does | each | else | for | from | get | got | has | had | he | have | her | here | him | himself | his | how | if | in | into | is | it | its | just | like | make | many | me | might | more | most | much | must | my | never | now | of | on | only | or | other | our | out | over | re | said | same | see | should | since | so | some | still | such | take | than | that | the | their | them | then | there | these | they | this | those | through | to | too | under | up | use | very | want | was | way | we | well | were | what | when | where | which | while | who | will | with | would | you | your | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z ";

        // Build a case-insensitive lookup of the noise terms.
        var noiseWords = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (string noiseWord in strNoiseWords.Split('|'))
        {
            noiseWords.Add(noiseWord.Trim());
        }

        // Keep only the terms that are not noise words. Comparing whole terms
        // also catches noise words at the very start or end of the query.
        var keptTerms = new List<string>();
        foreach (string term in sOriginalQuery.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (!noiseWords.Contains(term))
            {
                keptTerms.Add(term);
            }
        }
        return string.Join(" ", keptTerms.ToArray());
    }

However, I would probably use Regex.Replace for this, which should be much faster than looping. I just don't have a quick example on hand to post.
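For what it's worth, here is a rough sketch of what the Regex.Replace approach might look like. It is not from the original answer: the class name NoiseWordFilter, the method names Strip and ToNearQuery, and the shortened word list are all made up for illustration, and a real list would need to mirror your noiseENG.txt. I have not benchmarked it against the loop, so treat the speed claim as untested.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NoiseWordFilter
{
    // Hypothetical, abbreviated list for illustration only;
    // a real list should mirror the contents of noiseENG.txt.
    private static readonly string[] NoiseWords =
        { "about", "after", "all", "an", "and", "of", "the", "to", "with" };

    // Matches any listed word as a whole, whitespace-delimited term, ignoring case.
    // (?<!\S) and (?!\S) require whitespace or the string boundary on each side,
    // so "of" is matched but the "of" inside "Office" is not.
    private static readonly Regex NoisePattern = new Regex(
        @"(?<!\S)(?:" + string.Join("|", NoiseWords.Select(Regex.Escape)) + @")(?!\S)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    private static readonly Regex Whitespace = new Regex(@"\s+", RegexOptions.Compiled);

    // Removes noise words and collapses the leftover runs of whitespace.
    public static string Strip(string query)
    {
        string withoutNoise = NoisePattern.Replace(query, " ");
        return Whitespace.Replace(withoutNoise, " ").Trim();
    }

    // Joins the remaining terms with NEAR, as described in the question.
    public static string ToNearQuery(string query)
    {
        string[] terms = Strip(query).Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" NEAR ", terms);
    }
}
```

With this sketch, NoiseWordFilter.ToNearQuery("Kings of Leon") produces Kings NEAR Leon, which is the form the question reports as matching correctly. The pattern is built once and reused, so the word list is only escaped and compiled a single time.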

Regarding sql-server - Noise words in SQL Server 2005 full-text search, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/938175/
