
SOLR stopwords: queries containing 'of' give no results, but when 'of' is left out we get the correct results

Reposted. Author: 行者123. Updated: 2023-12-05 01:11:39

Can anyone explain how stopwords work in SOLR? In my stopwords.txt I have defined of. In schema.xml I have:

<filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>

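Now, when I search for anything that contains the word of, no results are returned.

Example: oil of olay returns no results, while oil olay returns the correct results.

More of the field definition: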

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory"
          ignoreCase="true"
          words="stopwords.txt"
          enablePositionIncrements="true"
  />
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1"
          generateNumberParts="1"
          catenateWords="1"
          catenateNumbers="1"
          catenateAll="1"
          preserveOriginal="1"
          splitOnCaseChange="0"
          splitOnNumerics="0"
          types="wdtypes.txt"
  />
  <filter class="solr.KeywordRepeatFilterFactory"/>
  <filter class="solr.EnglishMinimalStemFilterFactory"/>
  <filter class="solr.TrimFilterFactory" updateOffsets="false"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory"
          ignoreCase="true"
          words="stopwords.txt"
          enablePositionIncrements="true"
  />
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1"
          generateNumberParts="1"
          catenateWords="1"
          catenateNumbers="1"
          catenateAll="1"
          preserveOriginal="1"
          splitOnCaseChange="0"
          splitOnNumerics="0"
          types="wdtypes.txt"
  />
  <filter class="solr.EnglishMinimalStemFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

Debug output:

+(upclist:cream+of+wheat&qt=productresults&rows=10&fq=status%3AActive&fq=facilitystatus%3AActive&fq=facilityid%3A100&fq=inventoryctrlcode%3A%5B0+TO+100%5D&fq=weblifecycle%3A%283+OR+4%29&fq=groupnumber%3A2^1.2 | keyword:cream+of+wheat&qt=productresults&rows=10&fq=status%3aactive&fq=facilitystatus%3aactive&fq=facilityid%3a100&fq=inventoryctrlcode%3a%5b0+to+100%5d&fq=weblifecycle%3a%283+or+4%29&fq=groupnumber%3a2^20.0 | product_elevate:cream+of+wheat&qt=productresults&rows=10&fq=status%3aactive&fq=facilitystatus%3aactive&fq=facilityid%3a100&fq=inventoryctrlcode%3a%5b0+to+100%5d&fq=weblifecycle%3a%283+or+4%29&fq=groupnumber%3a2^5.0 | area:"(cream+of+wheat&qt=productresults&rows=10&fq=status%3aactive&fq=facilitystatus%3aactive&fq=facilityid%3a100&fq=inventoryctrlcode%3a%5b0+to+100%5d&fq=weblifecycle%3a%283+or+4%29&fq=groupnumber%3a2 cream) of wheat qt productresult (row creamofwheatqtproductresultsrow) 10 fq status%3aactive fq facilitystatus%3aactive fq facilityid%3a100 fq inventoryctrlcode%3a%5b0(to fqstatus%3aactivefqfacilitystatus%3aactivefqfacilityid%3a100fqinventoryctrlcode%3a%5b0to)100%5d fq weblifecycle%3a%283(or fqweblifecycle%3a%283or)4%29 fq(groupnumber%3a2 fqgroupnumberfstatstatusqus3acproductafactives %3aactivefqfacilityid%3a100fqinventoryctrlcode%3a%5b0to100%5dfqweblifecycle%3a%283or4%29fqgroupnumber%3a2)"~3^2.5 | productid:cream+of+wheat&qt=productresults&rows=10&fq=status%3AActive&fq=facilitystatus%3AActive&fq=facilityid%3A100&fq=inventoryctrlcode%3A%5B0+TO+100%5D&fq=weblifecycle%3A%283+OR+4%29&fq=groupnumber%3A2^1.7 | productname:cream+of+wheat&qt=productresults&rows=10&fq=status%3aactive&fq=facilitystatus%3aactive&fq=facilityid%3a100&fq=inventoryctrlcode%3a%5b0+to+100%5d&fq=weblifecycle%3a%283+or+4%29&fq=groupnumber%3a2^10.0)~0.01()

Best answer

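This may not be relevant, since you said you are searching on only one field (I am posting it anyway because you said you are using edismax and qf). I ran into a similar problem when I wanted to boost exact matches, so I had set qf like this: <str name="qf">title^45 title_str^55</str>. The title field uses stopwords, while title_str obviously does not. The reason why searches containing stopwords often find nothing in this setup is described here. Their solution is to adjust the mm value. The solution that worked in my case was to put title_str in the pf tag (and remove it from the qf tag), so that exact matches still come out on top.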
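The change described in the answer can be sketched as a solrconfig.xml fragment. This is a minimal sketch under stated assumptions, not the answerer's actual configuration: the handler name `/select` and the reuse of the boost `55` in `pf` are illustrative, and only the field names `title` and `title_str` come from the answer. The usual explanation for the failure is that when edismax requires every query term to match, a stopword that is dropped for `title` but kept for `title_str` becomes a clause that only `title_str` can satisfy, so a query such as oil of olay matches nothing.

```xml
<!-- Hypothetical solrconfig.xml fragment: handler name and boosts are assumptions. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- qf: keep only the field whose query analyzer removes stopwords,
         so a removed stopword does not survive as a required clause elsewhere -->
    <str name="qf">title^45</str>
    <!-- pf: the field without stopword removal is used only as a phrase boost,
         so exact matches rank higher without being required to match -->
    <str name="pf">title_str^55</str>
    <!-- Alternative mentioned in the answer: adjust mm instead.
         No mm value is given in the source, so none is set here. -->
  </lst>
</requestHandler>
```

Because `pf` only influences scoring, documents no longer have to match the stopword in `title_str` to be returned.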

This question and answer originate from Stack Overflow: https://stackoverflow.com/questions/33175013/
