gpt4 book ai didi

Solr-不区分大小写的搜索不起作用

转载 作者:行者123 更新时间:2023-12-04 18:11:55 27 4
gpt4 key购买 nike

我想在solr中对字段myfield应用不区分大小写的搜索。

我为此搜索了一下,发现我需要将LowerCaseFilterFactory应用于字段类型,并且字段应为solr.TextFeild

我在schema.xml中应用了它并重新索引了数据,然后我的搜索似乎区分大小写。

以下是我执行的搜索。

http://localhost:8080/solr/select?q=myfield:"cloud university"&hl=on&hl.snippets=99&hl.fl=myfield

以下是字段类型的定义
 <fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
-->
<!-- Case insensitive stop word removal.
add enablePositionIncrements=true in both the index and query
analyzers to leave a 'gap' for more accurate phrase queries.
-->
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords_en.txt"
enablePositionIncrements="true"
/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords_en.txt"
enablePositionIncrements="true"
/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>

以下是我的字段定义
 <field name="myfield" type="text_en_splitting" indexed="true" stored="true" />

不确定,这有什么问题。
请帮助我解决此问题。

谢谢

编辑

调试查询
<lst name="debug">
<str name="rawquerystring">
"cloud university" AND guid:268406b6-db65-49da-848a-c59248f170db
</str>
<str name="querystring">
"cloud university" AND guid:268406b6-db65-49da-848a-c59248f170db
</str>
<str name="parsedquery">
+PhraseQuery(CC:"cloud univers") +guid:268406b6-db65-49da-848a-c59248f170db
</str>
<str name="parsedquery_toString">
+CC:"cloud univers" +guid:268406b6-db65-49da-848a-c59248f170db
</str>
<lst name="explain">
<str name="KSYS_20120805_1100">
12.572915 = (MATCH) sum of: 0.03595598 = weight(CC:"cloud univers" in 1560524), product of: 0.51819557 = queryWeight(CC:"cloud univers"), product of: 8.881522 = idf(CC: cloud=4798 univers=625207) 0.05834536 = queryNorm 0.06938689 = fieldWeight(CC:"cloud univers" in 1560524), product of: 1.0 = tf(phraseFreq=1.0) 8.881522 = idf(CC: cloud=4798 univers=625207) 0.0078125 = fieldNorm(field=CC, doc=1560524) 12.536959 = (MATCH) weight(guid:268406b6-db65-49da-848a-c59248f170db in 1560524), product of: 0.85526216 = queryWeight(guid:268406b6-db65-49da-848a-c59248f170db), product of: 14.658615 = idf(docFreq=1, maxDocs=1709587) 0.05834536 = queryNorm 14.658615 = (MATCH) fieldWeight(guid:268406b6-db65-49da-848a-c59248f170db in 1560524), product of: 1.0 = tf(termFreq(guid:268406b6-db65-49da-848a-c59248f170db)=1) 14.658615 = idf(docFreq=1, maxDocs=1709587) 1.0 = fieldNorm(field=guid, doc=1560524)
</str>
</lst>
<str name="QParser">LuceneQParser</str>
<lst name="timing">
<double name="time">60.0</double>
<lst name="prepare">
<double name="time">1.0</double>
<lst name="org.apache.solr.handler.component.QueryComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.FacetComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.HighlightComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.StatsComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.DebugComponent">
<double name="time">0.0</double>
</lst>
</lst>
<lst name="process">
<double name="time">59.0</double>
<lst name="org.apache.solr.handler.component.QueryComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.FacetComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.HighlightComponent">
<double name="time">57.0</double>
</lst>
<lst name="org.apache.solr.handler.component.StatsComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.DebugComponent">
<double name="time">2.0</double>
</lst>
</lst>
</lst>
</lst>

最佳答案

您应该将solr.LowerCaseFilterFactory放在单词定界符之前,因为大写字母位于小写的中间,反之亦然会触发单词定界符

关于Solr-不区分大小写的搜索不起作用,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/12071164/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com