
Solr search fails for certain characters

Reposted. Author: 行者123. Last updated: 2023-12-04 01:11:17

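I have a Solr collection that returns no results for a few non-ASCII characters. The example we are using is the string S11. • “≡ «Ñaïvétý» ‘¢¥£’ ¶!#% and, even though I have an object containing it in an indexed field, searching for the whole string returns no results. Searching for substrings of that string, however, does return matches. The only characters that cause Solr to return no match are the three in the middle: • “≡ . The field is indexed as text_en, but I have also tried edge_ngram (hoping for a bit of cargo-cult magic to fix the problem). Is there something special about these three characters, or do I need to adjust how Solr indexes the field?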

We are searching through django-haystack, but the problem also shows up in the Solr admin UI.

Here are the two field type definitions:

<fieldType name="edge_ngram" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.EdgeNGramFilterFactory"
            minGramSize="2" maxGramSize="50" side="front" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
  </analyzer>
</fieldType>

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"
    />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"
    />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

Best answer

Have you tried using ASCIIFoldingFilterFactory?

Converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists.


<filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
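
As a rough sketch of where the suggested filter could go, the fragment below adds it to a copy of the question's text_en definition. The field type name text_en_folded is hypothetical, and placing the filter right after LowerCaseFilterFactory is an assumption; the answer does not say where in the chain it belongs. The filter is added to both analyzers so that index-time and query-time tokens are folded the same way.

```xml
<!-- Hypothetical field type: a copy of the question's text_en with ASCII folding added. -->
<fieldType name="text_en_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Assumed position: fold after lowercasing and before stemming. -->
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Same filter on the query side so both sides produce matching tokens. -->
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Existing documents would need to be reindexed after a change to the index analyzer. The answer does not confirm that folding resolves the three problem characters, so it may be worth running the example string through the Analysis screen in the Solr admin UI to see which tokens each stage actually emits.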

Regarding "Solr search fails for certain characters", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/32383818/
