
sorting - How do I sort text/strings in Solr using natural sort order?

Reposted. Author: 行者123. Updated: 2023-12-01 06:35:48

I want to sort a list of values so that they come out in the following order:

  • 4
  • 5xa
  • 8kdjfew454
  • 9
  • 10
  • 999cc
  • b
  • c9
  • c10cc
  • c11


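In other words, what is sometimes called "natural sorting": text is sorted alphabetically/lexicographically where there is text, but numerically where there are numbers, even when the two are mixed in the same string.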
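As a point of reference, the target ordering can be pinned down outside Solr. The following is a minimal Python sketch, not Solr code: `natural_key` is a hypothetical helper name, and it assumes comparison is case-insensitive and that digit runs sort before letters, which is what the list above implies.

```python
import re

def natural_key(value):
    """Split a string into digit and non-digit runs for natural sorting."""
    key = []
    for chunk in re.findall(r"[0-9]+|[^0-9]+", value.lower()):
        if "0" <= chunk[0] <= "9":
            # Tag 0: numeric runs compare by integer value and sort before text.
            key.append((0, int(chunk), ""))
        else:
            # Tag 1: text runs compare lexicographically.
            key.append((1, 0, chunk))
    return key

values = ["c11", "10", "b", "999cc", "4", "c10cc", "9", "5xa", "c9", "8kdjfew454"]
print(sorted(values, key=natural_key))
```

Sorting the shuffled list with this key reproduces the order listed above. The tagged tuples keep numbers and text from ever being compared directly, which would raise a `TypeError` in Python 3.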

I can't find any way to do this in Solr (4.0 at the moment). Is there a standard way to do this, or at least a workable "recipe"?

Best answer

The closest thing you can achieve is described in this article.
From the article:

To force numbers to sort numerically, we need to left-pad any numbers with zeroes: 2 becomes 0002, 10 becomes 0010, 100 becomes 0100, et cetera. Then even a lexical sort will arrange values like this:

Title No. 1
Title No. 2
Title No. 10
Title No. 100

The Field Type

This alphanumeric sort field type converts any numbers found to 6 digits, padded with zeroes. (If you expect numbers larger than 6 digits in your field values, you will need to increase the number of zeroes when padding.)
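The width limit can be seen with a small simulation. This is a Python sketch, not part of the article: `pad_numbers` is a hypothetical helper that left-pads every digit run to a given width, roughly what the two padding filters do together (it does not strip pre-existing leading zeroes), and the sample strings are made up.

```python
import re

def pad_numbers(value, width=6):
    """Left-pad every run of ASCII digits in value with zeroes to width."""
    return re.sub(r"[0-9]+", lambda m: m.group().zfill(width), value)

titles = ["Item 1234567", "Item 999999"]

# With 6 digits of padding, the 7-digit number is left unpadded and
# compares character by character, so it sorts before 999999.
too_narrow = sorted(titles, key=lambda t: pad_numbers(t, 6))

# With 10 digits of padding, both numbers have equal length and the
# lexical comparison agrees with the numeric one.
wide_enough = sorted(titles, key=lambda t: pad_numbers(t, 10))

print(too_narrow)
print(wide_enough)
```

In the Solr field type, widening the padding would presumably mean changing both the number of zeroes in the padding replacement and the `{6,}` quantifier in the trimming pattern together.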

The field type also removes English and French leading articles, lowercases, and purges any character that isn’t alphanumeric. It is English-centric, and assumes that diacritics have been folded into ASCII characters.



<fieldType name="alphaNumericSort" class="solr.TextField" sortMissingLast="false" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer does no actual tokenizing, so the entire
         input string is preserved as a single token
    -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- The LowerCase TokenFilter does what you expect, which can be
         useful when you want your sorting to be case insensitive
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- The TrimFilter removes any leading or trailing whitespace -->
    <filter class="solr.TrimFilterFactory"/>
    <!-- Remove leading articles -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(a |the |les |la |le |l'|de la |du |des )" replacement="" replace="all"
    />
    <!-- Left-pad numbers with zeroes -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="(\d+)" replacement="00000$1" replace="all"
    />
    <!-- Left-trim zeroes to produce 6 digit numbers -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="0*([0-9]{6,})" replacement="$1" replace="all"
    />
    <!-- Remove all but alphanumeric characters -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^a-z0-9])" replacement="" replace="all"
    />
  </analyzer>
</fieldType>

Sample output

Title No. 1 => titleno000001
Title No. 2 => titleno000002
Title No. 10 => titleno000010
Title No. 100 => titleno000100
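
To check the chain without a running Solr instance, the filters can be approximated step by step. This is a rough Python emulation, not Solr itself: `alpha_numeric_sort_key` is a hypothetical name, Python's `re` stands in for Java's regex engine, `\d` is written as `[0-9]` to keep it ASCII-only, and the real analyzer may differ in edge cases.

```python
import re

def alpha_numeric_sort_key(value):
    """Approximate the alphaNumericSort analyzer chain on one field value."""
    s = value.lower()   # LowerCaseFilterFactory
    s = s.strip()       # TrimFilterFactory
    # Remove a leading English or French article.
    s = re.sub(r"^(a |the |les |la |le |l'|de la |du |des )", "", s)
    # Left-pad every digit run with five zeroes.
    s = re.sub(r"([0-9]+)", r"00000\g<1>", s)
    # Drop surplus leading zeroes, keeping at least six digits.
    s = re.sub(r"0*([0-9]{6,})", r"\g<1>", s)
    # Remove everything that is not a lowercase letter or a digit.
    s = re.sub(r"[^a-z0-9]", "", s)
    return s

for title in ["Title No. 1", "Title No. 2", "Title No. 10", "Title No. 100"]:
    print(title, "=>", alpha_numeric_sort_key(title))

values = ["c11", "10", "b", "999cc", "4", "c10cc", "9", "5xa", "c9", "8kdjfew454"]
print(sorted(values, key=alpha_numeric_sort_key))
```

Under these assumptions the emulation reproduces the sample output above, and sorting the question's values by the generated key yields the requested order, because zero-padded digit runs sort before letters in a plain lexical comparison.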

Regarding "sorting - How do I sort text/strings in Solr using natural sort order?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/15164342/
