
Elasticsearch: English stop word list

Reposted. Author: 行者123. Updated: 2023-12-02 22:22:18

Where can I find up-to-date documentation of the stop word list that Elasticsearch 6.3 uses when _english_ is selected as the language, as described for the Stop Token Filter?

Best answer

Elasticsearch uses the Lucene defaults for this. Until about a week before this answer was written, the list was defined in https://github.com/apache/lucene-solr/blob/branch_7x/lucene/core/src/java/org/apache/lucene/analysis/standard/StandardAnalyzer.java#L47-L53.

It has since moved to https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/EnglishAnalyzer.java#L44-L50, but the list itself is unchanged:

    final List<String> stopWords = Arrays.asList(
        "a", "an", "and", "are", "as", "at", "be", "but", "by",
        "for", "if", "in", "into", "is", "it",
        "no", "not", "of", "on", "or", "such",
        "that", "the", "their", "then", "there", "these",
        "they", "this", "to", "was", "will", "with"
    );
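
To get a feel for what this list removes, here is a minimal sketch in plain Java with no Lucene or Elasticsearch dependency. The class name EnglishStopWordsDemo and the method removeStopWords are hypothetical names made up for illustration. The lowercasing and whitespace splitting are simplifying assumptions, not how an Elasticsearch analyzer actually tokenizes text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical demo class: not part of Lucene or Elasticsearch.
public class EnglishStopWordsDemo {

    // The 33 default English stop words quoted in the answer above.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "are", "as", "at", "be", "but", "by",
        "for", "if", "in", "into", "is", "it",
        "no", "not", "of", "on", "or", "such",
        "that", "the", "their", "then", "there", "these",
        "they", "this", "to", "was", "will", "with"
    ));

    // Assumption: lowercase the input and split on whitespace.
    // A real analyzer chain uses a proper tokenizer instead.
    static List<String> removeStopWords(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [quick, brown, fox, garden]
        System.out.println(removeStopWords("The quick brown fox is in the garden"));
    }
}
```

The list is short by design: it covers only very common function words, so a word such as "from" or "have" is not filtered by the _english_ default.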

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/51285677/
