java - How do I find terms using a Lucene IndexReader?

Reposted. Author: 行者123. Updated: 2023-12-02 12:47:27

I am trying to build a MultiPhraseQuery that supports partial (prefix) matching. According to the MultiPhraseQuery JavaDoc:

A generalized version of PhraseQuery, with the possibility of adding more than one term at the same position that are treated as a disjunction (OR). To use this class to search for the phrase "Microsoft app*" first create a Builder and use MultiPhraseQuery.Builder.add(Term) on the term "microsoft" (assuming lowercase analysis), then find all terms that have "app" as prefix using LeafReader.terms(String), seeking to "app" then iterating and collecting terms until there is no longer that prefix, and finally use MultiPhraseQuery.Builder.add(Term[]) to add them. MultiPhraseQuery.Builder.build() returns the fully constructed (and immutable) MultiPhraseQuery.

https://lucene.apache.org/core/6_6_0/core/org/apache/lucene/search/MultiPhraseQuery.html

The part I am struggling with is where it says:

...find all terms that have "app" as prefix using LeafReader.terms(String), seeking to "app" then iterating and collecting terms until there is no longer that prefix...

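How do I seek to the terms there? LeafReader.terms(String) gives you a Terms, which has an iterator method that gives you a TermsEnum, which you can seek with. I am just not sure how to use it to extract the matching terms.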

Best answer

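It sounds like you have already worked out how to get the TermsEnum, so from there just use seekCeil to position on the prefix you want to match, then iterate through the TermsEnum until you reach a term that does not match the prefix. For example: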

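import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.*;
import org.apache.lucene.util.BytesRef;

Terms terms = MultiFields.getTerms(indexReader, "text"); // null if the field has no terms
TermsEnum termsEnum = terms.iterator();
List<Term> matchingTerms = new ArrayList<>();
// seekCeil positions on the first term >= "app"; END means there is no such term
if (termsEnum.seekCeil(new BytesRef("app")) != TermsEnum.SeekStatus.END) {
    BytesRef term = termsEnum.term();
    // next() returns null once the enumeration is exhausted
    while (term != null && term.utf8ToString().startsWith("app")) {
        matchingTerms.add(new Term("text", BytesRef.deepCopyOf(term))); // copy: the enum may reuse its BytesRef
        term = termsEnum.next();
    }
}
System.out.println(matchingTerms);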
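
The seek-then-scan loop relies on the term dictionary being sorted: every term that shares a prefix sits in one contiguous run beginning at the first term greater than or equal to that prefix, so the scan can stop at the first mismatch. The following is a minimal sketch of that idea using only the JDK, with a TreeSet standing in for the term dictionary. It is an analogy and not Lucene code: the class PrefixScanSketch and the method termsWithPrefix are hypothetical names, and it assumes plain ASCII terms so that Java's String ordering agrees with the byte ordering Lucene uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class PrefixScanSketch {
    // Hypothetical helper: collects every entry of a sorted set that starts with the prefix.
    public static List<String> termsWithPrefix(NavigableSet<String> sortedTerms, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet(prefix, true) begins at the first entry >= prefix, playing the role of seekCeil.
        for (String term : sortedTerms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order guarantees no later entry can match
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(
                Arrays.asList("apple", "app", "apply", "apt", "microsoft", "ant"));
        System.out.println(termsWithPrefix(terms, "app")); // prints [app, apple, apply]
        System.out.println(termsWithPrefix(terms, "zzz")); // prints []
    }
}
```

The second call shows the case the Lucene loop also has to handle: when nothing sorts at or after the prefix, the scan sees no entries at all and returns an empty list instead of failing.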

Regarding "java - How do I find terms using a Lucene IndexReader?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44703572/
