
java - Apache Lucene TokenStream contract violation

Reposted. Author: 搜寻专家. Updated: 2023-10-31 19:38:41

Removing stop words with an Apache Lucene TokenStream fails with this error:

TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.

I am using this code:

public static String removeStopWords(String string) throws IOException {
    TokenStream tokenStream = new StandardTokenizer(Version.LUCENE_47, new StringReader(string));
    TokenFilter tokenFilter = new StandardFilter(Version.LUCENE_47, tokenStream);
    TokenStream stopFilter = new StopFilter(Version.LUCENE_47, tokenFilter, StandardAnalyzer.STOP_WORDS_SET);
    StringBuilder stringBuilder = new StringBuilder();

    CharTermAttribute token = tokenStream.getAttribute(CharTermAttribute.class);

    while (stopFilter.incrementToken()) {
        if (stringBuilder.length() > 0) {
            stringBuilder.append(" ");
        }

        stringBuilder.append(token.toString());
    }

    stopFilter.end();
    stopFilter.close();

    return stringBuilder.toString();
}

But as you can see, I never call reset() or close().

So why am I getting this error?

Best answer

I never call reset() or close().

Well, that is your problem. If you read the TokenStream Javadoc, you will find the following:

The workflow of the new TokenStream API is as follows:

  1. Instantiation of TokenStream/TokenFilters which add/get attributes to/from the AttributeSource.
  2. The consumer calls TokenStream#reset()
  3. ...

I only had to add a single reset() call to your code to make it work.

...    
CharTermAttribute token = tokenStream.getAttribute(CharTermAttribute.class);
tokenStream.reset(); // I added this
while(stopFilter.incrementToken()) {
...
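
To make the reasoning concrete, here is a minimal sketch of the consumer workflow as a toy state machine. It is not Lucene code: the class ToyTokenStream, its states, and its messages are hypothetical and only imitate the idea that incrementToken() refuses to run until reset() has been called. Lucene's real check lives inside its own classes and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the consumer workflow. This is NOT Lucene code: the class name,
// states, and messages are hypothetical and only mimic the contract check.
class ToyTokenStream {
    private enum State { NEW, READY, CLOSED }

    private final String[] tokens;
    private State state = State.NEW;
    private int next = 0;
    private String term = "";

    ToyTokenStream(String text) {
        String trimmed = text.trim();
        this.tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Step 2 of the workflow: must be called once before consuming.
    void reset() {
        if (state == State.READY) {
            throw new IllegalStateException("contract violation: reset() called multiple times");
        }
        state = State.READY;
        next = 0;
    }

    // Advances to the next token; refuses to run unless reset() was called.
    boolean incrementToken() {
        if (state != State.READY) {
            throw new IllegalStateException("contract violation: reset()/close() call missing");
        }
        if (next >= tokens.length) {
            return false;
        }
        term = tokens[next++];
        return true;
    }

    String term() {
        return term;
    }

    // End-of-stream hook; there is nothing to do in this toy model.
    void end() {
    }

    void close() {
        state = State.CLOSED;
    }

    // Correct consumer: reset, incrementToken until false, end, close.
    static List<String> consume(String text) {
        ToyTokenStream stream = new ToyTokenStream(text);
        List<String> terms = new ArrayList<>();
        stream.reset();
        try {
            while (stream.incrementToken()) {
                terms.add(stream.term());
            }
            stream.end();
        } finally {
            stream.close();
        }
        return terms;
    }

    // Consumer that skips reset(), like the code in the question.
    static boolean failsWithoutReset(String text) {
        ToyTokenStream stream = new ToyTokenStream(text);
        try {
            stream.incrementToken();
            return false;
        } catch (IllegalStateException e) {
            return true;
        } finally {
            stream.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(consume("the quick brown fox"));           // prints [the, quick, brown, fox]
        System.out.println(failsWithoutReset("the quick brown fox")); // prints true
    }
}
```

In the toy model, consume follows the documented order and wraps the loop in try/finally so that close() runs even if consuming fails. The same shape applies to the method in the question, where the reset() call is usually made on the outermost stream (stopFilter in that code) so that it propagates down the filter chain.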

Regarding "java - Apache Lucene TokenStream contract violation", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/23931699/
