
java - How to add custom stop words using Lucene in Java

Reposted. Author: 搜寻专家. Updated: 2023-11-01 02:47:49

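I am using Lucene to remove English stop words, but my requirement is to remove both the English stop words and my own custom stop words. Below is the code I use to remove the English stop words with Lucene.

My sample code: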

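import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class Stopwords_remove {
    public String removeStopWords(String string) throws IOException
    {
        // Tokenize the input, then drop Lucene's built-in English stop words.
        TokenStream tokenStream = new StandardTokenizer(Version.LUCENE_36, new StringReader(string));
        tokenStream = new StopFilter(Version.LUCENE_36, tokenStream, StandardAnalyzer.STOP_WORDS_SET);
        StringBuilder sb = new StringBuilder();
        CharTermAttribute token = tokenStream.addAttribute(CharTermAttribute.class);
        tokenStream.reset();
        while (tokenStream.incrementToken())
        {
            if (sb.length() > 0)
            {
                sb.append(" ");
            }
            sb.append(token.toString());
        }
        tokenStream.end();
        tokenStream.close();
        return sb.toString();
    }

    public static void main(String args[]) throws IOException
    {
        String text = "this is a java project written by james.";
        Stopwords_remove stopwords = new Stopwords_remove();
        // Print the filtered text so the result is visible.
        System.out.println(stopwords.removeStopWords(text));
    }
}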

Output: java project written james.

Required output: java project james.

How can I do this?

Best answer

You can add your extra stop words to a copy of the standard English stop word set, or simply add a second StopFilter. For example:

TokenStream tokenStream = new StandardTokenizer(Version.LUCENE_36, new StringReader(string));
// Start from a copy of the built-in English stop words, then add your own.
CharArraySet stopSet = CharArraySet.copy(Version.LUCENE_36, StandardAnalyzer.STOP_WORDS_SET);
stopSet.add("add");
stopSet.add("your");
stopSet.add("stop");
stopSet.add("words");
tokenStream = new StopFilter(Version.LUCENE_36, tokenStream, stopSet);
// Or, if you just need the added stop words in a StandardAnalyzer, you could pass this stop set to the StandardAnalyzer constructor:
// analyzer = new StandardAnalyzer(Version.LUCENE_36, stopSet);

Or:

TokenStream tokenStream = new StandardTokenizer(Version.LUCENE_36, new StringReader(string));
tokenStream = new StopFilter(Version.LUCENE_36, tokenStream, StandardAnalyzer.STOP_WORDS_SET);
List<String> stopWords = //your list of stop words.....
tokenStream = new StopFilter(Version.LUCENE_36, tokenStream, StopFilter.makeStopSet(Version.LUCENE_36, stopWords));
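To see why the two approaches give the same result, here is a minimal plain-Java sketch that uses no Lucene classes at all. Everything in it is hypothetical and for illustration only: the class `StopWordSketch`, the abbreviated `ENGLISH` set (a stand-in for `StandardAnalyzer.STOP_WORDS_SET`), and the crude regex tokenizer (a stand-in for `StandardTokenizer`). It only models the filtering logic, not Lucene's actual behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only: no Lucene classes are used here.
class StopWordSketch {
    // Hypothetical, abbreviated stand-in for the built-in English stop word set.
    static final Set<String> ENGLISH =
            new HashSet<>(Arrays.asList("a", "an", "by", "is", "the", "this"));

    // Crude tokenizer: split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Keeps only the tokens that are not in the given stop set.
    static List<String> filter(List<String> tokens, Set<String> stopSet) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!stopSet.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Approach 1: copy the default set, add the custom words, filter once.
    static String withMergedSet(String text, List<String> custom) {
        Set<String> merged = new HashSet<>(ENGLISH);
        merged.addAll(custom);
        return String.join(" ", filter(tokenize(text), merged));
    }

    // Approach 2: filter with the default set, then again with the custom set.
    static String withChainedFilters(String text, List<String> custom) {
        List<String> afterEnglish = filter(tokenize(text), ENGLISH);
        return String.join(" ", filter(afterEnglish, new HashSet<>(custom)));
    }

    public static void main(String[] args) {
        String text = "this is a java project written by james.";
        List<String> custom = Arrays.asList("written");
        System.out.println(withMergedSet(text, custom));      // java project james
        System.out.println(withChainedFilters(text, custom)); // java project james
    }
}
```

Both methods drop a token if it appears in either set, so the choice between one merged set and two chained filters is mostly about convenience: a merged set is easier to hand to an analyzer, while a second filter leaves the built-in set untouched.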

If you are trying to create your own analyzer, you would be better off following a pattern closer to the example in the Analyzer documentation.

Regarding "java - How to add custom stop words using Lucene in Java", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18008999/
