
java - How do I remove stop words in Java?

Reposted. Author: 塔克拉玛干. Updated: 2023-11-01 22:30:22

I want to remove stop words in Java.

So I read the stop words from a text file

and store them in a Set:

Set<String> stopWords = new LinkedHashSet<String>();
BufferedReader br = new BufferedReader(new FileReader("stopwords.txt"));
String words = null;
while ((words = br.readLine()) != null) {
    stopWords.add(words.trim());
}
br.close();

Then I read another text file.

I want to remove from that text file every string that also appears in the stop word set.

How can I do that?

Best answer

Use a Set for the stop words:

Set<String> stopWords = new LinkedHashSet<String>();
BufferedReader SW= new BufferedReader(new FileReader("StopWord.txt"));
for (String line; (line = SW.readLine()) != null; )
    stopWords.add(line.trim());
SW.close();

and an ArrayList for the words of the input file, txt_file.txt:

BufferedReader br = new BufferedReader(new FileReader("txt_file.txt"));
// build your ArrayList of words from the file here
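
The answer leaves the "build your ArrayList" step open. A minimal sketch of one way to do it follows, assuming words are separated by whitespace (the original does not say how the file should be tokenized). The class name WordReader and the method readWords are hypothetical names used only for illustration; a StringReader stands in for the FileReader so the example runs without a file on disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;

class WordReader {
    // Hypothetical helper: reads every line from the reader and splits it
    // on whitespace (an assumption about the input format) into words.
    static ArrayList<String> readWords(Reader source) {
        ArrayList<String> words = new ArrayList<String>();
        try (BufferedReader br = new BufferedReader(source)) {
            for (String line; (line = br.readLine()) != null; ) {
                for (String token : line.trim().split("\\s+")) {
                    // A blank line yields one empty token; skip it.
                    if (!token.isEmpty()) {
                        words.add(token);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        // In real use, pass new FileReader("txt_file.txt") instead.
        ArrayList<String> arraylist = readWords(new StringReader("this is a\nsmall test"));
        System.out.println(arraylist); // prints [this, is, a, small, test]
    }
}
```

Splitting on whitespace keeps punctuation attached to words ("test." will not match the stop word "test"), so real text may need a stricter tokenizer.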

// deletStopWord() returns a new list without any word found in the stop word set
public ArrayList<String> deletStopWord(Set<String> stopWords, ArrayList<String> arraylist) {
    ArrayList<String> newList = new ArrayList<String>();
    // check every word, starting from the first element
    for (String word : arraylist) {
        if (!stopWords.contains(word)) {
            newList.add(word);
        }
    }
    return newList;
}

arraylist = deletStopWord(stopWords, arraylist);
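
For comparison, the same filtering can be sketched with Collection.removeIf (available since Java 8). This is an alternative under stated assumptions, not the answer's own method: the class StopWordDemo and the method removeStopWords are hypothetical names, and the case-insensitive comparison is an added assumption that requires the stop words to be stored in lower case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopWordDemo {
    // Hypothetical helper: returns a new list holding the words not in stopWords.
    // Assumption: stopWords contains lower-case entries, and matching ignores case.
    static List<String> removeStopWords(Set<String> stopWords, List<String> words) {
        List<String> kept = new ArrayList<String>(words); // copy, so the input is untouched
        kept.removeIf(w -> stopWords.contains(w.toLowerCase(Locale.ROOT)));
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<String>(Arrays.asList("the", "is", "a"));
        List<String> words = Arrays.asList("The", "cat", "is", "on", "a", "mat");
        System.out.println(removeStopWords(stopWords, words)); // prints [cat, on, mat]
    }
}
```

Because the lookup goes through a hash-based Set, each word is checked in roughly constant time, so the whole pass stays linear in the number of input words.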

Regarding "java - How do I remove stop words in Java?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/12469332/
