
java - Removing stop words loaded from a file in Java

Reposted. Author: 行者123. Updated: 2023-11-30 05:34:55

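I need to read some stop words from a txt file and remove them from a text. I use this method to read the stop words from the file, store them in a String array, and return it: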

public String[] loadStopwords(File targetFile, String[] stopWords) throws IOException {

    File fileTo = new File(targetFile.toString());
    BufferedReader br;
    List<String> lines = new ArrayList<String>();

    try {
        br = new BufferedReader(new FileReader(fileTo));
        String st;
        while ((st = br.readLine()) != null) {
            lines.add(st);
        }
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

    stopWords = lines.toArray(new String[]{});
    return stopWords;

}

Then I pass the stop words array (String[]) together with the text to be updated:

public void removeStopWords(String targetText, String[] stopwords) {
    targetText = targetText.toLowerCase().trim();

    ArrayList<String> wordList = new ArrayList<>();
    wordList.addAll(Arrays.asList(targetText.split(" ")));

    List<String> stopWordsList = new ArrayList<>();
    stopWordsList.addAll(Arrays.asList(stopwords));

    wordList.removeAll(stopWordsList);

}

But nothing gets removed from wordList. Why?

Best answer

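Try storing the stop words in lowercase as well. The text is lowercased before it is split, and removeAll matches words with the case-sensitive equals, so a stop word such as "The" never matches "the":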

public String[] loadStopwords(String targetFile) throws IOException {
    List<String> lines = new ArrayList<>();
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader br = new BufferedReader(new FileReader(targetFile))) {
        String st;
        while ((st = br.readLine()) != null) {
            // Store each stop word in lowercase, without leading or trailing blanks
            lines.add(st.toLowerCase().trim());
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }

    return lines.toArray(new String[0]);
}

public ArrayList<String> removeStopWords(String targetText, String[] stopwords) {
    // Lowercase the text too, so it matches the lowercased stop words
    targetText = targetText.toLowerCase().trim();

    ArrayList<String> wordList = new ArrayList<>();
    wordList.addAll(Arrays.asList(targetText.split(" ")));

    List<String> stopWordsList = new ArrayList<>();
    stopWordsList.addAll(Arrays.asList(stopwords));

    wordList.removeAll(stopWordsList);

    // Return the filtered list so the caller can actually use it
    return wordList;
}
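
For anyone who wants to try this in isolation, here is a minimal self-contained sketch of the same idea. The class name StopWordDemo, the sample sentence, and the sample stop words are made up for illustration and are not from the original post. It differs from the answer above in two ways, both of which are my own assumptions about what is useful: it keeps the stop words in a HashSet instead of a list, and it splits on runs of whitespace ("\\s+") instead of a single space, so repeated blanks do not produce empty tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical demo class; the names are illustrative, not from the original post.
class StopWordDemo {

    // Returns the words of text that are not stop words, compared case-insensitively.
    static List<String> removeStopWords(String text, String[] stopWords) {
        // Normalize the stop words the same way as the text so equals() can match.
        Set<String> stopSet = new HashSet<>();
        for (String w : stopWords) {
            stopSet.add(w.trim().toLowerCase(Locale.ROOT));
        }

        List<String> kept = new ArrayList<>();
        // Split on runs of whitespace rather than on a single space.
        for (String word : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty() && !stopSet.contains(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Mixed case and stray blanks on purpose, to mimic lines read from a file.
        String[] stopWords = {"The", " a ", "of"};
        System.out.println(removeStopWords("The   history of a language", stopWords));
        // prints [history, language]
    }
}
```

Whichever variant is used, the caller has to work with the returned list. In the question's void version, the filtered words only ever lived in a local variable, so the caller could not see any change to the text.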

Regarding "java - Removing stop words loaded from a file in Java", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56851214/
