
java - A query about Java data structures

Reposted. Author: 行者123. Updated: 2023-12-01 15:01:42

I want to compute word frequencies across multiple files.

The files contain the following words:

a1.txt = {aaa, aaa, aaa} 
a2.txt = {aaa}
a3.txt = {aaa, bbb}

So the result must be aaa = 3, bbb = 1 (that is, the number of files in which each word appears).

Then I defined the following data structures:

LinkedHashMap<String, Integer> wordCount = new LinkedHashMap<String, Integer>();
Map<String, LinkedHashMap<String, Integer>> fileToWordCount =
    new HashMap<String, LinkedHashMap<String, Integer>>();

Next, I read the words from each file and put them into wordCount and fileToWordCount:

/* lineWords[i] is a word from a line in the file */
if (wordCount.containsKey(lineWords[i])) {
    System.out.println("1111111::" + lineWords[i]);
    wordCount.put(lineWords[i], wordCount.get(lineWords[i]).intValue() + 1);
} else {
    System.out.println("222222::" + lineWords[i]);
    wordCount.put(lineWords[i], 1);
}
// here we map the filename to the occurrences of words
fileToWordCount.put(filename, wordCount);

Finally, I print fileToWordCount with the following code:

Collection a;
Set filenameset;

filenameset = fileToWordCount.keySet();
a = fileToWordCount.values();
for (Object filenameFromMap : filenameset) {
    System.out.println("FILENAMEFROMAP::" + filenameFromMap);
    System.out.println("VALUES::" + a);
}

and it prints:

FILENAMEFROMAP::a3.txt
VALUES::[{aaa=5, bbb=1}, {aaa=5, bbb=1}, {aaa=5, bbb=1}]
FILENAMEFROMAP::a1.txt
VALUES::[{aaa=5, bbb=1}, {aaa=5, bbb=1}, {aaa=5, bbb=1}]
FILENAMEFROMAP::a2.txt
VALUES::[{aaa=5, bbb=1}, {aaa=5, bbb=1}, {aaa=5, bbb=1}]

So, how can I use the map fileToWordCount to find the word frequencies in the files?

Best answer

You're making this harder than it needs to be. Here is how I would do it:

// Counter and readWordsFromFile are helpers that this answer does not show.
Map<String, Counter> wordCounts = new HashMap<String, Counter>();
for (File file : files) {
    // to avoid counting the same word in the same file twice
    Set<String> wordsInFile = new HashSet<String>();
    for (String word : readWordsFromFile(file)) {
        if (!wordsInFile.contains(word)) {
            wordsInFile.add(word);
            Counter counter = wordCounts.get(word);
            if (counter == null) {
                counter = new Counter();
                wordCounts.put(word, counter);
            }
            counter.increment();
        }
    }
}
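
For reference, here is a minimal runnable sketch of the same idea. It is not the answer author's code, and it makes several assumptions: the words of each file are already in memory as lists (file reading is left out), the class and method names (WordFrequency, countPerFile, countFilesContaining) are made up for illustration, and Map.merge (Java 8+) stands in for the Counter class. It also builds a fresh map for every file. The aaa=5 in the printed output above is consistent with a single wordCount instance being shared by all three files (3 + 1 + 1 = 5), which is likely what went wrong in the question's code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WordFrequency {

    // Occurrences of each word within each file.
    // A NEW inner map is created per file, so files do not share counts.
    static Map<String, Map<String, Integer>> countPerFile(Map<String, List<String>> wordsByFile) {
        Map<String, Map<String, Integer>> fileToWordCount = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : wordsByFile.entrySet()) {
            Map<String, Integer> wordCount = new LinkedHashMap<>();
            for (String word : entry.getValue()) {
                wordCount.merge(word, 1, Integer::sum); // insert 1 or add 1
            }
            fileToWordCount.put(entry.getKey(), wordCount);
        }
        return fileToWordCount;
    }

    // Number of files in which each word appears at least once.
    static Map<String, Integer> countFilesContaining(Map<String, List<String>> wordsByFile) {
        Map<String, Integer> fileCount = new LinkedHashMap<>();
        for (List<String> words : wordsByFile.values()) {
            // De-duplicate so a word counts at most once per file.
            Set<String> distinctWords = new HashSet<>(words);
            for (String word : distinctWords) {
                fileCount.merge(word, 1, Integer::sum);
            }
        }
        return fileCount;
    }

    public static void main(String[] args) {
        // Sample data taken from the question (in-memory stand-in for the files).
        Map<String, List<String>> wordsByFile = new LinkedHashMap<>();
        wordsByFile.put("a1.txt", Arrays.asList("aaa", "aaa", "aaa"));
        wordsByFile.put("a2.txt", Arrays.asList("aaa"));
        wordsByFile.put("a3.txt", Arrays.asList("aaa", "bbb"));

        // prints {a1.txt={aaa=3}, a2.txt={aaa=1}, a3.txt={aaa=1, bbb=1}}
        System.out.println(countPerFile(wordsByFile));
        // prints {aaa=3, bbb=1}
        System.out.println(countFilesContaining(wordsByFile));
    }
}
```

To look up how often a word occurs in one file, index the nested map, for example countPerFile(wordsByFile).get("a3.txt").get("bbb"). The second method gives the aaa = 3, bbb = 1 result the question asks for.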

Regarding "java - A query about Java data structures", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/13549689/
