
java - How to count the number of times each character appears in a text?

Reposted. Author: 行者123. Updated: 2023-12-02 09:03:54

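First of all, the following code runs on a txt file containing 37,000 characters (and works fine). I want to compute the probability of occurrence of each character. To do that, I need to count how many times each letter appears in the test.txt file.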

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;

File file = new File("test.txt");
FileInputStream fileStream = new FileInputStream(file);
InputStreamReader input = new InputStreamReader(fileStream);
BufferedReader reader = new BufferedReader(input);

String line;

// Initializing counters
int countWord = 0;
int sentenceCount = 0;
int characterCount = 0;
int whitespaceCount = 0;
// One counter per letter; each one needs its own initializer
int a = 0, b = 0, c = 0, d = 0, e = 0, f = 0, g = 0, h = 0, i = 0, k = 0, l = 0, m = 0, n = 0,
    o = 0, p = 0, q = 0, r = 0, s = 0, t = 0, u = 0, v = 0, w = 0, x = 0, y = 0, z = 0;

// Reading line by line from the
// file until a null is returned
while ((line = reader.readLine()) != null) {
    if (!(line.equals(""))) {
        characterCount += line.length();

        // \\s+ matches one or more whitespace characters
        String[] wordList = line.split("\\s+");

        countWord += wordList.length;
        // one gap between each pair of adjacent words on this line
        whitespaceCount += wordList.length - 1;

        // [!?.:]+ matches a run of sentence-ending punctuation
        String[] sentenceList = line.split("[!?.:]+");

        sentenceCount += sentenceList.length;
    }
}

System.out.println("Total number of characters = " + characterCount);
System.out.println("Total number of whitespaces = " + whitespaceCount);
} // closes the enclosing method, whose opening line is not shown

I am considering the following code, but I am sure there is a shorter and more efficient way.

while ((line = reader.readLine()) != null)
    if (!(line.equals(""))) {
        characterCount += line.length();
        if (line.equals("a")) {
            a++;
        }...
        // same for the remaining letters

Best answer

Creating a map keyed by character is quite simple.

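  1. Files.lines takes the input file and reads its lines.
  2. flatMap maps each line to a stream of its characters (as single-character strings).
  3. The characters are then grouped into character/count key/value pairs.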
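import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

Map<String, Long> freq = null;
// try-with-resources closes the file once the stream has been consumed
try (Stream<String> lines = Files.lines(Path.of("testfile.txt"))) {
    freq = lines
            .flatMap(line -> Arrays.stream(line.split("")))
            .filter(str -> str.length() > 0)
            .collect(Collectors.groupingBy(chr -> chr,
                    Collectors.counting()));
} catch (IOException ioe) {
    ioe.printStackTrace();
}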
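
For comparison, the 26 separate counter variables from the question can be folded into one array indexed by letter. The following is only a minimal sketch of that idea, not part of the original answer: the class name LetterCounter and the method countLetters are hypothetical, only the ASCII letters a-z are counted (case-insensitively), and a StringReader stands in for the file so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class LetterCounter {

    // Adds the ASCII letters found in one line to counts,
    // where counts[0] is 'a' and counts[25] is 'z'.
    static void countLetters(String line, int[] counts) {
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (ch >= 'A' && ch <= 'Z') {
                ch = (char) (ch - 'A' + 'a'); // fold upper case onto lower case
            }
            if (ch >= 'a' && ch <= 'z') {
                counts[ch - 'a']++;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int[] counts = new int[26];
        // Assumption: sample text instead of a file; swap in a FileReader to read test.txt.
        try (BufferedReader reader = new BufferedReader(new StringReader("Hello\nworld"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                countLetters(line, counts);
            }
        }
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                System.out.println((char) ('a' + i) + " = " + counts[i]);
            }
        }
    }
}
```

The array version only handles a fixed alphabet, whereas the map built above keeps a count for every character that appears, including spaces and punctuation.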

This statement

freq.forEach((ch, cnt) ->
        System.out.println("char = " + ch + "(" +
                Integer.toHexString(ch.charAt(0)) + ")" + " count = " + cnt));

prints something similar to this, with each character's hexadecimal value in parentheses.

char =  (20) count = 10
char = a(61) count = 4
char = r(72) count = 1
char = s(73) count = 9
char = d(64) count = 2
char = t(74) count = 8
char = e(65) count = 3
char = h(68) count = 4
char = i(69) count = 6
char = .(2e) count = 2
char = n(6e) count = 3
char = o(6f) count = 2

Regarding "java - How to count the number of times each character appears in a text?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59968406/
