
java - Why am I only getting one TF-IDF result?

Reposted. Author: 行者123. Updated: 2023-12-01 15:54:51

// Calculating term frequency
System.out.println("Please enter the required word :");
Scanner scan = new Scanner(System.in);
String word = scan.nextLine();

String[] array = word.split(" ");
int filename = 11;
String[] fileName = new String[filename];
int a = 0;
int totalCount = 0;
int wordCount = 0;

for (a = 0; a < filename; a++) {
    try {
        System.out.println("The word inputted is " + word);
        File file = new File(
                "C:\\Users\\user\\fypworkspace\\TextRenderer\\abc" + a
                        + ".txt");
        System.out.println(" _________________");
        System.out.print("| File = abc" + a + ".txt | \t\t \n");

        for (int i = 0; i < array.length; i++) {
            totalCount = 0;
            wordCount = 0;

            Scanner s = new Scanner(file);
            {
                while (s.hasNext()) {
                    totalCount++;
                    if (s.next().equals(array[i]))
                        wordCount++;
                }

                System.out.print(array[i] + " ---> Word count = "
                        + "\t\t " + "|" + wordCount + "|");
                System.out.print(" Total count = " + "\t\t " + "|"
                        + totalCount + "|");
                System.out.printf(" Term Frequency = | %8.4f |",
                        (double) wordCount / totalCount);

                System.out.println("\t ");
            }
        }
    } catch (FileNotFoundException e) {
        System.out.println("File is not found");
    }
}

System.out.println("Please enter the required word :");
Scanner scan2 = new Scanner(System.in);
String word2 = scan2.nextLine();
String[] array2 = word2.split(" ");
int numofDoc;

for (int b = 0; b < array2.length; b++) {
    numofDoc = 0;

    for (int i = 0; i < filename; i++) {
        try {
            BufferedReader in = new BufferedReader(new FileReader(
                    "C:\\Users\\user\\fypworkspace\\TextRenderer\\abc"
                            + i + ".txt"));

            int matchedWord = 0;

            Scanner s2 = new Scanner(in);
            {
                while (s2.hasNext()) {
                    if (s2.next().equals(array2[b]))
                        matchedWord++;
                }
            }
            if (matchedWord > 0)
                numofDoc++;
        } catch (IOException e) {
            System.out.println("File not found.");
        }
    }
    System.out.println(array2[b]
            + " --> This number of files that contain the term "
            + numofDoc);
    double inverseTF = Math.log10((float) numDoc / numofDoc);
    System.out.println(array2[b] + " --> IDF " + inverseTF );
    double TFIDF = (((double) wordCount / totalCount) * inverseTF );
    System.out.println(array2[b] + " --> TFIDF " + TFIDF);
}
}

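Hi, this is my code for calculating term frequency and TF-IDF. The first block calculates the term frequency of the given string for each file. The second block is supposed to use those values to calculate the TF-IDF for each file, but I only get one value. It should give a TF-IDF value for every document.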

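Sample term frequency output:

The word inputted is is
 _________________
| File = abc0.txt |
is ---> Word count = |2| Total count = |150| Term Frequency = | 0.0133 |

The word inputted is is
 _________________
| File = abc1.txt |
is ---> Word count = |0| Total count = |9| Term Frequency = | 0.0000 |

TF-IDF:

is --> This number of files that contain the term 7

is --> IDF 0.1962946357308887

is --> TFIDF 0.0028607962606519654

I want one value per file: I have 10 files, so it should print 10 different values, one for each file. However, it prints only one result. Can someone point out my mistake?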
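For reference, the quantities printed above can be written out as formulas that mirror what the code computes. The symbols N (number of files) and n_t (number of files containing term t) are notation introduced here, not names from the post. N = 11 is an assumption: `numDoc` is never declared in the snippet, but 11 matches `filename = 11` and reproduces the printed IDF.

```latex
\[
\mathrm{tf}(t, d) = \frac{\mathrm{wordCount}(t, d)}{\mathrm{totalCount}(d)}, \qquad
\mathrm{idf}(t) = \log_{10}\frac{N}{n_t}, \qquad
\text{tf-idf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)
\]
\[
\mathrm{tf}(\text{is}, \text{abc0.txt}) = \frac{2}{150} \approx 0.0133, \qquad
\mathrm{idf}(\text{is}) = \log_{10}\frac{11}{7} \approx 0.1963
\]
```

Because tf depends on the document d while idf does not, a single term should produce one tf-idf value per file.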

Best answer

The println statement that you want repeated for each file is

double TFIDF = (((double) wordCount / totalCount) * inverseTF );
System.out.println(array2[b] + " --> TFIDF " + TFIDF);

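but it is contained only in the single loop

for (int b = 0; b < array2.length; b++)

If you want this line printed for each file, you have to surround the statement with another loop over all files.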
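The loop structure described above can be sketched as follows. This is an illustrative sketch, not the asker's or the answerer's code. It assumes the documents are held in memory as strings instead of being read from the abc0.txt ... files, and the class and method names (`TfIdfPerFile`, `termFrequency`, `inverseDocumentFrequency`) are hypothetical. Term frequency is recomputed inside the inner loop, so each file gets its own value.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: prints one TF-IDF value per (term, file) pair.
class TfIdfPerFile {

    // Occurrences of term divided by the number of whitespace-separated tokens.
    // Returns 0.0 for an empty document (an assumption; the original code would yield NaN).
    static double termFrequency(String document, String term) {
        String[] tokens = document.trim().split("\\s+");
        if (tokens.length == 0 || tokens[0].isEmpty()) {
            return 0.0;
        }
        int wordCount = 0;
        for (String token : tokens) {
            if (token.equals(term)) {
                wordCount++;
            }
        }
        return (double) wordCount / tokens.length;
    }

    // log10(number of documents / number of documents containing the term).
    // Returns 0.0 when no document contains the term (an assumption, to avoid dividing by zero).
    static double inverseDocumentFrequency(List<String> documents, String term) {
        int numOfDoc = 0;
        for (String document : documents) {
            if (termFrequency(document, term) > 0) {
                numOfDoc++;
            }
        }
        return numOfDoc == 0 ? 0.0 : Math.log10((double) documents.size() / numOfDoc);
    }

    public static void main(String[] args) {
        // Stand-ins for the contents of abc0.txt, abc1.txt, abc2.txt.
        List<String> documents = Arrays.asList(
                "this is a test", "another short test", "is is it");
        String[] terms = {"is"};

        for (String term : terms) {                          // outer loop: one pass per term
            double idf = inverseDocumentFrequency(documents, term);
            for (int i = 0; i < documents.size(); i++) {     // inner loop: one pass per file
                double tfidf = termFrequency(documents.get(i), term) * idf;
                System.out.println(term + " --> abc" + i + ".txt TFIDF " + tfidf);
            }
        }
    }
}
```

With file input, the inner loop would open each file in turn, as the question's second block already does when counting matches.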

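Since this is homework, I won't include the final code, but here is another hint: your TFIDF calculation also uses the variables wordCount and totalCount. Those are unique to each file name/word pair, so you need to either save them for every file name/word pair rather than just once, or recalculate them in the final loop.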
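The first option in the hint, saving the values for every file name/word pair, might look like the sketch below. It is only one possible layout: the two-dimensional array indexed as `tf[file][term]`, the in-memory documents, and the names `TfIdfCached`, `termFrequencies`, and `tfIdf` are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: store term frequencies in a first pass, reuse them in a second.
class TfIdfCached {

    // First pass: tf[f][t] holds the term frequency of terms[t] in document f.
    static double[][] termFrequencies(List<String> documents, String[] terms) {
        double[][] tf = new double[documents.size()][terms.length];
        for (int f = 0; f < documents.size(); f++) {
            String[] tokens = documents.get(f).trim().split("\\s+");
            for (int t = 0; t < terms.length; t++) {
                int wordCount = 0;
                for (String token : tokens) {
                    if (token.equals(terms[t])) {
                        wordCount++;
                    }
                }
                tf[f][t] = (double) wordCount / tokens.length;
            }
        }
        return tf;
    }

    // Second pass: compute the IDF per term from the stored values, then TF-IDF per file.
    static double[][] tfIdf(double[][] tf) {
        int numDocs = tf.length;
        int numTerms = numDocs == 0 ? 0 : tf[0].length;
        double[][] result = new double[numDocs][numTerms];
        for (int t = 0; t < numTerms; t++) {
            int numOfDoc = 0;
            for (int f = 0; f < numDocs; f++) {
                if (tf[f][t] > 0) {
                    numOfDoc++;
                }
            }
            // 0.0 when no document contains the term (an assumption, to avoid dividing by zero).
            double idf = numOfDoc == 0 ? 0.0 : Math.log10((double) numDocs / numOfDoc);
            for (int f = 0; f < numDocs; f++) {
                result[f][t] = tf[f][t] * idf;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> documents = Arrays.asList(
                "this is a test", "another short test", "is is it");
        String[] terms = {"is", "test"};
        double[][] scores = tfIdf(termFrequencies(documents, terms));
        for (int f = 0; f < scores.length; f++) {
            System.out.println("abc" + f + ".txt --> " + Arrays.toString(scores[f]));
        }
    }
}
```

Storing the values avoids reading each file a second time, at the cost of keeping one number per file/word pair in memory.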

Regarding java - Why am I only getting one TF-IDF result?, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/5281476/
