
java - Searching for and counting word fragments in a text file

Reposted. Author: 太空宇宙. Updated: 2023-11-04 06:38:54

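My task is to write code that opens a text file, searches it for occurrences of a user-supplied string, and reports how many times the string occurs.

Below is the code I have. It searches for word fragments, which is fine, but the professor wants it to search for odd fragments that include spaces and everything: something like "of my" or "even g" or any other arbitrary string.

My working code is below. I have been trying to get compareTo to work, but I can't seem to get the syntax right. The professor insists on not helping, and since this is a summer class there is no TA to help either. My Google searches turned up nothing; I can't seem to put the problem into a suitable set of words to search for.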

import java.io.File;
import java.io.FileNotFoundException;
import java.util.*;

import javax.swing.*;

public class TextSearchFromFile
{
    public static void main(String[] args) throws FileNotFoundException
    {

        boolean run = true;
        int count = 0;

        //greet user
        JOptionPane.showMessageDialog(null,
            "Hello, today you will be searching through a text file on the harddrive. \n"
            + "The Text File is a 300 page fantasy manuscript written by: Adam\n"
            + "This exercise was intended to have the user enter the file, but since \n"
            + "you, the user, don't know which file the text to search is that is a \n"
            + "bit difficult.\n\n"
            + "On the next window you will be prompted to enter a string of characters.\n"
            + "Feel free to enter that string and see if it is somewhere in 300 pages\n"
            + "and 102,133 words. Have fun.",
            "Text Search",
            JOptionPane.PLAIN_MESSAGE);

        while (run)
        {
            try
            {
                //open the file
                Scanner scanner = new Scanner(new File("An Everthrone Tale 1.txt"));

                //prompt user for word
                CharSequence findWord = JOptionPane.showInputDialog(null,
                    "Enter the word to search for:",
                    "Text Search",
                    JOptionPane.PLAIN_MESSAGE);
                count = 0;

                while (scanner.hasNext())
                {

                    if ((scanner.next()).contains(findWord))
                    {
                        count++;
                    }

                } //end search loop

                //output results to user
                JOptionPane.showMessageDialog(null,
                    "The results of your search are as follows: \n"
                    + "Your String: " + findWord + "\n"
                    + "Was found: " + count + " times.\n"
                    + "Within the file: An Ever Throne Tale 1.txt",
                    "Text Search",
                    JOptionPane.PLAIN_MESSAGE);
            } //end try
            catch (NullPointerException e)
            {
                JOptionPane.showMessageDialog(null,
                    "Thank you for using the Text Search.",
                    "Text Search",
                    JOptionPane.ERROR_MESSAGE);
                System.exit(0);
            }
        } //end run loop
    } // end main
} // end class

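I just can't figure out how to make it search for crazy arbitrary fragments like this. He knows what is in the text file, so he knows he can put together sequences like my examples above that do occur in the text, but my code does not find them.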

Best answer

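Don't use hasNext()/next(), because they return only one token at a time from the input file, so you will never find multi-word phrases (or anything containing a space). You can do a little better with hasNextLine()/nextLine(), but that still won't find an occurrence of "of my" where "of" is the last word on one line and "my" is the first word on the next. To find that, you need more context.

If you keep track of the last line read from the file, you can build a two-line buffer and look for instances spread across two lines.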
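To make the tokenization point concrete, here is a minimal sketch of why a token-by-token search can never match a phrase containing a space. The class name `TokenDemo`, the method `countTokensContaining`, and the sample sentence are hypothetical; they only mirror the loop from the question.

```java
import java.util.Scanner;

class TokenDemo {
    // Counts the whitespace-delimited tokens that contain the phrase,
    // the same way the hasNext()/next() loop in the question does.
    static int countTokensContaining(String text, String phrase) {
        int count = 0;
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                // next() never returns whitespace, so a phrase with a space cannot match
                if (scanner.next().contains(phrase)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "the end of my story";
        System.out.println(countTokensContaining(text, "my"));    // a fragment inside one token
        System.out.println(countTokensContaining(text, "of my")); // a fragment spanning two tokens
    }
}
```

With the default delimiter, each token is a run of non-whitespace characters, so "of my" is split into "of" and "my" before contains ever sees it.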

String last = ""; // initially, last is empty

while (scanner.hasNextLine())
{

    String line = scanner.nextLine();
    String text = last + " " + line; // two-line buffer

    if (text.contains(findWord))
    {
        count++;
    }

    last = line; // remember the last line read

} //end search loop

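This should find phrases that span two lines, but three problems remain. First, you could have a phrase like "three lines long" that is split across three lines:

  three
  lines
  long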

You would need to extend the two-line buffer concept to find this. Ultimately, you might need to have the entire file in memory at once, but I suspect that is enough of an edge case that you probably don't care about it.

Second, when a word appears entirely on a single line, you will count it twice: once when that line is first read, and again on the next iteration, when the same line is sitting in last.

Third, using contains in this way won't find multiple copies of the same word on the same line. So if you are looking for "dog" and the following text appears:

  My dog saw a dog today at the dog park which was full of dogs.

The test with contains will only cause count to be incremented once. (But it would happen again when this line was in last.)
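The second and third problems can be reproduced with a small sketch of the two-line buffer. The class `TwoLineBufferDemo`, its method, and the array of sample lines are hypothetical stand-ins for the Scanner loop above.

```java
class TwoLineBufferDemo {
    // Applies the two-line buffer approach to lines that are already in memory.
    static int countWithTwoLineBuffer(String[] lines, String phrase) {
        int count = 0;
        String last = ""; // initially, last is empty
        for (String line : lines) {
            String text = last + " " + line; // two-line buffer
            if (text.contains(phrase)) {
                count++; // at most one increment per buffer, however many matches it holds
            }
            last = line; // remember the last line read
        }
        return count;
    }

    public static void main(String[] args) {
        String[] lines = {
            "My dog saw a dog today at the dog park which was full of dogs.",
            "The end."
        };
        // "dog" occurs four times in the first line, but this reports a different number
        System.out.println(countWithTwoLineBuffer(lines, "dog"));
    }
}
```

The first line is tested once as the current line and once more as last, and each test can add at most one to the count, so the result reflects how many buffers matched rather than how many occurrences exist.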

So I think you really need to 1. Read the entire file into a buffer, to find phrases split across any number of lines, and 2. Search through the lines using indexOf with an offset that increases until no more matches are found.

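StringBuilder buffer = new StringBuilder(); // avoids repeated String copies

if (scanner.hasNextLine())
{
    buffer.append(scanner.nextLine()); // first line
}
while (scanner.hasNextLine())
{
    buffer.append(' '); // separate lines with a space
    buffer.append(scanner.nextLine());
}
String text = buffer.toString();
String target = findWord.toString(); // indexOf takes a String, not a CharSequence

int found, offset = 0; // start looking at the beginning, offset 0
// an empty target would match at every offset forever, so skip the search
while (!target.isEmpty() && (found = text.indexOf(target, offset)) != -1)
{
    count++; // found a match
    offset = found + 1; // look for next match after this match
}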
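For reference, the same idea can be packaged as a small self-contained class. This is only a sketch under some assumptions: the names `PhraseCounter`, `countOccurrences`, and `countInFile` are hypothetical, and the file is read with `Files.readAllLines` (Java 8 or later, UTF-8 by default) instead of a Scanner.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class PhraseCounter {
    // Counts every occurrence of phrase in text, including overlapping ones.
    static int countOccurrences(String text, String phrase) {
        if (phrase.isEmpty()) {
            return 0; // an empty phrase would match at every offset
        }
        int count = 0;
        int offset = 0; // start looking at the beginning
        int found;
        while ((found = text.indexOf(phrase, offset)) != -1) {
            count++;            // found a match
            offset = found + 1; // resume one character after the match start
        }
        return count;
    }

    // Joins all lines with a single space so that phrases split across
    // line breaks can still match, then counts occurrences in the result.
    static int countInFile(Path file, String phrase) throws IOException {
        String text = String.join(" ", Files.readAllLines(file));
        return countOccurrences(text, phrase);
    }
}
```

A call such as `PhraseCounter.countInFile(java.nio.file.Paths.get("An Everthrone Tale 1.txt"), "of my")` would then replace the search loop. Advancing the offset by one counts overlapping matches; advancing by `phrase.length()` instead would count only non-overlapping ones.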

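If you don't care about matches that span lines, you can process one line at a time and avoid the memory cost of holding the entire text in memory at once.

This answer about java - Searching for and counting word fragments in a text file is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/24941755/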
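A line-at-a-time variant might look like the following sketch. The class `LineByLineCount` and its methods are hypothetical names; the Scanner could wrap a File exactly as in the question.

```java
import java.util.Scanner;

class LineByLineCount {
    // Counts all occurrences of phrase within a single line.
    static int countInLine(String line, String phrase) {
        if (phrase.isEmpty()) {
            return 0; // an empty phrase would match at every offset
        }
        int count = 0;
        int offset = 0;
        int found;
        while ((found = line.indexOf(phrase, offset)) != -1) {
            count++;
            offset = found + 1;
        }
        return count;
    }

    // Reads one line at a time, so only the current line is held in memory.
    // Matches that straddle a line break are deliberately not counted.
    static int countPerLine(Scanner scanner, String phrase) {
        int total = 0;
        while (scanner.hasNextLine()) {
            total += countInLine(scanner.nextLine(), phrase);
        }
        return total;
    }
}
```

This keeps memory use proportional to the longest line instead of the whole file, at the cost of missing phrases such as "of my" when "of" ends one line and "my" starts the next.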
