
java - How do I fix punctuation in a text file?

Reposted. Author: 行者123. Updated: 2023-12-01 10:39:51

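I am currently working on an independent project, but I am having trouble converting a text file into the right format. Right now my program reads one line at a time and assumes one line = one sentence, which is a problem because someone could supply a paragraph with punctuation scattered throughout. What I want to do is put each sentence on its own line and then read from that file. I did not want to come empty-handed, so I tried the only approach I could think of. I got it working for short strings, but once I moved to longer text files, where I had to use streams, I ran into a problem: (File name too long).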

Example:

Input: This is a dummy sentence. Hello, this is also one. And this one too.

Output:

This is a dummy sentence.

Hello, this is also one.

And this one too.
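
As an illustration only (this is not the asker's code), the transformation in the example could be sketched with a regular-expression split. The class name `SentenceSplitSketch` and the method `splitSentences` are hypothetical, and the sketch assumes that every sentence ends in `.`, `?` or `!` followed by whitespace, so abbreviations such as "Dr." would be split incorrectly.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original question.
class SentenceSplitSketch {

    // Splits on whitespace that directly follows '.', '?' or '!'.
    // The lookbehind keeps the punctuation mark attached to its sentence.
    static List<String> splitSentences(String text) {
        return Arrays.asList(text.trim().split("(?<=[.?!])\\s+"));
    }

    public static void main(String[] args) {
        String input = "This is a dummy sentence. Hello, this is also one. And this one too.";
        // Print one sentence per line.
        for (String sentence : splitSentences(input)) {
            System.out.println(sentence);
        }
    }
}
```

Because the split consumes the whitespace between sentences, it does not depend on there being exactly one space after each punctuation mark.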

This works:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String args[])
{
    String text = "Joanne had one requirement: Her child must be" +
            " adopted by college graduates. So the doctor arranged" +
            " for the baby to be placed with a lawyer and his wife." +
            " Paul and Clara named their new baby Steven Paul Jobs.";
    // Match any of the punctuation marks ? . ! ¡ ¿
    Pattern pattern = Pattern.compile("\\?|\\.|\\!|\\¡|\\¿");
    Matcher matcher = pattern.matcher(text);
    String withline = "";
    int starter = 0;      // index where the current sentence begins
    String overall = "";  // all sentences, one per line

    while (matcher.find())
    {
        int holder = matcher.start();  // index of the punctuation mark
        System.out.println("=========> " + holder);

        // Copy the sentence including its punctuation mark, then end the line.
        withline = text.substring(starter, holder + 1);
        withline = withline + "\r\n";
        overall = overall + withline;
        System.out.println(withline);
        // Skip the punctuation mark and the single space assumed to follow it.
        starter = holder + 2;
    }
    System.out.println(overall);
    //return overall;
}

This is where the problem occurs:

public static void main(String[] args) throws IOException
{
    final String INPUT_FILE = "practice.txt";
    InputStream in = new FileInputStream(INPUT_FILE);
    String fixread = getStringFromInputStream(in);
    String fixedspace = fixme(fixread);
    File ins = new File(fixedspace);
    BufferedReader reader = new BufferedReader(new FileReader(ins));
    Pattern p = Pattern.compile("\n");
    String line, sentence;
    String[] t;
    while ((line = reader.readLine()) != null )
    {
        t = p.split(line); /**hold curr sentence and remove it from OG txt file since you will reread.*/
        sentence = t[0];
        indiv_sentences.add(sentence);
    }
    //putSentencestoTrie(indiv_sentences);
    //runAutocompletealt();
}



private static String fixme(String fixread)
{
    Pattern pattern = Pattern.compile("\\?|\\.|\\!|\\¡|\\¿");
    String actString = fixread.toString();
    Matcher matcher = pattern.matcher(actString);
    String withline = "";
    int starter = 0;
    String overall = "";
    while (matcher.find())
    {
        int holder = matcher.start();
        withline = actString.substring(starter, holder + 1);
        withline = withline + "\r\n";
        overall = overall + withline;
        starter = holder + 2;
    }

    return overall;
}

/**this is not my code, this was provided by an outside source, I do not take credit*/
/**http://www.mkyong.com/java/how-to-convert-inputstream-to-string-in-java/*/
private static String getStringFromInputStream(InputStream is) {

    BufferedReader br = null;
    StringBuilder sb = new StringBuilder();

    String line;
    try {

        br = new BufferedReader(new InputStreamReader(is));
        while ((line = br.readLine()) != null) {
            sb.append(line);
        }

    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (br != null) {
            try {
                br.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    return sb.toString();

}



https://github.com/ChristianCSE/Phrase-Finder

I am fairly sure this is all the code I used for this part, but in case you need to see the rest of my code, I have provided a link to my repository. Thanks!

Best answer

The problem is that you are creating a file whose name is its content, and that is far too long for a file name.

String fixedspace = fixme(fixread);
File ins = new File(fixedspace); //this is the issue, you gave the content as its name
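
To make the cause concrete, here is a minimal sketch showing that `java.io.File` always treats its `String` argument as a path and never as content. The class `FileNameDemo` and the method `canOpenAsFile` are hypothetical names used for illustration, and the exact error message behind the failure (such as "File name too long") depends on the operating system.

```java
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical demo class, not part of the original answer.
class FileNameDemo {

    // Returns true only if the string names an existing file that can be opened.
    static boolean canOpenAsFile(String pathOrContent) {
        // The argument is interpreted as a path, whatever text it holds.
        File f = new File(pathOrContent);
        try (FileReader reader = new FileReader(f)) {
            return true;
        } catch (IOException e) {
            // FileNotFoundException (an IOException) covers missing or invalid paths.
            return false;
        }
    }

    public static void main(String[] args) {
        // Build a long block of text, similar to what fixme() returns.
        StringBuilder content = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            content.append("This is a dummy sentence.\r\n");
        }
        // Passing the text itself as the "file name" cannot be opened.
        System.out.println(canOpenAsFile(content.toString()));
    }
}
```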

Try giving the file a simple name and writing the output to it. Here is an example.

String fixedspace = fixme(fixread);
File out = new File("output.txt");
// try-with-resources closes the writer, which flushes the text to disk
try (FileWriter fr = new FileWriter(out)) {
    fr.write(fixedspace);
}

Then read from that file and continue.
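
The write-then-read suggestion above could be sketched end to end as follows. This is only one possible arrangement under stated assumptions: the class `SentenceFileRoundTrip`, the method `writeThenRead`, the temporary file, and the choice of UTF-8 are all illustrative and do not come from the asker's repository.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
class SentenceFileRoundTrip {

    // Writes the already-split text to outputFile, then reads it back line by line.
    static List<String> writeThenRead(String fixedText, Path outputFile) throws IOException {
        // The file gets a normal name; the text goes inside it as content.
        Files.write(outputFile, fixedText.getBytes(StandardCharsets.UTF_8));

        List<String> sentences = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(outputFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    sentences.add(line); // one sentence per line
                }
            }
        }
        return sentences;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the string returned by fixme(fixread).
        String fixed = "This is a dummy sentence.\r\nHello, this is also one.\r\nAnd this one too.\r\n";
        Path out = Files.createTempFile("sentences", ".txt");
        for (String sentence : writeThenRead(fixed, out)) {
            System.out.println(sentence);
        }
        Files.delete(out);
    }
}
```

If the sentences are only needed in memory, the intermediate file may not be necessary at all, since the string can be split on line breaks directly.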

For "java - How do I fix punctuation in a text file?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34487274/
