java - How do I remove the first sentence of a text file, i.e. the string before the first period?

Reposted. Author: 行者123. Updated: 2023-12-01 10:33:32

I need to get the news content without the publication timestamp that appears as its first sentence.

What my text file contains:

Updated January 21, 2016 09:31:19. While there is an argument to be made about modern batting wickets and boring matches, sometimes they give us spectacles like this.. First, Australia surged gloriously to 6 for 348. Second, for the bulk of India's reply, the touring side looked like it would reel that total in.. Finally, Australia crashed back into the game in a late flurry of wickets to win.. . Three centuries, 13 sixes, some hectic overs. It is true that the modern limited-overs game often reduces bowlers to bowling machines, and it was no less true in this contest.. But occasionally the quality of sublime batsmanship makes you willing to accept that inequity is not always iniquity..

The result I expect:

While there is an argument to be made about modern batting wickets and boring matches, sometimes they give us spectacles like this.. First, Australia surged gloriously to 6 for 348. Second, for the bulk of India's reply, the touring side looked like it would reel that total in.. Finally, Australia crashed back into the game in a late flurry of wickets to win.. . Three centuries, 13 sixes, some hectic overs. It is true that the modern limited-overs game often reduces bowlers to bowling machines, and it was no less true in this contest.. But occasionally the quality of sublime batsmanship makes you willing to accept that inequity is not always iniquity..

My current code fetches the content from a news URL; that content is the text shown above.

Document doc = Jsoup.connect(url).get();
Elements paragraphs = doc.select("p");

for (Element p : paragraphs) {
    String content = p.text() + ". ";
    System.out.print(content);
    // Open the file in append mode so each paragraph is added at the end.
    PrintWriter out = new PrintWriter(new FileWriter("D:\\content.txt", true));
    out.println(content);
    out.close();
}

Where should I put the code that fixes "content" before it is written to the file?

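Best answer

As suggested in the comments, you need to put a filter inside the for loop (I am assuming you want to remove the first sentence from every element in paragraphs). You could write a new function that takes p.text() as its argument and strips the first sentence from it, or (definitely easier) use Java's built-in String methods: find the first occurrence of the period with indexOf, then use substring to take everything after it. Note that indexOf returns -1 when the text contains no period, in which case the expression below leaves the text unchanged.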

String tmp = p.text();
String content = tmp.substring(tmp.indexOf('.')+1) + (". ");
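The two lines above can be wrapped in a small helper, sketched here under a few assumptions. The class name `FirstSentence` and both method names are hypothetical, and the timestamp pattern is inferred only from the single sample line in the question ("Updated January 21, 2016 09:31:19."), so it may not match other pages. The second method is a narrower alternative rather than the answer's own approach: it removes the text only when it really starts with an "Updated ..." timestamp, so paragraphs without one keep their first sentence.

```java
import java.util.regex.Pattern;

class FirstSentence {
    // Assumed timestamp shape, taken from the sample text in the question:
    // "Updated January 21, 2016 09:31:19." followed by optional whitespace.
    private static final Pattern UPDATED_PREFIX =
            Pattern.compile("^Updated \\w+ \\d{1,2}, \\d{4} \\d{2}:\\d{2}:\\d{2}\\.?\\s*");

    // Drops everything up to and including the first '.', then trims
    // surrounding whitespace. Text without a period is returned unchanged.
    static String dropFirstSentence(String text) {
        int dot = text.indexOf('.');
        return dot < 0 ? text : text.substring(dot + 1).trim();
    }

    // Removes a leading "Updated <date> <time>." prefix if present;
    // any other text is returned unchanged.
    static String dropUpdatedPrefix(String text) {
        return UPDATED_PREFIX.matcher(text).replaceFirst("");
    }

    public static void main(String[] args) {
        String sample = "Updated January 21, 2016 09:31:19. While there is an argument to be made.";
        System.out.println(dropFirstSentence(sample));
        System.out.println(dropUpdatedPrefix(sample));
    }
}
```

Inside the loop from the question, either helper would be applied to p.text() before the ". " suffix is appended and the line is written to the file.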

Regarding "java - How do I remove the first sentence of a text file, i.e. the string before the first period?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/34953625/
