
java - Splitting quoted text into sentences using Java BreakIterator

Reposted. Author: 行者123. Updated: 2023-12-01 11:54:21

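I am trying to split a paragraph that contains a quotation into sentences using Java's BreakIterator.

This is my paragraph, which contains the quotation I want to split: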

"People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold," he said. About strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.


This is my code:

import java.text.BreakIterator;
import java.util.Locale;

public class SplitParagraph {
    public static void main(String[] args) {
        String paragraph = "\"People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold,\" he said. About strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.";
        BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        iterator.setText(paragraph);
        int start = iterator.first();
        int i = 1;
        // Walk the sentence boundaries and print the text between each pair.
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            System.out.println("Sentence " + i + " : " + paragraph.substring(start, end));
            i++;
        }
    }
}


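Program output:

Sentence 1 : "People are now getting smarter and more critical.
Sentence 2 : They know which are eligible to choose, which one pan, where the gold," he said.
Sentence 3 : About strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.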

This output is incorrect because the paragraph contains only 2 sentences, not 3.


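The correct output should look like this:

Sentence 1 : "People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold," he said.
Sentence 2 : About strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.

Any ideas on how to solve this?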

Best answer

Just split your input on the regex below:

"(?<=\\.)\\s+(?=(?:\"[^\"]*\"|[^\"])*$)"

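This matches one or more whitespace characters that come right after a dot and are not inside double quotes.

(?<=\\.) - positive lookbehind that only considers positions right after a dot.

\\s+ - matches one or more whitespace characters.

(?=...) - positive lookahead asserting that the match must be followed by:

(?:\"[^\"]*\"|[^\"])* - any double-quoted block such as "foobar", or any character that is not a double quote, zero or more times.

(?:\"[^\"]*\"|[^\"])*$ - and after that it must reach the end of the line. This does not match the space in the string "foo. bar", because that space is followed by a single unpaired double quote rather than by a complete double-quoted block.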
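The lookahead described above can be checked on a shorter input. The following is only a minimal sketch: the class name QuoteAwareSplitDemo, the constant SENTENCE_GAP, the helper split and the two sample strings are made up for illustration. Only the pattern itself comes from the answer.

```java
import java.util.Arrays;
import java.util.List;

public class QuoteAwareSplitDemo {
    // Pattern from the answer: whitespace right after a dot, followed only by
    // complete double-quoted blocks or non-quote characters up to the end.
    static final String SENTENCE_GAP = "(?<=\\.)\\s+(?=(?:\"[^\"]*\"|[^\"])*$)";

    // Hypothetical helper: splits the text on the pattern above.
    static List<String> split(String text) {
        return Arrays.asList(text.split(SENTENCE_GAP));
    }

    public static void main(String[] args) {
        // The space after "foo." sits inside quotes, so it is not a split point;
        // the space after "twice." is outside quotes, so it is.
        System.out.println(split("He said \"foo. bar\" twice. Then he left."));
        // No quotes at all: every dot followed by whitespace is a split point.
        System.out.println(split("One. Two. Three."));
    }
}
```

The first call yields two parts and the second yields three, which is the behaviour the explanation predicts.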

DEMO

String s = "\"People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold,\" he said. About strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.";
String parts[] = s.split("(?<=\\.)\\s+(?=(?:\"[^\"]*\"|[^\"])*$)");
for(String i: parts)
{
System.out.println(i);
}

Output:

"People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold," he said.
About strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.

To avoid splitting after abbreviations such as Mr. and Mrs., add a negative lookbehind in front of the pattern:

String s = "\"People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold,\" he said. About Mr. Mrs. strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.";
String parts[] = s.split("(?<!Mrs?\\.)(?<=\\.)\\s+(?=(?:\"[^\"]*\"|[^\"])*$)");
for(String i: parts)
{
System.out.println(i);
}

Output:

"People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold," he said.
About Mr. Mrs. strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.
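If you would rather keep BreakIterator, as in the question, one possible alternative is to merge the pieces it returns until the double quotes are balanced again. This is not part of the answer above, only a rough sketch under assumptions: the class name QuoteAwareBreakIterator and the method sentences are hypothetical, and it assumes quotations use plain " characters that always come in pairs.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class QuoteAwareBreakIterator {
    // Hypothetical helper: joins BreakIterator pieces while a quote is still open.
    static List<String> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> result = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        boolean insideQuote = false;
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            pending.append(piece);
            // Each double quote toggles the "inside a quotation" state.
            for (int i = 0; i < piece.length(); i++) {
                if (piece.charAt(i) == '"') {
                    insideQuote = !insideQuote;
                }
            }
            // Emit a sentence only once every opened quote has been closed.
            if (!insideQuote) {
                result.add(pending.toString().trim());
                pending.setLength(0);
            }
        }
        // An unclosed quote leaves text behind; keep it rather than drop it.
        if (pending.length() > 0) {
            result.add(pending.toString().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String paragraph = "\"People are now getting smarter and more critical. They know which are eligible to choose, which one pan, where the gold,\" he said. About strategies for coping with the upcoming elections, Edi said, it was still awaiting the provision.";
        for (String sentence : sentences(paragraph)) {
            System.out.println(sentence);
        }
    }
}
```

Given the boundaries reported in the question, the first two pieces should be merged into one sentence here, leaving two sentences in total. Unlike the regex, this sketch does nothing special for abbreviations such as Mr. or Mrs.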

Regarding java - Splitting quoted text into sentences using Java BreakIterator, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28554260/
