
java - How to split a paragraph into sentences in Java

Reposted. Author: 行者123. Updated: 2023-12-02 03:33:46

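I am trying to split a paragraph into sentences. The paragraph can contain a word like F.C.B, and it also includes some HTML tags, such as anchors and other tags. I tried the approach shown below, but because the HTML tags are kept as they are, the paragraph is not split cleanly into individual sentences.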

String.split("(?<!\\.[a-zA-Z])\\.(?![a-zA-Z]\\.)(?![<[^>]*>])");  

Could anyone help me with a better regular expression, or any other ideas?
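One probable reason the attempt above falls short, as far as can be told from how Java parses character classes: the last lookahead `(?![<[^>]*>])` is a single character class, not a "skip an HTML tag" group. The nested `[^>]` is unioned with `<`, `*` and `>`, so the class accepts any character, and the negative lookahead can then only succeed at the very end of the input. The sketch below illustrates the effect; the class name `AttemptCheck`, the helper `splitWithAttempt`, and the sample text are hypothetical.

```java
import java.util.Arrays;

class AttemptCheck {
    // The regex from the question, unchanged.
    static final String ATTEMPT = "(?<!\\.[a-zA-Z])\\.(?![a-zA-Z]\\.)(?![<[^>]*>])";

    // Splits the text with the question's regex.
    static String[] splitWithAttempt(String text) {
        return text.split(ATTEMPT);
    }

    public static void main(String[] args) {
        // Sample text is an assumption made for illustration.
        String[] parts = splitWithAttempt("First sentence. Second sentence.");
        // The period in the middle is rejected by the last lookahead,
        // so the two sentences are not separated.
        System.out.println(parts.length + " piece(s): " + Arrays.toString(parts));
    }
}
```

If this reading is right, the HTML handling has to happen outside that lookahead, for example by dealing with the tags before looking for sentence boundaries.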

Best answer

You can try this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String par = "In 2004, Obama received national attention during his campaign to represent Illinois in the United States Senate with his victory in the March Democratic Party primary, his keynote address at the Democratic National Convention in July, and his election to the Senate in November. He began his presidential campaign in 2007 and, after a close primary campaign against Hillary Clinton in 2008, he won sufficient delegates in the Democratic Party primaries to receive the presidential nomination.";
// A sentence ends at '.', '!' or '?' only when whitespace or the end of the input follows.
Pattern pattern = Pattern.compile("[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)", Pattern.MULTILINE | Pattern.COMMENTS);
Matcher matcher = pattern.matcher(par);
while (matcher.find()) {
    // Print each matched sentence on its own line.
    System.out.println(matcher.group());
}

Let me know if it works.
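Since the question also mentions HTML tags and abbreviations such as F.C.B, here is a minimal sketch of one way to combine the pattern above with a tag-stripping step. It assumes the tags are not needed in the output and that no attribute value contains `>`. The class name `SentenceSplitSketch`, the helper `splitSentences`, and the sample text are hypothetical, and the flags from the answer are left out for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentenceSplitSketch {
    // Same sentence pattern as in the answer, compiled once and without flags.
    private static final Pattern SENTENCE = Pattern.compile(
            "[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)");

    // Naive tag matcher: assumes no '>' appears inside attribute values.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Removes tags first, then collects every sentence the pattern finds.
    static List<String> splitSentences(String html) {
        String text = TAG.matcher(html).replaceAll("");
        List<String> sentences = new ArrayList<>();
        Matcher matcher = SENTENCE.matcher(text);
        while (matcher.find()) {
            sentences.add(matcher.group());
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Sample input is an assumption made for illustration.
        String html = "He joined <a href=\"https://example.com/f.c.b\">F.C.B</a> in 2004. It went well!";
        for (String sentence : splitSentences(html)) {
            System.out.println(sentence);
        }
    }
}
```

A period that is followed directly by a letter, as in F.C.B, is not treated as a sentence end by this pattern, so the abbreviation stays inside its sentence. For real-world markup, an HTML parser would be a safer choice than a regex for the stripping step.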

Regarding java - How to split a paragraph into sentences in Java, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37713323/
