
java - How to split a paragraph into sentences?

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 04:47:07

Please have a look at the following.

String[] sentenceHolder = titleAndBodyContainer.split("\n|\\.(?!\\d)|(?<!\\d)\\.");

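This is how I tried to split a paragraph into sentences. But there is a problem. My paragraph includes dates like Jan. 13, 2014, words like U.S and numbers like 2.2. They all got split by the code above. So basically, this code splits on many "dots", whether or not they are actually full stops.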

I also tried String[] sentenceHolder = titleAndBodyContainer.split(".\n"); and String[] sentenceHolder = titleAndBodyContainer.split("\\."); as well. Both failed.

How can I split a paragraph into sentences "properly"?
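To see where the first pattern goes wrong, it can help to run it on a few short inputs. The sketch below is only an illustration: the class name NaiveSplitDemo, the helper naiveSplit and the sample strings are hypothetical and not part of the original question. Under this pattern a dot survives only when it has a digit on both sides, so "U.S" and "Jan.13" are cut apart, while the inner dot of "2.2" is kept and the sentence-final dot right after it is not.

```java
import java.util.Arrays;

public class NaiveSplitDemo {
    // The pattern from the question: split on a newline, on a dot that is
    // not followed by a digit, or on a dot that is not preceded by a digit.
    static final String QUESTION_REGEX = "\n|\\.(?!\\d)|(?<!\\d)\\.";

    // Hypothetical helper that applies the question's pattern unchanged.
    static String[] naiveSplit(String text) {
        return text.split(QUESTION_REGEX);
    }

    public static void main(String[] args) {
        String[] samples = {"U.S", "Jan.13", "2.2", "It costs 2.2. Fine."};
        for (String sample : samples) {
            // Print each sample next to the pieces the pattern produces.
            System.out.println(sample + " -> " + Arrays.toString(naiveSplit(sample)));
        }
    }
}
```

The simpler attempt split("\\.") cuts at every dot without exception, and split(".\n") treats the dot as "any character", so it only cuts where some character is followed by a newline.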

Best answer

You can try this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "This is how I tried to split a paragraph into a sentence. But, there is a problem. My paragraph includes dates like Jan.13, 2014 , words like U.S and numbers like 2.2. They all got split by the above code.";

// Match one sentence at a time: a terminator (. ! ?) ends the sentence only when whitespace or the end of the input follows it.
Pattern re = Pattern.compile("[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)", Pattern.MULTILINE | Pattern.COMMENTS);
Matcher reMatcher = re.matcher(str);
while (reMatcher.find()) {
    // Each match is one complete sentence.
    System.out.println(reMatcher.group());
}

Output:

This is how I tried to split a paragraph into a sentence.
But, there is a problem.
My paragraph includes dates like Jan.13, 2014 , words like U.S and numbers like 2.2.
They all got split by the above code.
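The expression above is dense, so here is the same pattern laid out one piece per line, using the Pattern.COMMENTS flag the answer already passes. This is a reading aid rather than a different technique; the class name SentenceRegexSketch, the method name sentences and the two sample strings are made up for illustration. It also shows a limitation worth keeping in mind: a dot followed by a space, as in "Jan. 13", still looks like a sentence end to this pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceRegexSketch {
    // Same expression as in the answer. With Pattern.COMMENTS the engine
    // ignores the layout whitespace and everything from # to end of line.
    static final Pattern SENTENCE = Pattern.compile(
          "[^.!?\\s]          # first character: not whitespace, not a terminator\n"
        + "[^.!?]*            # a run of non-terminator characters\n"
        + "(?:                # then, zero or more times:\n"
        + "  [.!?]            #   a terminator\n"
        + "  (?!['\"]?\\s|$)  #   that is not followed by whitespace or end of input\n"
        + "  [^.!?]*          #   and more non-terminator characters\n"
        + ")*\n"
        + "[.!?]?             # the closing terminator, if present\n"
        + "['\"]?             # an optional closing quote\n"
        + "(?=\\s|$)          # whitespace or end of input must come next",
        Pattern.MULTILINE | Pattern.COMMENTS);

    // Collects every match of the pattern, in order, as one sentence each.
    static List<String> sentences(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // No space after "Jan.": the dot stays inside the sentence.
        System.out.println(sentences("Dates like Jan.13, 2014 are fine. Next one."));
        // Space after "Jan.": the same dot is taken as a sentence end.
        System.out.println(sentences("Dates like Jan. 13, 2014 are not. Next one."));
    }
}
```

If abbreviations followed by a space occur in your text, the pattern would need extra handling, for example a list of known abbreviations, which is outside what the answer covers.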

For more on java - How to split a paragraph into sentences?, see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/21430447/
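As an alternative to a hand-written regex, the JDK ships java.text.BreakIterator, which has a sentence mode. This is not what the answer above uses; it is sketched here only as another option to evaluate. The class name BreakIteratorSketch and the method sentences are hypothetical, and how abbreviations such as "U.S" or "Jan." are treated depends on the locale rules built into the runtime, so the result should be checked against your own text.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BreakIteratorSketch {
    // Walks the sentence boundaries reported by BreakIterator and returns
    // the trimmed text between each pair of consecutive boundaries.
    static List<String> sentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                out.add(sentence);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "But, there is a problem. They all got split by the above code.";
        for (String sentence : sentences(text, Locale.US)) {
            System.out.println(sentence);
        }
    }
}
```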
