Java: splitting a sentence by words, punctuation marks and quotation marks

Reposted. Author: 行者123. Last updated: 2023-11-30 07:05:09

I am trying to split a sentence using a regular expression.

Sentence:

"Hallo, I'm a dog. The end. Someone said: \"Earth is Earth\". Is it good? I like it! 'He is right' said I."

Current regex:

\\s+|(?<=[\\p{Punct}&&[^']])|(?=[\\p{Punct}&&[^']])

Current result:

{"Hallo", ",", "I'm", "a", "dog", ".", "The", "end", ".", "Someone",
"said", ":", **""**, "\"", "Earth", "is", "Earth", "\"", ".", "Is", "it",
"good", "?", "I", "like", "it", "!", **"'He"**, "is", **"right'"**,
"said", "I", "."}

I get an extra "" before the first double quote, and the ' is not split off from the words.

Desired result:

{"Hallo", ",", "I'm", "a", "dog", ".", "The", "end", ".", "Someone",
"said", ":", "\"", "Earth", "is", "Earth", "\"", ".", "Is", "it",
"good", "?", "I", "like", "it", "!", "'", "He", "is", "right", "'",
"said", "I", "."}

Edit: Sorry! More code:

String toTest =  "Hallo, I'm a dog. The end. Someone said: \"Earth is Earth\". Is it good? I like it! 'He is right' said I.";
String [] words = toTest.split("\\s+|(?<=[\\p{Punct}&&[^']])|(?=[\\p{Punct}&&[^']])");

which produces the word list:

words = {"Hallo", ",", "I'm", "a", "dog", ".", "The", "end", ".", "Someone", "said", ":", "", "\"", "Earth", "is", "Earth", "\"", ".", "Is", "it", "good", "?", "I", "like", "it", "!", "'He", "is", "right'", "said", "I", "."}

Best answer

You can try:

\\s+|(?<=[\\p{Punct}&&[^']])(?!\\s)|(?=[\\p{Punct}&&[^']])(?<!\\s)|(?<=[\\s\\p{Punct}]['])(?!\\s)|(?=['][\\s\\p{Punct}])(?<!\\s)
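As a quick check, the pattern above can be run against the sentence from the question. This is only a minimal sketch: the class name `RevisedSplitDemo` and the method `tokenize` are made up for illustration, and the pattern is split across several string literals purely so that each alternative can carry a comment.

```java
import java.util.Arrays;

class RevisedSplitDemo {
    // The pattern from the answer, one alternative per line.
    static final String REVISED =
          "\\s+"                                 // runs of whitespace
        + "|(?<=[\\p{Punct}&&[^']])(?!\\s)"      // after punctuation other than ', unless whitespace follows
        + "|(?=[\\p{Punct}&&[^']])(?<!\\s)"      // before punctuation other than ', unless whitespace precedes
        + "|(?<=[\\s\\p{Punct}]['])(?!\\s)"      // after a ' that itself follows whitespace or punctuation
        + "|(?=['][\\s\\p{Punct}])(?<!\\s)";     // before a ' that is followed by whitespace or punctuation

    // Hypothetical helper: splits a sentence with the pattern above.
    static String[] tokenize(String sentence) {
        return sentence.split(REVISED);
    }

    public static void main(String[] args) {
        String toTest = "Hallo, I'm a dog. The end. Someone said: \"Earth is Earth\". "
                + "Is it good? I like it! 'He is right' said I.";
        System.out.println(Arrays.toString(tokenize(toTest)));
    }
}
```

For this particular sentence the output should match the desired result from the question: no empty token before the double quote, and the two single quotes around `He is right` as separate tokens.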

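The problem with said: \"Earth is that you are splitting both before and after the space, so I added a negative lookahead and a negative lookbehind to the parts that split around punctuation marks.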
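The effect of those two guards can be seen on the fragment alone. The sketch below compares the question's pattern with just the first three alternatives of the revised one; the names `EmptyTokenDemo`, `ORIGINAL` and `GUARDED` are illustrative only.

```java
import java.util.Arrays;

class EmptyTokenDemo {
    // Pattern from the question: splits on whitespace and on both sides of punctuation.
    static final String ORIGINAL =
        "\\s+|(?<=[\\p{Punct}&&[^']])|(?=[\\p{Punct}&&[^']])";

    // Same pattern with the two guards added: no zero-width split
    // directly next to whitespace.
    static final String GUARDED =
        "\\s+|(?<=[\\p{Punct}&&[^']])(?!\\s)|(?=[\\p{Punct}&&[^']])(?<!\\s)";

    public static void main(String[] args) {
        String fragment = "said: \"Earth";

        // The space is consumed by \s+, then the lookahead matches again
        // right before the quote, leaving an empty string between the two splits.
        String[] before = fragment.split(ORIGINAL);
        System.out.println(before.length + " tokens: " + Arrays.toString(before));

        // With (?<!\s) the lookahead no longer fires right after the space.
        String[] after = fragment.split(GUARDED);
        System.out.println(after.length + " tokens: " + Arrays.toString(after));
    }
}
```

The first call yields five tokens, one of them empty; the second yields the four tokens `said`, `:`, `"` and `Earth`.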

I also added two cases that split off a single quote when it is preceded or followed by whitespace or some punctuation.
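To illustrate the two single-quote cases, here is a small sketch that runs the revised pattern on a contraction and on a quoted phrase. The class name `QuoteSplitDemo` and the helper `tokenize` are hypothetical; the inputs are fragments of the sentence from the question.

```java
import java.util.Arrays;

class QuoteSplitDemo {
    // Full pattern from the answer.
    static final String REVISED =
        "\\s+|(?<=[\\p{Punct}&&[^']])(?!\\s)|(?=[\\p{Punct}&&[^']])(?<!\\s)"
        + "|(?<=[\\s\\p{Punct}]['])(?!\\s)|(?=['][\\s\\p{Punct}])(?<!\\s)";

    // Hypothetical helper: splits a sentence with the pattern above.
    static String[] tokenize(String sentence) {
        return sentence.split(REVISED);
    }

    public static void main(String[] args) {
        // The apostrophe sits between two letters, so neither quote case applies
        // and the contraction stays in one piece.
        System.out.println(Arrays.toString(tokenize("I'm a dog")));

        // The opening quote follows a space and the closing quote precedes one,
        // so both are split off as separate tokens.
        System.out.println(Arrays.toString(tokenize("it! 'He is right' said I.")));
    }
}
```

Note that the opening-quote case uses a two-character lookbehind, so in this sketch a quote at the very start of the input would not be separated.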

However, as @RealSkeptic wrote in his comment, this will not handle

a single quote that denotes possession like dolphins' noses

You may need to write a real parser for that.
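The limitation is easy to reproduce. In the sketch below (class and method names are again made up for illustration), the possessive apostrophe is followed by a space, so the pattern treats it exactly like a closing quotation mark.

```java
import java.util.Arrays;

class PossessiveDemo {
    // Full pattern from the answer.
    static final String REVISED =
        "\\s+|(?<=[\\p{Punct}&&[^']])(?!\\s)|(?=[\\p{Punct}&&[^']])(?<!\\s)"
        + "|(?<=[\\s\\p{Punct}]['])(?!\\s)|(?=['][\\s\\p{Punct}])(?<!\\s)";

    // Hypothetical helper: splits a sentence with the pattern above.
    static String[] tokenize(String sentence) {
        return sentence.split(REVISED);
    }

    public static void main(String[] args) {
        // The apostrophe belongs to the word "dolphins'", but it is split off
        // because the pattern only looks at the neighbouring characters.
        System.out.println(Arrays.toString(tokenize("dolphins' noses")));
    }
}
```

This gives the three tokens `dolphins`, `'` and `noses`. Telling a possessive from a closing quote needs context, such as whether a quote was opened earlier, which a split pattern of this kind does not track.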

Regarding splitting a sentence by words, punctuation marks and quotation marks in Java, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27062022/
