
Regex to split on dot|semicolon followed by a space, but ignore things like URLs

Reposted. Author: 数据小太阳. Updated: 2023-10-29 03:20:19

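I am trying to parse and match a large body of legal text, splitting all of it into individual sentences. I have the following regex, which only works for a few lines of simple text: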

[^\.\!\?\;\n]*[\.\!\?\;\n](\s+)

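The ! and ? are fairly irrelevant here, but . and ; are very common as delimiters in the text I am trying to process. The problem is that the regex above only finds delimiters that are followed by a whitespace character. For example, the following text is not matched correctly: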

Member State law or pursuant to contract with a health professional and subject to the conditions and safeguards referred to in paragraph 3; processing is necessary for reasons of public interest in the area of public health, such as protecting against serious cross-border threats to health or ensuring high standards comparison tool at https://ec.europa.eu/ploteus/en/compare Adopted 7 comparable procedures (e. g. certifications/audits), and registered as required by the Member State. of quality and safety of health care and of medicinal products or medical devices, on the basis of Union or Member State law, which provides for suitable and specific measures to safeguard the rights and freedoms of the data subject, in particular professional secrecy; processing is...

The entire following part:

processing is necessary for reasons of public interest in the area of public health, such as protecting against serious cross-border threats to health or ensuring high standards comparison tool at https://ec.europa.

is not matched at all.
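The behaviour described above can be reproduced with a small Go program. This is only a sketch: the sample string is a shortened, made-up variant of the legal text, and the helper name `matches` is hypothetical. The pattern itself is the one from the question, unchanged.

```go
package main

import (
	"fmt"
	"regexp"
)

// original is the pattern from the question, copied verbatim.
var original = regexp.MustCompile(`[^\.\!\?\;\n]*[\.\!\?\;\n](\s+)`)

// matches returns every non-overlapping match of the original pattern.
// (Hypothetical helper, introduced only for this illustration.)
func matches(text string) []string {
	return original.FindAllString(text, -1)
}

func main() {
	// Shortened sample: a URL sits in the middle of the second "sentence".
	text := "in paragraph 3; tool at https://ec.europa.eu/ploteus/en/compare Adopted; next. "
	for _, m := range matches(text) {
		fmt.Printf("%q\n", m)
	}
}
```

The run of non-delimiter characters stops at the first dot inside the URL, and because that dot is not followed by whitespace the attempt fails. The engine then retries from later positions, so the text from "tool at" up to "https://ec.europa." never appears in any match.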

Any help improving the regex above would be greatly appreciated!

Thanks

Best answer

I think the name for what you want is a sentence tokenizer. For Go, I can recommend a library, github.com/jdkato/prose, which should do the job like a charm.

Personally, I have never used it. Good luck!
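If adding a dependency is not an option, a rough standard-library alternative is to locate the boundaries rather than the sentences. The sketch below is not part of the answer above and is not a real sentence tokenizer. It assumes that a sentence ends at `.`, `;`, `!` or `?` followed by whitespace, and the names `boundary` and `splitSentences` are hypothetical. Go's `regexp` package has no lookahead, so the code cuts the text at the match positions instead.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// boundary matches one delimiter followed by at least one whitespace character.
// Dots inside a URL are not followed by whitespace, so they are not boundaries.
var boundary = regexp.MustCompile(`[.;!?]\s+`)

// splitSentences cuts text at every boundary, keeping the delimiter with the
// preceding sentence and dropping the whitespace that follows it.
func splitSentences(text string) []string {
	var out []string
	start := 0
	for _, loc := range boundary.FindAllStringIndex(text, -1) {
		// loc[0] is the byte index of the (single-byte) delimiter.
		if s := strings.TrimSpace(text[start : loc[0]+1]); s != "" {
			out = append(out, s)
		}
		start = loc[1]
	}
	// Whatever remains after the last boundary is the final sentence.
	if rest := strings.TrimSpace(text[start:]); rest != "" {
		out = append(out, rest)
	}
	return out
}

func main() {
	text := "See https://ec.europa.eu/ploteus/en/compare here; next part. End"
	for i, s := range splitSentences(text) {
		fmt.Printf("%d: %s\n", i, s)
	}
}
```

This keeps URLs intact, but abbreviations written with a space, such as "(e. g. certifications/audits)" in the sample text, would still be split after "e.". Cases like that are the reason a dedicated sentence tokenizer is the more robust choice.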

On the topic of a regex that splits on dot|semicolon followed by a space but ignores things like URLs, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55049688/
