
python - Splitting a string on one delimiter but with multiple conditions

Reposted. Author: 行者123. Updated: 2023-12-01 04:23:45

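Good morning,

I found several threads about splitting a string on multiple delimiters, but none about splitting on one delimiter with multiple conditions.

I want to split the following string into sentences:

desc = "Dr. Anna Pytlik is an expert in conservative and aesthetic dentistry. She speaks both English and Polish."

If I do this: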

desc.split('. ')

I get:

['Dr', 'Anna Pytlik is an expert in conservative and aesthetic dentistry', 'She speaks both English and Polish.']

I don't want to split on the first period, the one after "Dr". How can I add a list of substrings after which .split('. ') should not apply?

Thanks!

Best answer

You can use re.split with a negative lookbehind:

>>> import re
>>> desc = "Dr. Anna Pytlik is an expert in conservative and aesthetic dentistry. She speaks both English and Polish."
>>> re.split(r"(?<!Dr|Mr)\. ", desc)
['Dr. Anna Pytlik is an expert in conservative and aesthetic dentistry',
 'She speaks both English and Polish.']

Just add more "exceptions", separated by |.
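
As a rough sketch of how such a list of exceptions could be turned into the pattern, the snippet below joins the entries with | inside the lookbehind. The list `titles` and the sample text are made up for illustration, and all entries are assumed to have the same length, since Python's `re` module only accepts fixed-width lookbehinds.

```python
import re

# Illustrative list of abbreviations (an assumption, not from the question).
# All entries have the same length, as a single lookbehind group requires.
titles = ["Dr", "Mr", "Ms"]

# Build (?<!Dr|Mr|Ms)\. with each entry escaped in case it contains
# regex metacharacters.
pattern = re.compile(r"(?<!%s)\. " % "|".join(re.escape(t) for t in titles))

desc = "Dr. Anna Pytlik is an expert. Mr. Smith agrees. Ms. Jones too."
sentences = pattern.split(desc)
print(sentences)
```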


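Update: It seems that a negative lookbehind requires all alternatives to have the same length, so this does not work for both "Dr." and "Prof.". One workaround might be to pad the patterns with ., e.g. (?<!..Dr|..Mr|Prof). You could easily write a helper method that pads each title with as many . as needed. However, this might break if the very first word of the text is Dr., because the .. will not match there.

Another workaround might be to first replace all titles with placeholders, e.g. "Dr." -> "{DR}" and "Prof." -> "{PROF}", then split, and then swap the original titles back in. This way you don't even need regular expressions.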
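
A minimal sketch of the padding helper described above, next to a different alternative that is not part of the answer: chaining one negative lookbehind per title, which sidesteps the equal-length restriction. The list `titles` and the sample strings are illustrative assumptions.

```python
import re

titles = ["Dr", "Mr", "Prof"]  # illustrative list, extend as needed

# Variant 1: left-pad every title with "." so that all alternatives
# inside the single lookbehind have the same width.
width = max(len(t) for t in titles)
padded = "|".join("." * (width - len(t)) + re.escape(t) for t in titles)
padded_re = re.compile(r"(?<!%s)\. " % padded)   # (?<!..Dr|..Mr|Prof)\.

# Variant 2 (a swapped-in technique): one lookbehind per title.
# Each lookbehind is fixed-width on its own, so no padding is needed.
chained = "".join(r"(?<!%s)" % re.escape(t) for t in titles)
chained_re = re.compile(chained + r"\. ")        # (?<!Dr)(?<!Mr)(?<!Prof)\.

text = "He met Dr. Smith and Prof. Jones today. They talked."
print(padded_re.split(text))
print(chained_re.split(text))

# The padded variant splits after a title at the very start of the text,
# because there are not enough characters for ".." to match.
leading = "Dr. Who arrived. He left."
print(padded_re.split(leading))
print(chained_re.split(leading))
```

Both variants still treat any word that merely ends in one of the listed titles as an exception, so the list is best kept short and specific.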

pairs = (("Dr.", "{DR}"), ("Prof.", "{PROF}"))  # and some more

def subst_titles(s, reverse=False):
    # Swap each title for its placeholder, or back again if reverse=True.
    for x, y in pairs:
        s = s.replace(*((x, y) if not reverse else (y, x)))
    return s

Example:

>>> text = "Dr. Anna Pytlik is an expert in conservative and aesthetic dentistry. Prof. Miller speaks both English and Polish."
>>> [subst_titles(s, True) for s in subst_titles(text).split(". ")]
['Dr. Anna Pytlik is an expert in conservative and aesthetic dentistry', 'Prof. Miller speaks both English and Polish.']
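
The substitute, split, restore steps could also be wrapped in a single function that derives the placeholders from the titles themselves. This is only a sketch: the name `split_sentences` and its parameters are hypothetical, and it assumes the text never contains the placeholder strings (such as "{DR}") on its own.

```python
def split_sentences(text, titles=("Dr.", "Mr.", "Prof."), sep=". "):
    """Split text on sep, ignoring the periods that belong to the given titles."""
    # Derive a placeholder without a period for each title, e.g. "Dr." -> "{DR}".
    pairs = [(t, "{%s}" % t.rstrip(".").upper()) for t in titles]

    # 1. Hide the titles behind their placeholders.
    for title, mark in pairs:
        text = text.replace(title, mark)

    # 2. Split on the plain separator, then 3. restore the titles in each part.
    restored = []
    for part in text.split(sep):
        for title, mark in pairs:
            part = part.replace(mark, title)
        restored.append(part)
    return restored


text = ("Dr. Anna Pytlik is an expert in conservative and aesthetic dentistry. "
        "Prof. Miller speaks both English and Polish.")
result = split_sentences(text)
print(result)
```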

For more on python - splitting a string on one delimiter but with multiple conditions, see the similar question on Stack Overflow: https://stackoverflow.com/questions/33387900/
