
python - re.split and keeping the delimiters in the result

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:20:11

How can I include the delimiters in the result of re.split?

For example, I have this text:

Bla bla lbaa dsad asd as. Asd qe as!  ASDadf asd! Dsss dwq. Dkmef? 

and this regular expression:

re.split(r'\s*([\.!\?]+)\s*', data)

and re.split returns this:

['Bla bla lbaa dsad asd as', '.', 'Asd qe as', '!', 'ASDadf asd', '!', 'Dsss dwq', '.', 'Dkmef', '?', '']

whereas I want this:

['Bla bla lbaa dsad asd as.', 'Asd qe as!', 'ASDadf asd!', 'Dsss dwq.']

How can I get this result without an awkward workaround?

Thanks.

Best answer

You could try splitting on whitespace that is preceded by punctuation:

In [9]: re.split(r'(?<=[\.!\?])\s+', data)
Out[9]:
['Bla bla lbaa dsad asd as.',
 'Asd qe as!',
 ' ASDadf asd!',
 'Dsss dwq.',
 'Dkmef?']
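
For readers who want to run the answer outside an interactive session, here is a minimal sketch of the same lookbehind split as a standalone script. The helper name split_sentences and the constant SENTENCE_BREAK are hypothetical, and the sample text is assumed to use single spaces with no trailing whitespace, so the output may differ slightly from the session shown above.

```python
import re

# Sample text from the question (assumed: single spaces, no trailing whitespace).
data = "Bla bla lbaa dsad asd as. Asd qe as! ASDadf asd! Dsss dwq. Dkmef?"

# Match runs of whitespace that come right after '.', '!' or '?'.
# The lookbehind is zero-width, so the punctuation is not consumed
# and stays attached to the sentence before the split point.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split text into sentences, each keeping its closing punctuation."""
    # strip() avoids an empty trailing item when the text ends in whitespace.
    return SENTENCE_BREAK.split(text.strip())


sentences = split_sentences(data)
print(sentences)
```

Stripping the input first is a design choice rather than part of the original answer: without it, text that ends in whitespace after the final punctuation mark would produce an empty string as the last list item.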

Here is the explanation from the documentation for the re module:

(?<=...)

Matches if the current position in the string is preceded by a match for ... that ends at the current position. This is called a positive lookbehind assertion. (?<=abc)def will find a match in abcdef, since the lookbehind will back up 3 characters and check if the contained pattern matches. The contained pattern must only match strings of some fixed length, meaning that abc or a|b are allowed, but a* and a{3,4} are not.
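
The fixed-length restriction in the quoted passage can be checked directly. The sketch below assumes the standard-library re module; is_valid_pattern is a hypothetical helper that only reports whether a pattern compiles.

```python
import re

# Example from the quoted documentation: 'def' matches only when preceded by 'abc'.
match = re.search(r"(?<=abc)def", "abcdef")
print(match.group(0))


def is_valid_pattern(pattern):
    """Return True if the re module can compile the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Fixed-length lookbehinds compile; variable-length ones are rejected.
results = {}
for pattern in (r"(?<=abc)def", r"(?<=a|b)c", r"(?<=a*)b", r"(?<=a{3,4})b"):
    results[pattern] = is_valid_pattern(pattern)
    print(pattern, results[pattern])
```

The first two patterns should compile and the last two should be rejected, which is why the answer puts only the single-character class [\.!\?] inside the lookbehind and leaves the variable-length \s+ outside it.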

Regarding python - re.split and keeping the delimiters in the result, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/16200961/
