Python regex to find substrings centered on a number

Reposted. Author: 太空宇宙. Updated: 2023-11-04 01:16:00

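I have a string. I want to cut it into substrings, each consisting of a word that contains a number, surrounded by (up to) 4 words on either side. If the substrings overlap, they should be merged.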

Sampletext = "by the way I know 54 how to take praise for 65 excellent questions 34 thank you for asking appreciated."
re.findall('(\s[*\s]){1,4}\d(\s[*\s]){1,4}', Sampletext)
desired output = ['the way I know 54 how to take praise', 'to take praise for 65 excellent questions 34 thank you for asking']

Best answer

Overlapping matches: use a lookahead

This will do it:

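import re

subject = "by the way I know 54 how to take praise for 65 excellent questions 34 thank you for asking appreciated."
# The lookahead consumes no text, so overlapping windows are all found; group 1 holds the window.
for match in re.finditer(r"(?=((?:\b\w+\b ){4}\d+(?: \b\w+\b){4}))", subject):
    print(match.group(1))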
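
To see why the lookahead is needed: a regex match normally consumes the text it covers, so the scan resumes after the previous match and any window that overlaps it is skipped. Putting the pattern inside a zero-width lookahead with a capture group lets every start position be tried. The following is only a sketch comparing the two on the sample text; the names `window`, `consuming`, and `overlapping` are illustrative and not part of the original answer.

```python
import re

subject = ("by the way I know 54 how to take praise for 65 excellent "
           "questions 34 thank you for asking appreciated.")

# The window pattern from the answer, without the surrounding lookahead.
window = r"(?:\b\w+\b ){4}\d+(?: \b\w+\b){4}"

# Plain pattern: each match consumes its text, so the window centered on 65 is skipped.
consuming = re.findall(window, subject)

# Same pattern inside a lookahead: nothing is consumed, so windows may overlap.
overlapping = re.findall(r"(?=(" + window + r"))", subject)

print(len(consuming), len(overlapping))
```

With the plain pattern, the scan resumes after "praise", which is why the window starting at "to take praise for" cannot be found without the lookahead.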

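What is a word?

The output depends on your definition of a word. Here, I allow digits within a word. That produces the following output.

Output (digits allowed in words)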

the way I know 54 how to take praise
to take praise for 65 excellent questions 34 thank
for 65 excellent questions 34 thank you for asking
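
The numbers 65 and 34 can act as surrounding "words" in the output above because `\w` matches digits as well as letters and the underscore. A quick check, as a sketch (the variable names are illustrative):

```python
import re

# \w matches letters, digits, and the underscore, so a bare number counts as a "word".
digits_are_words = bool(re.fullmatch(r"\w+", "65"))

# A letters-only class rejects the same token.
letters_only = bool(re.fullmatch(r"[a-z]+", "65", re.IGNORECASE))

print(digits_are_words, letters_only)
```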

Option 2: no digits in words

import re

subject = "by the way I know 54 how to take praise for 65 excellent questions 34 thank you for asking appreciated."
# Surrounding words are restricted to letters, so a neighbouring number breaks the window.
for match in re.finditer(r"(?=((?:\b[a-z]+\b ){4}\d+(?: \b[a-z]+\b){4}))", subject, re.IGNORECASE):
    print(match.group(1))

Output 2

the way I know 54 how to take praise

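Option 3: expand to four uninterrupted non-numeric words

Based on your comment, this option expands to the left and right of the pivot until four uninterrupted non-numeric words are matched. Commas are ignored.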

import re

subject = "by the way I know 54 how to take praise for 65 excellent questions 34 thank you for asking appreciated. One Two Three Four 55 Extend 66 a b c d AA BB CC DD 71 HH DD, JJ FF"
for match in re.finditer(r"(?=((?:\b[a-z]+[ ,]+){4}(?:\d+ (?:[a-z]+ ){1,3}?)*?\d+.*?(?:[ ,]+[a-z]+){4}))", subject, re.IGNORECASE):
    print(match.group(1))

Output 3

the way I know 54 how to take praise
to take praise for 65 excellent questions 34 thank you for asking
One Two Three Four 55 Extend 66 a b c d
AA BB CC DD 71 HH DD, JJ FF
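
As an alternative to a single regex, the same merging idea can be sketched with plain tokens. This is not the method from the answer above: it splits on whitespace, treats any token containing a digit as a pivot, chains pivots that lie within 4 words of each other, and then takes up to 4 words on each side of every chain. The function name `number_windows` and its `width` parameter are hypothetical, and punctuation stays attached to its token.

```python
import re

def number_windows(text, width=4):
    """Return substrings of up to `width` words around each number, chaining close numbers."""
    tokens = text.split()
    # Assumption: a pivot is any whitespace-separated token that contains a digit.
    pivots = [i for i, tok in enumerate(tokens) if re.search(r"\d", tok)]

    clusters = []
    for i in pivots:
        # Chain this pivot onto the previous cluster if it lies within `width` words of it.
        if clusters and i - clusters[-1][-1] <= width:
            clusters[-1].append(i)
        else:
            clusters.append([i])

    results = []
    for cluster in clusters:
        start = max(0, cluster[0] - width)                 # up to `width` words to the left
        end = min(len(tokens), cluster[-1] + width + 1)    # up to `width` words to the right
        results.append(" ".join(tokens[start:end]))
    return results

sample = ("by the way I know 54 how to take praise for 65 excellent "
          "questions 34 thank you for asking appreciated.")
print(number_windows(sample))
```

On the sample text from the question this returns the two substrings listed there as the desired output. Unlike the regex options, it accepts fewer than four surrounding words near the start or end of the string, which matches the "(up to) 4 words" wording in the question.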

Regarding Python regex to find substrings centered on a number, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/24691695/
