
python - Regular expressions: Punctuation and greediness

Reposted. Author: 行者123. Updated: 2023-11-28 22:59:03

Suppose this is our text:

import re

text = 'After 1992 , the winter and summer Olympics will be held two years apart , with the revised schedule beginning with the winter games in 1994 and the summer games in 1996 . ) Now , Mr. Pilson -- a former college basketball player who says a good negotiator needs `` a level of focus and intellectual attention  similar to a good athlete-s is facing the consequences of his own aggressiveness . Next month , talks will begin on two coveted CBS contracts'
print(re.search(r'(\w+ |\W+ ){0,4}1992( \W+| \w+){4}', text).group(0))

Output: After 1992 , the winter and

But this one gives me:

print(re.search(r'(\w+ |\W+ ){0,4}1992( \W+| \w+){0,4}', text).group(0))

Output: After 1992 ,

I find this strange. Why isn't the second regex greedy?

This one is even stranger than the others:

print(re.search(r'(\w+ |\W+ ){0,4}summer( \W+| \w+){0,4}', text).group(0))

Output: , the winter and summer Olympics will be held

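Questions

1- What is the difference between the first and the second? To me they should return the same text, because the only difference is {0,4} versus {4}: if {4} yields the long string, {0,4} should yield the same string, since regular expressions are greedy.

2- The problem may be related to punctuation, because the third example works the same with {0,4} and {4}.

I'm confused.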

Best answer

There's no mystery here.

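In your second example, ␣\W+ overshoots and consumes ␣,␣ (whitespace is also part of the \W class), so no subsequent ␣\w+ match is found against the remaining the␣winter␣... text. The {0,4} constraint is already satisfied at that point, so those further matches are not needed: a greedy quantifier backtracks only when the overall match would otherwise fail, not to make the overall match longer. So far so good.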
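The step described above can be checked in isolation. The following is a minimal sketch, not part of the original answer; the variable names `tail`, `first`, and `rest` are illustrative, and `tail` is just the piece of the question's text that follows 1992.

```python
import re

# The piece of the question's text that follows "1992" (illustrative excerpt).
tail = ' , the winter and'

# A space is a non-word character, so \W matches it.
assert re.fullmatch(r'\W', ' ') is not None

# ' \W+' greedily takes the space, the comma, and the following space.
first = re.match(r' \W+', tail)
print(repr(first.group(0)))          # ' , '

# What is left starts with 't', so neither alternative can match again.
rest = tail[first.end():]
print(repr(rest))                    # 'the winter and'
print(re.match(r' \W+| \w+', rest))  # None
```

Because both alternatives start with a literal space, the repetition stops after a single iteration once that space has been consumed.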

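Back to your first example: the match above does not satisfy {4}, so the engine keeps looking. It backtracks inside the ␣\W+ match, giving up the trailing space so that ␣\W+ matches only ␣,. Three subsequent ␣\w+ matches can then succeed against ␣the␣winter␣..., and {4} is satisfied.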
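To see the two behaviours side by side, here is a small sketch that drops the leading group and keeps only the part after 1992. The shortened string `s` and the variable names are assumptions made for illustration.

```python
import re

# Shortened excerpt of the question's text (illustrative).
s = 'After 1992 , the winter and summer Olympics'

# {0,4}: one iteration (' , ') is enough, so the engine never backtracks.
optional = re.search(r'1992( \W+| \w+){0,4}', s)
print(repr(optional.group(0)))   # '1992 , '

# {4}: one iteration is not enough, so \W+ gives back the trailing space
# and ' \w+' can then match ' the', ' winter', ' and'.
exactly_four = re.search(r'1992( \W+| \w+){4}', s)
print(repr(exactly_four.group(0)))  # '1992 , the winter and'
```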

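Change your regex to ([^ ]+ +){0,4}my_word( +[^ ]+){0,4} (this keeps the spirit of your original expression, treating spaces as separators and everything else, including punctuation, as words) or, perhaps better, to (\w+\W+){0,4}my_word(\W+\w+){0,4}, which isolates up to 4 actual words on each side regardless of punctuation.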
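A quick sketch of how the two suggested patterns differ on the question's text, with 1992 standing in for my_word. The shortened `text` and the variable names are illustrative assumptions.

```python
import re

# Shortened excerpt of the question's text (illustrative).
text = 'After 1992 , the winter and summer Olympics will be held two years apart'
word = re.escape('1992')  # stands in for my_word

# Space-delimited tokens: a lone comma counts as one of the 4 tokens.
by_spaces = re.search(r'([^ ]+ +){0,4}' + word + r'( +[^ ]+){0,4}', text)
print(by_spaces.group(0))   # After 1992 , the winter and

# Word-delimited: punctuation is absorbed into the \W+ separators,
# so 4 actual words follow the target.
by_words = re.search(r'(\w+\W+){0,4}' + word + r'(\W+\w+){0,4}', text)
print(by_words.group(0))    # After 1992 , the winter and summer
```

Both patterns return the same result whether the bound is {0,4} or {4} here, because no separator can be swallowed by the wrong part of the group.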

Later,

Hi vladr. The regular expression you provided does not work with this text (the target word is part):

The city 's Department of Consumer Affairs charged Newmark & Lewis Inc. with failing to deliver on its promise of lowering prices . In a civil suit commenced in state Supreme Court in New York , the agency alleged that the consumer-electronics and appliance discount-retailing chain engaged in deceptive advertising by claiming to have '' lowered every price on every item '' as part of an advertising campaign that began June 1 . The agency said it monitored Newmark & Lewis 's advertised prices before and after the ad campaign , and found that the prices of at least 50 different items either increased or stayed the same . In late May , Newmark & Lewis announced a plan to cut prices 5 % to 20 % and eliminate what it called a '' standard discount-retailing practice '' of negotiating individual deals with customers ."

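Aha. It matches part inside Department.

  • If you only want to match whole words, use (^|(\w+\W+){1,5})\W*my_word\W*((\W+\w+){1,5} |$), which should isolate the word between separators and/or the start or end of the line.
  • If you do want to match part inside Department, use (\w+\W+){0,5}\w*my_word\w*(\W*\w+){0,5}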
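The difference between the three patterns can be sketched on the opening words of the commenter's text, where part occurs only inside Department. The string `s` and the variable names are illustrative.

```python
import re

# Opening words of the commenter's text; "part" occurs only inside "Department".
s = "The city 's Department of Consumer Affairs"

# Earlier suggestion: finds "part" buried inside "Department".
loose = re.search(r'(\w+\W+){0,4}part(\W+\w+){0,4}', s)
print(loose.group(0), loose.start())   # part 14

# Whole-word variant: needs separators (or start/end) around the word.
whole = re.search(r'(^|(\w+\W+){1,5})\W*part\W*((\W+\w+){1,5} |$)', s)
print(whole)                           # None

# Substring variant: deliberately lets the word sit inside a larger one.
inside = re.search(r'(\w+\W+){0,5}\w*part\w*(\W*\w+){0,5}', s)
print(inside.group(0))                 # The city 's Department of Consumer Affairs
```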

Regarding python - Regular expressions: Punctuation and greediness, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13349946/
