
python - Regex to match all sentences that contain quotes

Reposted. Author: 行者123. Updated: 2023-12-01 05:32:04

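I am trying to match every sentence that contains a quotation, regardless of how long the quotation is or how many sentences it contains.

As Alfe pointed out, a perfect regex may not be feasible, but I would like to improve the one I am using if possible.

Right now I am doing this to find quotations: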

    import re

    def split_by_quotes(text):
        # Capital letter, lazily up to a pair of double quotes, then up to the next period
        pattern = r'([A-Z].*?\".*?\".*?\.)'
        quotes = re.findall(pattern, text)
        return quotes

But I want to make sure the quotation occurs inside a sentence, and then capture that whole sentence.
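To make the problem concrete, here is a minimal sketch that runs the same pattern on two short inputs. The test strings are made up for illustration. The first shows that a quoted sentence without a final period is missed; the second shows that a preceding sentence without any quote is pulled into the match, because `.*?` happily crosses the period.

```python
import re

def split_by_quotes(text):
    # Same pattern as in the question
    pattern = r'([A-Z].*?\".*?\".*?\.)'
    return re.findall(pattern, text)

# No closing period after the quotation, so nothing is found.
print(split_by_quotes('"This is a quote, it should be matched"'))
# -> []

# The lazy .*? runs across the first period to reach the quote,
# so the unquoted sentence becomes part of the match.
print(split_by_quotes('No quote here. She said "hi" to me.'))
# -> ['No quote here. She said "hi" to me.']
```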

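By a sentence I mean a piece of text that:

  1. Is usually preceded by a space
  2. Starts with an uppercase character or a quotation mark
  3. Ends with ., ! or ? (sometimes directly followed by " or ')
  4. Is usually followed by a space

As Alfe pointed out, this will not capture every sentence, but matching these criteria would be good enough.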
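The criteria above can also be approximated without one large regex. The following is only a sketch under stated assumptions, not a definitive solution: the names `split_sentences` and `sentences_with_quotes` are hypothetical, a terminator counts only when it sits outside double quotes, a line break outside double quotes is also treated as a sentence end, and single quotes are not tracked because they cannot be told apart from apostrophes.

```python
TERMINATORS = ".!?"

def split_sentences(text):
    """Split text into sentences, ignoring terminators inside double quotes."""
    sentences = []
    start = 0
    in_double = False
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == '"':
            # Toggle the "inside a double-quoted run" state.
            in_double = not in_double
        elif not in_double and ch == "\n":
            # Assumption: a line break outside quotes ends the sentence.
            sentences.append(text[start:i])
            start = i + 1
        elif not in_double and ch in TERMINATORS:
            end = i + 1
            # Criterion 3: allow closing single quotes right after the terminator.
            while end < n and text[end] == "'":
                end += 1
            # Criterion 4: the sentence must be followed by whitespace or the end.
            if end == n or text[end].isspace():
                sentences.append(text[start:end])
                start = end
                i = end
                continue
        i += 1
    sentences.append(text[start:])
    # Drop empty pieces and surrounding whitespace.
    return [s.strip() for s in sentences if s.strip()]

def sentences_with_quotes(text):
    """Keep only the sentences that contain a double quote."""
    return [s for s in split_sentences(text) if '"' in s]

print(sentences_with_quotes('No quote here. She said "hi" to me.'))
```

Known limits of this sketch: abbreviations such as "Mr." outside quotes still split a sentence, unbalanced double quotes confuse the state, and a quotation that ends with its own period is never treated as the end of the surrounding sentence.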

For example:

"This is a quote, it should be matched"

This is text without a quote on a new line after multiple carriage returns, it should not be matched.

A more complex example:

Charles Babbage said: "On two occasions, I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able to rightly apprehend the kind of confusion of ideas that could provoke such a question."

This would be matched in its entirety.

However,

They called Garfield Minus Garfield and Lolcats, but when Johnson saw what he considered to be a particularly hilarious clip of someone falling down and then being "played off stage" by a cat with a keyboard, his friends thought it was lame. "I said, this is going to be big.", he says, "My friends were like, 'Nah, it's just a cat.'"

would be matched as follows:

They called Garfield Minus Garfield and Lolcats, but when Johnson saw what he considered to be a particularly hilarious clip of someone falling down and then being "played off stage" by a cat with a keyboard, his friends thought it was lame.

"I said, this is going to be big.", he says, "My friends were like, 'Nah, it's just a cat.'"

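Best answer

Does this help you?

Note that I edited my first answer, which was flawed: its regex matched every sentence rather than only the sentences that contain quotes.
I also took Alfe's comment into account: the sentences captured by the regex do not have to start with an uppercase letter. They start at the first character after a dot that is not a space, \r, \n, or possibly another dot.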

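    import re

    # Raw strings keep \Z and \. intact for the regex engine.
    regx = re.compile(r'(?!\Z)'               # never match at the very end
                      r'[. \n\r]*'            # skip leading dots, spaces, line breaks
                      r'('
                      r'(?:[^."]*"[^"]*")+'   # text without . or ", then a quoted run
                      r'[^."]*'               # rest of the sentence
                      r'(?:\.|\Z)'            # closing dot, or end of the string
                      r')')

    s = ('''\nThe "some.rutu" and "oula oulah, poto." are '''
         '''all good. A "bi'didi." is not. I '''
         """don't know why... 5 "million" people """
         """died . \nAnd here's a sentence without """
         """a quote. "Halt!" he shouted. 'Sunny """
         """days and "nights"' is a strange phrase""")
    print(s)
    print()
    for el in regx.findall(s):
        print('- %s' % el)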

Result

The "some.rutu" and "oula oulah, poto." are all good. A "bi'didi." is not. I don't know why... 5 "million" people died . 
And here's a sentence without a quote. "Halt!" he shouted. 'Sunny days and "nights"' is a strange phrase

- The "some.rutu" and "oula oulah, poto." are all good.
- A "bi'didi." is not.
- 5 "million" people died .
- "Halt!" he shouted.
- 'Sunny days and "nights"' is a strange phrase
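The pattern is easier to follow when written with `re.VERBOSE`. The sketch below rebuilds it from a set of terminator characters; the comments are my reading of each piece. The helper name `build_quote_sentence_regex` and its `terminators` parameter are hypothetical. With the default `"."` it should behave like the pattern in the answer. Passing `".!?"` is an extension that the answer does not include, added here because the question also lists ! and ? as sentence endings.

```python
import re

def build_quote_sentence_regex(terminators="."):
    """Compile the quote-sentence pattern for the given terminator characters."""
    t = re.escape(terminators)
    # Whitespace inside a character class is kept even in verbose mode,
    # so the space in the second line still matches a literal space.
    template = r'''
        (?!\Z)                   # never start a match at the very end
        [{t} \n\r]*              # skip terminators, spaces and line breaks
        (                        # group 1: the sentence that findall() returns
          (?:[^{t}"]*"[^"]*")+   # text free of terminators, then a quoted run; repeated
          [^{t}"]*               # trailing text free of terminators and quotes
          (?:[{t}]|\Z)           # stop at a terminator or at the end of the string
        )
    '''
    return re.compile(template.format(t=t), re.VERBOSE)

text = 'Is this "real"? Yes. "Halt!" he shouted.'

# Dot only: the question mark does not end the first sentence.
print(build_quote_sentence_regex(".").findall(text))
# -> ['Is this "real"? Yes.', '"Halt!" he shouted.']

# Assumed extension: ! and ? outside quotes also end a sentence.
print(build_quote_sentence_regex(".!?").findall(text))
# -> ['Is this "real"?', '"Halt!" he shouted.']
```

Terminators inside a double-quoted run are still ignored in both variants, since the quoted part is consumed by `[^"]*`.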

Regarding python - Regex to match all sentences that contain quotes, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/19979272/
