
python - Using a regular expression to find all sentences with two or fewer words

Reposted. Author: 太空宇宙. Updated: 2023-11-04 06:42:23

I have tried many regular expressions to find all sentences that contain two words or fewer. A sentence can look like: Hello! or This or (MY NAME) or (!see), or any combination of English characters plus symbols such as ?:!#,@ or digits.

I tried:

(\n|\r)\s*\w+[^\w]*\w*[^\w]*\w*[^\w]*(\n|$)+

\n\s*\w+ 

^(\S+\s?) does not work either.

I tried many others as well, but I could not get the right result: http://prntscr.com/84db2a

Best answer

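If you use this version of the regex module, the code below will work.

It supports overlapped=True, which the pattern below depends on: the period that ends one match must also be able to start the next one. The pattern also matches the first sentence (if it has only two words). Again, you must use the regex library linked above; it provides almost the same functionality as the built-in re module.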

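import regex  # third-party module, not the built-in re


data = ("This sentence has a few words. This too. Hello world. This has four "
        "words. This doesn't. This one has five words.")
# overlapped=True lets the period that ends one match also start the next.
found = regex.findall(r"^([^\s]+\s*[^\s]+)\s*\.|\.\s*([^\s]+\s+[^\s]+)\s*\.",
                      data, overlapped=True)

# Each result is a (group 1, group 2) tuple; at most one group is non-empty.
for group in found:
    for sentence in filter(None, group):
        print(sentence)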

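The same pattern also works with Python's built-in re module if you drop the overlapped=True argument (re.findall does not accept it). However, when two adjacent sentences each consist of exactly two words, only one of them is matched, because the period that closes the first match has already been consumed and cannot open the second.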
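To illustrate that limitation, here is a minimal sketch using only the built-in re module. The second pattern is not from the original answer: it is a hypothetical variant that asserts the surrounding periods with a lookbehind and a lookahead instead of consuming them. It is not a drop-in equivalent (it always requires exactly two whitespace-separated tokens), and the names consumed_hits and lookaround_hits are made up for this example.

```python
import re

data = ("This sentence has a few words. This too. Hello world. This has four "
        "words. This doesn't. This one has five words.")

# Original pattern with the built-in re module: the closing period of one
# match is consumed, so it cannot open the next match.
consuming = re.findall(
    r"^([^\s]+\s*[^\s]+)\s*\.|\.\s*([^\s]+\s+[^\s]+)\s*\.", data)
consumed_hits = [s for groups in consuming for s in groups if s]

# Hypothetical variant: the periods are only asserted with a lookbehind and
# a lookahead, so neighbouring two-word sentences can both match.
lookaround = re.compile(r"(?:^|(?<=\.))\s*(\S+\s+\S+?)\s*(?=\.)")
lookaround_hits = lookaround.findall(data)

print(consumed_hits)
print(lookaround_hits)
```

On the sample string, the first list misses "Hello world" because it directly follows another two-word sentence, while the lookaround variant reports all three two-word sentences.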


Here is the breakdown of the pattern from regex101.com:

1st Alternative: ^([^\s]+\s*[^\s]+)\s*\.
    ^ assert position at start of the string
    1st Capturing group ([^\s]+\s*[^\s]+)
        [^\s]+ match a single character not present in the list below
            Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
            \s match any white space character [\r\n\t\f ]
        \s* match any white space character [\r\n\t\f ]
            Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
        [^\s]+ match a single character not present in the list below
            Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
            \s match any white space character [\r\n\t\f ]
    \s* match any white space character [\r\n\t\f ]
        Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
    \. matches the character . literally
2nd Alternative: \.\s*([^\s]+\s+[^\s]+)\s*\.
    \. matches the character . literally
    \s* match any white space character [\r\n\t\f ]
        Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
    2nd Capturing group ([^\s]+\s+[^\s]+)
        [^\s]+ match a single character not present in the list below
            Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
            \s match any white space character [\r\n\t\f ]
        \s+ match any white space character [\r\n\t\f ]
            Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
        [^\s]+ match a single character not present in the list below
            Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
            \s match any white space character [\r\n\t\f ]
    \s* match any white space character [\r\n\t\f ]
        Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
    \. matches the character . literally
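As the breakdown shows, the second alternative requires \s+ between two tokens, so a one-word sentence in the middle of the text is not matched on its own, even though the question asks for two words or fewer. If that matters, one possible alternative (not part of the original answer) is to split the text into sentences and count words. The sketch below assumes that a sentence ends with ., ! or ? followed by whitespace; the function name short_sentences and its max_words parameter are hypothetical.

```python
import re

def short_sentences(text, max_words=2):
    """Return the sentences of `text` that have at most `max_words` words."""
    # Assumption: a sentence ends with '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # A "word" here is any whitespace-separated token, symbols included.
    return [s for s in sentences if s and len(s.split()) <= max_words]

data = ("This sentence has a few words. This too. Hello world. This has four "
        "words. This doesn't. This one has five words.")

for sentence in short_sentences(data):
    print(sentence)
```

Unlike the regex-only answer, this keeps the terminating punctuation in each result and treats one-word sentences such as "Hi." as matches. If the input has one sentence per line instead, splitting with text.splitlines() may be a better fit.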

Regarding "python - Using a regular expression to find all sentences with two or fewer words", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31999464/
