
Regex pattern for extracting Hearst patterns

Reposted. Author: 行者123. Updated: 2023-12-05 05:55:03

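I'm new to regular expressions, and I can't extract hyponym-hypernym pairs as a list or as tuples. I tried the following pattern, but it found no matches: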

(NP_[\w.]*(, NP_[\w.]*)*,? (and)? other NP_[\w.]*)

For the "and other" pattern, I have the following annotated sentences:

  1. NP_kimmel faces NP_dui , NP_fleeing or NP_evading_police , and other NP_possible_charges.
  2. The NP_network has asked NP_big_bang_theory_co-creator_bill prady to mastermind the NP_revival , which would see the NP_return of NP_kermit the NP_frog , NP_miss_piggy , NP_fozzie_bear and other NP_old_favorites .

I'd like to extract a list such as:

[NP_dui,NP_fleeing or NP_evading_police, NP_possible_charges]

(NP_dui,NP_possible_charges)
(NP_fleeing or NP_evading_police,NP_possible_charges)

Similarly for sentence 2:

[NP_kermit the NP_frog , NP_miss_piggy , NP_fozzie_bear, NP_old_favorites]

or similar tuples.

Any help would be appreciated.

Best answer

Use

NP_[\w.]*(?:\s*(?:,|\bor\b|,?\s*and(?:\s+other)?\b)\s*NP_[\w.]*)+

This extracts the matching strings. Next, use NP_[\w.]* to pull the individual entries out of each match.

Python code:

import re

test_strs = ["NP_kimmel faces NP_dui , NP_fleeing or NP_evading_police , and other NP_possible_charges.",
             "The NP_network has asked NP_big_bang_theory_co-creator_bill prady to mastermind the NP_revival , which would see the NP_return of NP_kermit the NP_frog , NP_miss_piggy , NP_fozzie_bear and other NP_old_favorites ."]
# A run of NP_ chunks joined by ",", "or", or "and" (optionally "and other")
p = r'NP_[\w.]*(?:\s*(?:,|\bor\b|,?\s*and(?:\s+other)?\b)\s*NP_[\w.]*)+'

for test_str in test_strs:
    matches = []
    # Step 1: find each whole enumeration in the sentence
    for match in re.findall(p, test_str):
        # Step 2: pull the individual NP_ chunks out of the enumeration
        matches.extend(re.findall(r'NP_[\w.]*\b', match))
    print(matches)

Results:

['NP_dui', 'NP_fleeing', 'NP_evading_police', 'NP_possible_charges']
['NP_frog', 'NP_miss_piggy', 'NP_fozzie_bear', 'NP_old_favorites']
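
The question asked for (hyponym, hypernym) tuples, while the code above returns a flat list. One possible way to get tuples, sketched here under assumptions that are not part of the original answer, is to split each match at "and other" and treat the first chunk after it as the hypernym. The names `hearst_pairs`, `LIST_PATTERN`, and `CHUNK_PATTERN` are hypothetical, and enumerations without "and other" are simply skipped.

```python
import re

# Pattern from the answer: a run of NP_ chunks joined by ",", "or", or "and [other]".
LIST_PATTERN = re.compile(
    r'NP_[\w.]*(?:\s*(?:,|\bor\b|,?\s*and(?:\s+other)?\b)\s*NP_[\w.]*)+'
)
CHUNK_PATTERN = re.compile(r'NP_[\w.]*\b')


def hearst_pairs(sentence):
    """Return (hyponym, hypernym) tuples for "X, Y and other Z" enumerations.

    Assumption (not from the original answer): the first chunk after
    "and other" is the hypernym, and every chunk before it is a hyponym.
    """
    pairs = []
    for m in LIST_PATTERN.finditer(sentence):
        # Split the enumeration once at "and other".
        parts = re.split(r'\band\s+other\b', m.group(0), maxsplit=1)
        if len(parts) != 2:
            continue  # plain coordination without "and other": skip it
        hyponyms = CHUNK_PATTERN.findall(parts[0])
        hypernyms = CHUNK_PATTERN.findall(parts[1])
        if not hypernyms:
            continue
        pairs.extend((hyponym, hypernyms[0]) for hyponym in hyponyms)
    return pairs


sentence = ("NP_kimmel faces NP_dui , NP_fleeing or NP_evading_police , "
            "and other NP_possible_charges.")
for pair in hearst_pairs(sentence):
    print(pair)
```

Note that this sketch treats NP_fleeing and NP_evading_police as separate hyponyms, as the flat list does, rather than as the single "NP_fleeing or NP_evading_police" entry shown in the question. Likewise, "NP_kermit the NP_frog" yields only NP_frog, because the pattern does not allow "the" between chunks.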

Explanation

--------------------------------------------------------------------------------
  NP_                      'NP_'
--------------------------------------------------------------------------------
  [\w.]*                   any character of: word characters (a-z,
                           A-Z, 0-9, _), '.' (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (1 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      ,                        ','
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      \b                       the boundary between a word char (\w)
                               and something that is not a word char
--------------------------------------------------------------------------------
      or                       'or'
--------------------------------------------------------------------------------
      \b                       the boundary between a word char (\w)
                               and something that is not a word char
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      ,?                       ',' (optional (matching the most
                               amount possible))
--------------------------------------------------------------------------------
      \s*                      whitespace (\n, \r, \t, \f, and " ")
                               (0 or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
      and                      'and'
--------------------------------------------------------------------------------
      (?:                      group, but do not capture (optional
                               (matching the most amount possible)):
--------------------------------------------------------------------------------
        \s+                      whitespace (\n, \r, \t, \f, and " ")
                                 (1 or more times (matching the most
                                 amount possible))
--------------------------------------------------------------------------------
        other                    'other'
--------------------------------------------------------------------------------
      )?                       end of grouping
--------------------------------------------------------------------------------
      \b                       the boundary between a word char (\w)
                               and something that is not a word char
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    NP_                      'NP_'
--------------------------------------------------------------------------------
    [\w.]*                   any character of: word characters (a-z,
                             A-Z, 0-9, _), '.' (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )+                       end of grouping
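
The same breakdown can also be kept next to the pattern itself by compiling it with `re.VERBOSE`, which ignores whitespace in the pattern and allows `#` comments. The following is a minimal sketch of that idea, not part of the original answer; `VERBOSE_PATTERN` and `COMPACT_PATTERN` are illustrative names, and the final line only checks that both forms find the same matches on one of the sample sentences.

```python
import re

# The one-line pattern from the answer, kept for comparison.
COMPACT_PATTERN = re.compile(
    r'NP_[\w.]*(?:\s*(?:,|\bor\b|,?\s*and(?:\s+other)?\b)\s*NP_[\w.]*)+'
)

# The same pattern laid out with re.VERBOSE so each part carries its comment.
VERBOSE_PATTERN = re.compile(r'''
    NP_[\w.]*                # first noun-phrase chunk
    (?:                      # one or more "separator + chunk" repetitions
        \s*
        (?:
            ,                # a comma
          | \bor\b           # or the word "or"
          | ,?\s*and         # or "and", optionally preceded by a comma
            (?:\s+other)?    # optionally followed by "other"
            \b
        )
        \s*
        NP_[\w.]*            # next noun-phrase chunk
    )+
''', re.VERBOSE)

sentence = ("NP_kimmel faces NP_dui , NP_fleeing or NP_evading_police , "
            "and other NP_possible_charges.")
# Both forms should find exactly the same enumerations.
print(VERBOSE_PATTERN.findall(sentence) == COMPACT_PATTERN.findall(sentence))
```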

Regarding a regex pattern for extracting Hearst patterns, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/69543576/
