python - Match all identifiers in a string

Reposted. Author: 行者123. Updated: 2023-12-01 03:00:34

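Question:

I am looking for a way to match certain identifiers in lines that start with certain words. An ID consists of characters, possibly followed by digits, followed by a dash and then some more digits. An ID should only be matched on lines whose starting word is one of the following: Closes, Fixes, Resolves. If a line contains more than one ID, the IDs are separated by the string " and ". Any number of IDs can appear on one line.

Example test string: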

Closes PD-1                                           # Match: PD-1

Related to PD-2 # No match, line doesn't start with an allowed word

Closes
NPD-1 # No match, as the identifier is in a new line

Fixes PD-21 and PD-22 # Match: PD-21, PD-22

Closes PD-31, also PD-32 and PD-33 # Match: PD-31 - the rest is not captured because of ", also"
Resolves PD4-41 and PD4-42 and PD4-43 and PD4-44 # Match: PD4-41, PD4-42, PD4-43, PD4-44

Resolves something related to N-2 # No match, the identifier is not directly after 'Resolves'

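What I tried:

I tried to get all matches with a regular expression, but every attempt fell short in some respect. For example, one of the regexes I tried is: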

^(?:Closes|Fixes|Resolves) (\w+-\d+)(?:(?: and )(\w+-\d+))*

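  1. My intention was a non-capturing group that requires the line to start with one of the allowed words, followed by a space: ^(?:Closes|Fixes|Resolves) followed by a single space
  2. Then at least one ID must follow the starting word, and I want to capture it: (\w+-\d+)
  3. Finally, zero or more IDs can follow the first one, separated by the string " and ", but here I only want to capture the IDs, not the separator: (?:(?: and )(\w+-\d+))*

The result of this regex in Python: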

import re

test_string = """
Closes PD-1 # Match: PD-1
Related to PD-2 # No match, line doesn't start with an allowed word
Closes
NPD-1 # No match, as the identifier is in a new line
Fixes PD-21 and PD-22 # Match: PD-21, PD-22
Closes PD-31, also PD-32 and PD-33 # Match: PD-31 - the rest is not captured because of ", also"
Resolves PD4-41 and PD4-42 and PD4-43 and PD4-44 # Match: PD4-41, PD4-42, PD4-43, PD4-44
Resolves something related to N-2 # No match, the identifier is not directly after 'Resolves'
"""

ids = []

# re.M makes ^ match at the start of every line, not only at the start of the string.
pattern = r"^(?:Closes|Fixes|Resolves) (\w+-\d+)(?:(?: and )(\w+-\d+))*"
for match in re.findall(pattern, test_string, re.M):
    # Each match is a tuple (group 1, group 2); a group that did not participate is ''.
    for group in match:
        if group:
            ids.append(group)

print(ids)
# Output: ['PD-1', 'PD-21', 'PD-22', 'PD-31', 'PD4-41', 'PD4-44']

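Also, here is the result on regex101.com. If more than one ID follows the first one, it unfortunately captures only the last match instead of all of them. I have read that a repeated capturing group captures only its last iteration, and that I should put a capturing group around the repeated group to capture all iterations, but I could not get that to work.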
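The "last iteration only" behaviour can be reproduced on a single line. The following is a minimal sketch; the input line is taken from the test string above, and the variable names are chosen here purely for illustration.

```python
import re

# One line from the test string, used only to illustrate the behaviour.
line = "Resolves PD4-41 and PD4-42 and PD4-43 and PD4-44"

# The pattern from the question: group 2 sits inside a repeated non-capturing group.
pattern = re.compile(r"^(?:Closes|Fixes|Resolves) (\w+-\d+)(?: and (\w+-\d+))*")

match = pattern.match(line)

# Group 2 matched three times, but Python's re keeps only the final iteration.
first_id, last_repeated_id = match.groups()
print(first_id, last_repeated_id)  # PD4-41 PD4-44
```

The whole line is consumed by the match, yet PD4-42 and PD4-43 are not recoverable from the groups, which is why the output above is missing them.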

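Summary:

Is there a regex solution, similar to what I tried, that captures every occurrence of an ID? Or is there a better way to parse the IDs out of this string with Python?

Best answer

You can use a single capturing group that matches the first occurrence and then repeats the same pattern zero or more times, each repetition preceded by a space, the word and, and another space.

The values are in group 1.

To get the separate values, split group 1 on " and ".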

^(?:Closes|Fixes|Resolves) (\w+-\d+(?: and \w+-\d+)*)

Regex demo
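As a rough sketch of how that pattern might be used from Python: the helper name extract_ids and the shortened sample text are assumptions made for this illustration, while the pattern itself is the one given in the answer.

```python
import re

# Pattern from the answer: group 1 holds the whole run of IDs joined by " and ".
ID_LINE = re.compile(r"^(?:Closes|Fixes|Resolves) (\w+-\d+(?: and \w+-\d+)*)", re.M)


def extract_ids(text):
    """Return every ID found on lines that start with an allowed word."""
    ids = []
    # With a single capturing group, findall returns one string per matching line.
    for id_run in ID_LINE.findall(text):
        ids.extend(id_run.split(" and "))
    return ids


# Shortened version of the test string from the question (comments removed).
sample = (
    "Closes PD-1\n"
    "Related to PD-2\n"
    "Fixes PD-21 and PD-22\n"
    "Closes PD-31, also PD-32 and PD-33\n"
    "Resolves PD4-41 and PD4-42 and PD4-43 and PD4-44\n"
)

print(extract_ids(sample))
# ['PD-1', 'PD-21', 'PD-22', 'PD-31', 'PD4-41', 'PD4-42', 'PD4-43', 'PD4-44']
```

Splitting on the literal " and " is safe here because the pattern only lets group 1 contain IDs joined by exactly that separator.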

Regarding python - Match all identifiers in a string, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58880341/
