
Python's regex module: repeating 'backreferences' does not appear to work correctly

Reposted. Author: 太空宇宙. Updated: 2023-11-03 11:48:27

Note: I am using the alternative regex module from PyPI (`regex`), not the built-in `re`.

I have a Python program in which I look for repeated labels of a specific format, separated by commas.

The format is: (*words...* #*number*)

For example: Trial #1, Trial #2, Run #3, and Spring trial #13 all fit the format.

I use ([\w ]*#\d\d?,)\1* (in a raw string) as my regex pattern.
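For reference, here is the same pattern written with `re.VERBOSE` so each piece can be commented. This is a minimal sketch using the standard `re` module; the name `LABEL_RUN` is hypothetical. It also shows a point that explains the expected output: `\1*` repeats the exact text captured by group 1, not the sub-pattern, so a run stops as soon as the label changes.

```python
import re

# Same pattern as ([\w ]*#\d\d?,)\1*, written in verbose mode.
# In re.VERBOSE, whitespace inside a character class is kept,
# so [\w ] still matches a literal space.
LABEL_RUN = re.compile(r"""
    (            # group 1: one label, including its trailing comma
      [\w ]*     # words and spaces, e.g. 'Spring trial '
      \#         # literal hash sign (escaped: an unescaped one starts a comment here)
      \d\d?      # one or two digits
      ,          # the separating comma
    )
    \1*          # zero or more exact repeats of the text captured in group 1
""", re.VERBOSE)

print(LABEL_RUN.fullmatch("Spring trial #13,") is not None)   # True
print(LABEL_RUN.match("Run #1,Run #1,Run #2,").group())       # Run #1,Run #1,
```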

In Java and in various online regex testers, finding all matches of this pattern in the string:

Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,Run #3,Run #3,Run #3,Run #3,Run #3,Run #3,Run #3, (...

...) Run #20,Run #20,Run #20,Run #20,Run #20,Run #20,Run #20

returns:

match 1: Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,

match 2: Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,

...etc.

But in Python, it returns:

match 1: Run #1,

match 2: Run #2,

...etc.

I want it to return the first result (the one returned by Java and the other regex engines).

Is there something I'm overlooking about Python's regex engine? Why am I getting this result?

My code is:

import regex

file = open('Pendulum Data.csv',mode='r')
header1 = file.readline()
header2 = file.readline()

pattern1 = regex.compile(r'([\w ]*#\d\d?,)\1*', flags=regex.V0)
header1Match = pattern1.findall(header1)
# print each element findall returned
for x in header1Match:
    print(x)

The for loop and print statement are only there so I can see the results.

(This leads me to another question: what exactly does regex.findall() return? Is findall() actually returning what I want, and I'm just printing the results wrong?)

...and yes, I am using a raw string for my pattern.

Best answer

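You are using a capturing group in your regex. If the pattern contains capturing groups, Python's .findall returns the text captured by those groups instead of the whole match (a list of strings for one group, a list of tuples for several). So what you are looking for is the .finditer function.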

See the Python re.finditer documentation:

Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.

And re.findall:

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
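A short sketch of the difference these two descriptions imply, using the standard `re` module and a shortened version of the question's data (the variable `data` is illustrative). With one capturing group, `findall` returns only group 1 of each match, while `finditer` exposes the whole match through `group()`.

```python
import re

pattern = re.compile(r'([\w ]*#\d\d?,)\1*')
data = "Run #1,Run #1,Run #1,Run #2,Run #2,Run #3,"

# One capturing group in the pattern: findall returns group 1 of each match,
# i.e. a single "Run #N," per run, which is what the question observed.
by_group = pattern.findall(data)
print(by_group)   # ['Run #1,', 'Run #2,', 'Run #3,']

# finditer yields match objects; group() is the entire matched text.
whole = [m.group() for m in pattern.finditer(data)]
print(whole)      # ['Run #1,Run #1,Run #1,', 'Run #2,Run #2,', 'Run #3,']
```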

Here is a small demo using re.finditer:

import re
p = re.compile(r'([\w ]*#\d\d?,)\1*')
test_str = "Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,Run #3,Run #3,Run #3,Run #3,Run #3,Run #3,Run #3, (..."
print([x.group() for x in p.finditer(test_str)])

Result:

['Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,', 'Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,', 'Run #3,Run #3,Run #3,Run #3,Run #3,Run #3,Run #3,']
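If you would rather keep `findall`, one alternative (not part of the original answer) is to wrap the whole pattern in an extra outer group. The backreference then has to point at the inner group, so `\1` becomes `\2`, and `findall` returns one tuple per match whose first element is the full run. A minimal sketch with the standard `re` module:

```python
import re

# Outer group 1 captures the whole run; inner group 2 is the single label.
# The backreference now refers to group 2, which is closed by the time \2 is reached.
p = re.compile(r'(([\w ]*#\d\d?,)\2*)')
test_str = "Run #1,Run #1,Run #2,Run #2,Run #2,"

# Two groups, so findall returns (group 1, group 2) tuples; keep the first element.
runs = [whole for whole, label in p.findall(test_str)]
print(runs)   # ['Run #1,Run #1,', 'Run #2,Run #2,Run #2,']
```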

Casimir is right: for a regex this simple, you can use the standard re module.
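Applied to the question's own code, a sketch might look like the following, using the standard `re` module and `finditer`. The real contents of 'Pendulum Data.csv' are not shown, so `io.StringIO` with a made-up two-line header stands in for the file (swap in `open('Pendulum Data.csv')` for real use). It also assumes the header line has no trailing comma, so one is appended to let the last label match the same `label,` pattern as the others.

```python
import io
import re

# Hypothetical stand-in for open('Pendulum Data.csv'); the real file is not shown.
fake_csv = io.StringIO(
    "Run #1,Run #1,Run #1,Run #2,Run #2,Run #2\n"
    "t,x,y,t,x,y\n"
)

pattern1 = re.compile(r'([\w ]*#\d\d?,)\1*')

with fake_csv as file:
    # Assumption: no trailing comma on the header line, so add one
    # and strip the newline that readline() keeps.
    header1 = file.readline().rstrip('\n') + ','
    header2 = file.readline()

# finditer + group() gives the whole run, not just the captured label.
header1_runs = [m.group() for m in pattern1.finditer(header1)]
for run in header1_runs:
    print(run)
```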

Regarding "Python's regex module: repeating 'backreferences' does not appear to work correctly", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33702003/
