
python - How to extract text between matching strings, including the matching strings and lines

Reposted. Author: 行者123. Updated: 2023-11-30 22:39:32

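I am using Python to extract certain strings that lie between matching strings. The strings come from a list, which is in turn generated dynamically by a separate Python function. The list I am working with looks like this: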

sample_list = ['line1 this line a first line',
'line1 this line is also considered as line one...',
'line1 this line is the first line',
'line2 this line is second line to be included in output',
'line3 this should also be included in output',
'line1 this contain other strings',
'line1 this may contain other strings as well',
'line2 this line is second line to be included in output',
'line3 this should also be included in output',
'line1 what the heck is it...'
]

The output I want looks like this:

line1 this line is the first line
line2 this line is second line to be included in output
line3 this should also be included in output
line1 this may contain other strings as well
line2 this line is second line to be included in output
line3 this should also be included in output

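As you can see, I want to extract the text/lines that start with line1 and end with line3 (through the end of that line). The final output includes both matched words (i.e. line1 and line3).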

The code I tried is:

import re

# Convert list to string first
list_to_str = '\n'.join(sample_list)
# Get desired output
print(re.findall('\nline1(.*?)\nline2(.*?)\nline3($)', list_to_str, re.DOTALL))

This is the output I get:

[]

Any help is appreciated.

Edit 1: I did some more work and found the closest solution so far:

matches = re.findall(r"^line1(.*)\nline2(.*)\nline3(.*)$", list_to_str, re.MULTILINE)

for match in matches:
    print('\n'.join(match))

It gives me this output:

 this line is the first line
this line is second line to be included in output
this is the third and it should also be included in output
this may contain other strings as well
this line is second line to be included in output...
this is the third should also be included in output

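The output is almost correct, but it does not include the matched text (line1, line2, line3), because those literals sit outside the capture groups.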

Best answer

If you are looking for a non-repeating sequence of lines 1, 2 and 3, use this:

line1.*\s*(?!\s|line[13])line2.*\s*(?!\s|line[12])line3.*

Explanation

 line1 .* \s*              # line 1 plus newline(s)
 (?! \s | line [13] )      # Next cannot be line 1 or 3 (or whitespace)
 line2 .* \s*              # line 2 plus newline(s)
 (?! \s | line [12] )      # Next cannot be line 1 or 2 (or whitespace)
 line3 .*                  # line 3
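
As a minimal sketch of how the pattern above might be applied in Python, the following runs it against the sample_list from the question. The variable names `text`, `pattern`, and `matches` are illustrative and do not come from the original answer. Because no capture groups are present, `re.findall` returns each whole match, which here is the three lines together with their line1/line2/line3 prefixes.

```python
import re

sample_list = [
    'line1 this line a first line',
    'line1 this line is also considered as line one...',
    'line1 this line is the first line',
    'line2 this line is second line to be included in output',
    'line3 this should also be included in output',
    'line1 this contain other strings',
    'line1 this may contain other strings as well',
    'line2 this line is second line to be included in output',
    'line3 this should also be included in output',
    'line1 what the heck is it...',
]

text = '\n'.join(sample_list)

# Pattern from the answer, unchanged. No flags are passed: by default '.'
# does not match a newline, so each '.*' stays within a single line.
pattern = r'line1.*\s*(?!\s|line[13])line2.*\s*(?!\s|line[12])line3.*'

# With no capture groups, findall returns the full text of each match.
matches = re.findall(pattern, text)

for block in matches:
    print(block)
```

Note that the pattern is not anchored to the start of a line, so it assumes the words line1, line2 and line3 only occur as line prefixes, as they do in the sample data.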

If you want to capture the line contents, just put capture groups around the .* parts, i.e. (.*)
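
The capture-group suggestion can be sketched as follows. This variant is an assumption about what the asker wants rather than the answer's exact code: each group wraps the whole line (for example `(line1 .*)` instead of `line1 (.*)`) so that the line1/line2/line3 prefix is kept in the captured text. It uses `re.VERBOSE` so the pattern can carry the same comments as the explanation; the names `pattern` and `rows` are illustrative.

```python
import re

sample_list = [
    'line1 this line a first line',
    'line1 this line is also considered as line one...',
    'line1 this line is the first line',
    'line2 this line is second line to be included in output',
    'line3 this should also be included in output',
    'line1 this contain other strings',
    'line1 this may contain other strings as well',
    'line2 this line is second line to be included in output',
    'line3 this should also be included in output',
    'line1 what the heck is it...',
]

# re.VERBOSE ignores whitespace in the pattern and allows '#' comments.
pattern = re.compile(r"""
    (line1 .*) \s*         # group 1: the whole line1 line, then newline(s)
    (?! \s | line[13] )    # next cannot be line1 or line3 (or whitespace)
    (line2 .*) \s*         # group 2: the whole line2 line, then newline(s)
    (?! \s | line[12] )    # next cannot be line1 or line2 (or whitespace)
    (line3 .*)             # group 3: the whole line3 line
""", re.VERBOSE)

# With three groups, findall returns one 3-tuple of lines per match.
rows = pattern.findall('\n'.join(sample_list))

for row in rows:
    print('\n'.join(row))
```

Keeping the groups separate makes it easy to handle each of the three lines individually; if only the combined block is needed, the group-free pattern is enough.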

For "python - How to extract text between matching strings, including the matching strings and lines", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43124913/
