
Python: recursively running try/except while a condition is met

Reposted. Author: 行者123. Updated: 2023-12-01 07:22:10

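I want to iterate over a text file line by line, search for a pattern, and extract entities from it. However, some of the records I need span multiple lines, and that information is lost when I iterate line by line.

Right now I am using try/except blocks and appending the next line to the current one, like this: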

try:
    id_value, utterance, prediction = process(line + ' ' + lines[n + 1])
except AttributeError:
    # Handle bad data
    try:
        id_value, utterance, prediction = process(line + ' ' + lines[n + 1] + ' ' + lines[n + 2])
    except AttributeError:
        # Handle bad data
        try:
            id_value, utterance, prediction = process(
                line + ' ' + lines[n + 1] + ' ' + lines[n + 2] + ' ' + lines[n + 3])
        except AttributeError:
            # Still bad data; the nesting would continue for each extra line
            pass

Here is the data:

data.txt

[22 Aug 2019 13:25:12] [ID:9ea1566460506294]     INFO [139921763325696] (ModelClassification:056) - Model classification for utterance_1 is 1
[22 Aug 2019 13:26:06] [ID:7ea1566460117776] INFO [139921771718400] (ModelClassification:056) - Model classification for utterance_2
is 1
[22 Aug 2019 13:26:16] [ID:71d1566460492762] INFO [139921771718400] (ModelClassification:056) - Model classification for utterance_3 is 0

As you can see, the record

[22 Aug 2019 13:26:06] [ID:7ea1566460117776]     INFO [139921771718400] (ModelClassification:056) - Model classification for  utterance_2
is 1

spans two lines when the file is iterated line by line.

Code

import re

matching_string = 'Model classification for'
id_start_string = '[ID:'
id_end_string = ']'


def process(line):
    start_idx = line.find(id_start_string)
    end_idx = [s.start() for s in re.finditer(id_end_string, line)]
    for end in end_idx:
        if end > start_idx:
            # Get first index greater than start string index
            end_idx = end
            break
    id_value = line[start_idx + len(id_start_string): end_idx]
    groups = re.search('Model classification for (.*) is (0|1)', line).groups()
    utterance = groups[0]
    prediction = groups[1]
    return id_value, utterance, prediction


with open('data.txt', 'r') as f:
    lines = f.read().splitlines()
    for n, line in enumerate(lines):
        # Search for pattern in string
        if matching_string in line:
            try:
                id_value, utterance, prediction = process(line)
            except AttributeError:
                print('Bad data')
                print(line)
            print(id_value, utterance, prediction)

Can my problem be solved recursively? Any help is much appreciated.

Edit -

lines = ['22 Aug 2019 13:25:12] [ID:9ea1566460506294]     INFO [139921763325696] (ModelClassification:056) - Model classification for utterance_1 is 1', '[22 Aug 2019 13:26:06] [ID:7ea1566460117776]     INFO [139921771718400] (ModelClassification:056) - Model classification for  utterance_2', ' is 1', '[22 Aug 2019 13:26:16] [ID:71d1566460492762]     INFO [139921771718400] (ModelClassification:056) - Model classification for utterance_3 is 0 ']

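Best answer

If you want to find records in a file, you can use re.findall(). Reading the whole file into one string means a pattern can match across line breaks (for example with \s or re.DOTALL):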

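import re

with open("input.txt", "r") as f:
    text = f.read()

# Each call returns a list of matches (strings, or tuples if the pattern has several groups)
output = re.findall(r'some regex pattern', text)
output1 = re.findall(r'some other pattern', text)
output2 = re.findall(r'another pattern', text)

with open("output.txt", "w") as f:
    # f.write() needs a string, so write one match per line
    for matches in (output, output1, output2):
        f.writelines(f"{match}\n" for match in matches)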
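
To make the re.findall() idea concrete, here is a minimal sketch applied to the sample data from the question. The pattern and the name RECORD_PATTERN are assumptions made for illustration, not part of the original answer. It also assumes every record ends with "is 0" or "is 1" and that the utterance itself never contains such a phrase.

```python
import re

# Sample log text copied from the question; the second record wraps onto a new line.
text = """[22 Aug 2019 13:25:12] [ID:9ea1566460506294]     INFO [139921763325696] (ModelClassification:056) - Model classification for utterance_1 is 1
[22 Aug 2019 13:26:06] [ID:7ea1566460117776] INFO [139921771718400] (ModelClassification:056) - Model classification for utterance_2
is 1
[22 Aug 2019 13:26:16] [ID:71d1566460492762] INFO [139921771718400] (ModelClassification:056) - Model classification for utterance_3 is 0
"""

# Hypothetical pattern (an assumption, not from the original answer).
# re.DOTALL lets "." cross line breaks, and \s+ absorbs the newline before "is".
RECORD_PATTERN = re.compile(
    r"\[ID:([^\]]+)\]"                  # group 1: the ID between "[ID:" and "]"
    r".*?Model classification for\s+"   # lazily skip ahead to the message
    r"(.*?)\s+is\s+([01])",             # group 2: utterance, group 3: prediction
    re.DOTALL,
)

# findall returns one (id_value, utterance, prediction) tuple per record.
records = RECORD_PATTERN.findall(text)

for id_value, utterance, prediction in records:
    print(id_value, utterance, prediction)
```

Because the whole file is searched as one string, no per-line retry logic is needed. The trade-off is that a record missing its "is 0" / "is 1" suffix would be merged with the record that follows it, so malformed input may need a stricter pattern.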

You can do it recursively if you want to, but re.findall sounds like what you need.
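
For comparison, a recursive version of the question's nested try/except could look like the sketch below. The function name process_with_lookahead, its parameters, and the limit of three extra lines are all hypothetical choices made for this illustration. It returns None instead of raising AttributeError when no match is found.

```python
import re

# Hypothetical pattern: ID, utterance, and prediction within one joined string.
RECORD_RE = re.compile(
    r"\[ID:([^\]]+)\].*?Model classification for\s+(.*?)\s+is\s+([01])"
)


def process_with_lookahead(lines, n, extra=0, max_extra=3):
    """Parse lines[n] joined with `extra` following lines.

    On failure, recurse with one more line appended, up to `max_extra`.
    Returns (id_value, utterance, prediction) or None if nothing matched.
    """
    candidate = " ".join(lines[n:n + 1 + extra])
    match = RECORD_RE.search(candidate)
    if match:
        return match.groups()
    # Stop when the lookahead budget is spent or there are no more lines to add.
    if extra >= max_extra or n + 1 + extra >= len(lines):
        return None
    return process_with_lookahead(lines, n, extra + 1, max_extra)


# Sample lines based on the question's edit.
lines = [
    "[22 Aug 2019 13:25:12] [ID:9ea1566460506294]     INFO [139921763325696] (ModelClassification:056) - Model classification for utterance_1 is 1",
    "[22 Aug 2019 13:26:06] [ID:7ea1566460117776]     INFO [139921771718400] (ModelClassification:056) - Model classification for  utterance_2",
    " is 1",
    "[22 Aug 2019 13:26:16] [ID:71d1566460492762]     INFO [139921771718400] (ModelClassification:056) - Model classification for utterance_3 is 0 ",
]

results = []
for n, line in enumerate(lines):
    # Only lines that start a record are parsed; continuation lines are skipped.
    if "Model classification for" in line:
        parsed = process_with_lookahead(lines, n)
        if parsed is None:
            print("Bad data:", line)
        else:
            results.append(parsed)

print(results)
```

The recursion depth is bounded by max_extra, so a plain loop would work equally well. One caveat of any lookahead scheme: a record that never gets its "is 0" / "is 1" suffix can absorb the following record, which is why the budget is kept small.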

Regarding "Python: recursively running try/except while a condition is met", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57646143/
