Python: extract text from a letter - indexing

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:33:24

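I want to use Python to extract a specific part of a letter from a .txt file. The beginning and the end of the letter are marked by clear start/end expressions (letter_begin/letter_end). My problem is that the "recording" of the text needs to start at the very first occurrence of any item in the letter_begin list and end at the very last occurrence of any item in the letter_end list (plus a 3-line buffer). I want to write the output text to a file. Here is my sample text and the code I have so far: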

sample_text = """Some random text right here 
.........
Dear Shareholders: We are pleased to provide this report to our shareholders and fellow shareholders. we thank you for your continued support.
Best regards,
Douglas - Director


Other random text in this lines """

letter_begin = ["dear", "to our shareholders", "fellow shareholders"]
letter_end = ["best regards", "respectfully submitted", "thank you for your continued support"]

with open(filename, 'r', encoding="utf-8") as infile, open("xyz.txt", mode='w', encoding="utf-8") as f:
    text = infile.read()
    lines = text.strip().split("\n")
    target_start_idx = None
    target_end_idx = None
    for index, line in enumerate(lines):
        line = line.lower()
        if any(beg in line for beg in letter_begin):
            target_start_idx = index
            continue
        if any(end in line for end in letter_end):
            target_end_idx = index + 3
            break

    if target_start_idx is not None:
        target = "\n".join(lines[target_start_idx : target_end_idx])
        f.write(str(target))

My desired output would be:

output = """Dear Shareholders: We are pleased to provide this report to our shareholders and fellow shareholders. we thank you for your continued support.
Best regards,
Douglas - Director

"""

Best answer

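Your loop gives you the last occurrence of the start sequence: target_start_idx is overwritten on every line that contains a start expression, until the first end expression triggers the break.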
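To see this, here is a minimal sketch that wraps the question's single loop in a hypothetical function, find_span_original, and runs it on a made-up letter (the input lines are an assumption, not taken from the question) in which a second start expression appears before any end expression:

```python
# Hypothetical reproduction of the question's single-loop logic.
letter_begin = ["dear", "to our shareholders", "fellow shareholders"]
letter_end = ["best regards", "respectfully submitted", "thank you for your continued support"]


def find_span_original(lines):
    """Return (start, end) exactly as the question's loop computes them."""
    target_start_idx = None
    target_end_idx = None
    for index, line in enumerate(lines):
        line = line.lower()
        if any(beg in line for beg in letter_begin):
            target_start_idx = index  # overwritten by every later start match
            continue
        if any(end in line for end in letter_end):
            target_end_idx = index + 3
            break  # stops at the FIRST end match
    return target_start_idx, target_end_idx


# Made-up input: line 1 opens the letter, line 2 repeats a start expression.
lines = [
    "Some random text",
    "Dear Shareholders:",
    "This report goes to our shareholders.",
    "Best regards,",
    "Douglas - Director",
]
print(find_span_original(lines))  # (2, 6): the "Dear Shareholders:" line is skipped
```

Because line 2 also contains "to our shareholders", the start index moves from 1 to 2 and the opening line of the letter is lost.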

You should split the reading part into two loops, like this:

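with open(filename, 'r', encoding="utf-8") as infile:
    text = infile.read()

lines = text.strip().split("\n")
target_start_idx = None
target_end_idx = None

# First loop: stop at the FIRST line that contains a start expression.
for index, line in enumerate(lines):
    line = line.lower()
    if any(beg in line for beg in letter_begin):
        target_start_idx = index
        break

# Second loop: no break, so target_end_idx ends up at the LAST end expression.
for index, line in enumerate(lines):
    line = line.lower()  # the end expressions are lowercase, so compare in lowercase
    if any(end in line for end in letter_end):
        target_end_idx = index + 3

if target_start_idx is not None:
    with open("xyz.txt", mode='w', encoding="utf-8") as f:
        f.write("\n".join(lines[target_start_idx : target_end_idx]))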

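This way, you leave the first loop as soon as the first start sequence appears, while the second loop has no break, so target_end_idx ends up at the last end sequence plus the 3-line buffer.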
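The two-loop idea can also be packaged as a reusable function. This is a sketch under stated assumptions, not the answerer's code: the names extract_letter and extract_letter_to_file are hypothetical, the second loop starts at the start index (so an end expression that appears before the letter is ignored), and the slice bound keeps the question's index + 3 convention.

```python
from pathlib import Path

letter_begin = ["dear", "to our shareholders", "fellow shareholders"]
letter_end = ["best regards", "respectfully submitted", "thank you for your continued support"]


def extract_letter(text, begin_markers, end_markers, buffer=3):
    """Return the letter text, or None if no start expression is found.

    The slice runs from the first line containing a start expression up to
    (last line containing an end expression) + buffer, exclusive, which is
    the same `index + 3` bound the question uses.
    """
    lines = text.strip().split("\n")
    lowered = [line.lower() for line in lines]  # markers are lowercase

    # First pass: index of the FIRST line with any start expression.
    start = next(
        (i for i, line in enumerate(lowered) if any(b in line for b in begin_markers)),
        None,
    )
    if start is None:
        return None

    # Second pass: no break, so `end` keeps the LAST match at or after `start`.
    end = None
    for i in range(start, len(lowered)):
        if any(e in lowered[i] for e in end_markers):
            end = i + buffer

    # end is None when no end expression follows; the slice then runs to the end.
    return "\n".join(lines[start:end])


def extract_letter_to_file(in_path, out_path):
    """Hypothetical wrapper: read in_path and write the extracted letter to out_path."""
    text = Path(in_path).read_text(encoding="utf-8")
    target = extract_letter(text, letter_begin, letter_end)
    if target is not None:
        Path(out_path).write_text(target, encoding="utf-8")
    return target


sample_text = """Some random text right here
.........
Dear Shareholders: We are pleased to provide this report to our shareholders and fellow shareholders. we thank you for your continued support.
Best regards,
Douglas - Director


Other random text in this lines """

result = extract_letter(sample_text, letter_begin, letter_end)
print(repr(result))
```

On the sample text, the end expression on the "Dear Shareholders" line is superseded by "Best regards,", so the result contains the letter, the signature line, and one trailing empty line. If no end expression follows the start, the slice runs to the end of the text; returning None in that case may be preferable depending on your data.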

A similar question, "Python: extract text from a letter - indexing", can be found on Stack Overflow: https://stackoverflow.com/questions/53394453/