
Python - How do I nest file-reading loops?

Reposted. Author: 太空狗. Updated: 2023-10-29 22:11:41

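Two days ago I had my first exposure to Python (and to programming in general). Today I got stuck. I have spent hours trying to find the answer to what I suspect is such a trivial problem that nobody else has ever been stuck on it :)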

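My boss wants me to clean up huge .xml files by hand to make them easier to read. I am trying to write a script to do it for me. Below is a sample of the .xml file and the output I want.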

Input (File.xml):

<IssueTracking>
<Issue>
<SequenceNum>123</SequenceNum>
<Subject>Subject of Ticket 123</Subject>
<Description>Line 1 in Description field of Ticket 123.
Line 2 in Description field of Ticket 123.
Line 3 in Description field of Ticket 123.</Description>
</Issue>
<Issue>
<SequenceNum>124</SequenceNum>
<Subject>Subject of Ticket 124</Subject>
<Description>Line 1 in Description field of Ticket 124.
Line 2 in Description field of Ticket 124.
Line 3 in Description field of Ticket 124.</Description>
</Issue>
</IssueTracking>

Desired output:

123    Subject of Ticket 123
Line 1 in Description field of Ticket 123.
Line 2 in Description field of Ticket 123.
Line 3 in Description field of Ticket 123.

124    Subject of Ticket 124
Line 1 in Description field of Ticket 124.
Line 2 in Description field of Ticket 124.
Line 3 in Description field of Ticket 124.

Here is what I have so far.

with open("File.xml", "r") as SourceFile:  # Opens the file
    while 1:  # Keep going through the file to the end
        SourceFileLine = SourceFile.readline()  # Reads the next line of the source file
        if not SourceFileLine:  # readline() returns "" only at end of file
            break

        SourceFileLine = SourceFileLine.strip()  # Strips the whitespace

        if "<SequenceNum>" in SourceFileLine:
            SequenceNum = SourceFileLine[13:-14]  # Trims the tags, saves the field.
            continue

        if "<Subject>" in SourceFileLine:
            Subject = SourceFileLine[9:-10]
            continue

        #if "<Description>" in SourceFileLine:
        #    last_pos = SourceFile.tell()
        #    while "</Description>" not in SourceFileLine:
        #        SourceFile.seek(last_pos)
        #        ?????
        #
        #    Description = Description[22:]
        #    continue

        if "</Issue>" in SourceFileLine:
            print(SequenceNum, end = "\t")
            print(Subject)
            # print(Description)
            print("\n")

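I am stuck on recognising the three lines between the <Description> tags and keeping them as a single string that I can print before moving on through the source file. Having scanned dozens of other examples of line-by-line file-reading loops, I suspect what I need is to mark the point where I reach the target field and nest another read loop at that point in the file. But I have not found another example that does this, so I assume I am missing something basic or there is a better way. Thanks in advance for your help!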
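The nested-loop idea described above can be made to work without tell() and seek(): a file object is its own iterator, so an inner loop can keep pulling lines from the same file exactly where the outer loop left off. The following is only a minimal sketch, assuming every tag starts on its own line as in the sample; the function name iter_issues is hypothetical, and an in-memory io.StringIO stands in for the real file.

```python
import io


def iter_issues(source):
    """Yield (sequence_num, subject, description) for each <Issue> block.

    `source` is any iterable of text lines, for example an open file.
    """
    lines = iter(source)  # one iterator shared by the outer and inner loops
    sequence_num = subject = description = ""
    for line in lines:
        line = line.strip()
        if line.startswith("<SequenceNum>"):
            sequence_num = line[len("<SequenceNum>"):-len("</SequenceNum>")]
        elif line.startswith("<Subject>"):
            subject = line[len("<Subject>"):-len("</Subject>")]
        elif line.startswith("<Description>"):
            parts = [line[len("<Description>"):]]
            # Inner loop: keep reading from the same iterator until the
            # closing tag shows up (or the input runs out).
            while "</Description>" not in parts[-1]:
                following = next(lines, None)
                if following is None:  # input ended before the closing tag
                    break
                parts.append(following.strip())
            parts[-1] = parts[-1].replace("</Description>", "")
            description = "\n".join(parts)
        elif line.startswith("</Issue>"):
            yield sequence_num, subject, description


sample = io.StringIO(
    "<IssueTracking>\n"
    "<Issue>\n"
    "<SequenceNum>123</SequenceNum>\n"
    "<Subject>Subject of Ticket 123</Subject>\n"
    "<Description>Line 1 in Description field of Ticket 123.\n"
    "Line 2 in Description field of Ticket 123.\n"
    "Line 3 in Description field of Ticket 123.</Description>\n"
    "</Issue>\n"
    "</IssueTracking>\n"
)

for sequence_num, subject, description in iter_issues(sample):
    print(sequence_num, subject, sep="\t")
    print(description)
    print()
```

With a real file, the same function can be called as iter_issues(SourceFile) inside the with block. This sketch does no XML unescaping (for example of &amp;amp;) and relies on the one-tag-per-line layout, which is why a real XML parser is usually the safer choice.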

Best answer

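I strongly recommend using lxml for this. Here is an example that processes your data. (Note: originally written for Py2.x, but easy to adapt to Py3.x.)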

from lxml import etree
xml = """<IssueTracking>
<Issue>
<SequenceNum>123</SequenceNum>
<Subject>Subject of Ticket 123</Subject>
<Description>Line 1 in Description field of Ticket 123.
Line 2 in Description field of Ticket 123.
Line 3 in Description field of Ticket 123.</Description>
</Issue>
<Issue>
<SequenceNum>124</SequenceNum>
<Subject>Subject of Ticket 124</Subject>
<Description>Line 1 in Description field of Ticket 124.
Line 2 in Description field of Ticket 124.
Line 3 in Description field of Ticket 124.</Description>
</Issue>
</IssueTracking>
"""

root = etree.fromstring(xml)
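for issue in root.findall('Issue'):
    # Collect the text of the three child elements, in order
    as_list = [issue.find(n).text for n in ('SequenceNum', 'Subject', 'Description')]
    # Split the multi-line description into a list of lines
    as_list[2] = as_list[2].split('\n')
    print(as_list)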

Prints:

['123', 'Subject of Ticket 123', ['Line 1 in Description field of Ticket 123.', 'Line 2 in Description field of Ticket 123.', 'Line 3 in Description field of Ticket 123.']]
['124', 'Subject of Ticket 124', ['Line 1 in Description field of Ticket 124.', 'Line 2 in Description field of Ticket 124.', 'Line 3 in Description field of Ticket 124.']]
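To get from these lists to the layout the question asks for (sequence number, a tab, the subject, then the description lines), the same parsing idea can be combined with a little string formatting. The sketch below is an illustration rather than part of the original answer: it swaps lxml for the standard library's xml.etree.ElementTree so that it runs without extra installs, and the function name format_issues is hypothetical.

```python
import xml.etree.ElementTree as ET


def format_issues(xml_text):
    """Return every ticket in xml_text as a plain-text block.

    Blocks are separated by one blank line.
    """
    root = ET.fromstring(xml_text)
    blocks = []
    for issue in root.findall("Issue"):
        seq = issue.findtext("SequenceNum", default="")
        subject = issue.findtext("Subject", default="")
        description = issue.findtext("Description", default="")
        # Strip each line so indentation in the source file is not carried over.
        lines = [ln.strip() for ln in description.splitlines()]
        blocks.append(seq + "\t" + subject + "\n" + "\n".join(lines))
    return "\n\n".join(blocks)


sample_xml = """<IssueTracking>
<Issue>
<SequenceNum>123</SequenceNum>
<Subject>Subject of Ticket 123</Subject>
<Description>Line 1 in Description field of Ticket 123.
Line 2 in Description field of Ticket 123.
Line 3 in Description field of Ticket 123.</Description>
</Issue>
</IssueTracking>"""

print(format_issues(sample_xml))
```

For a file on disk, ET.parse("File.xml").getroot() can replace ET.fromstring(...); the file name here is taken from the question.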

Regarding "Python - How do I nest file-reading loops?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/11585688/
