python - How do I read an XML file from a folder with Python?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 10:28:05

I have an XML file like this:

xml_data = '''\
<author type="XXX" language="EN" gender="xx" feature="xx" web="foobar.com">
<documents count="N">
<document KEY="e95a9a6c790ecb95e46cf15bee517651" web="www.foo_bar_exmaple.com"><![CDATA[A large text with lots of strings and punctuations symbols [...]
]]>
</document>
<document KEY="bc360cfbafc39970587547215162f0db" web="www.foo_bar_exmaple.com"><![CDATA[A large text with lots of strings and punctuations symbols [...]
]]>
</document>
<document KEY="19e71144c50a8b9160b3f0955e906fce" web="www.foo_bar_exmaple.com"><![CDATA[A large text with lots of strings and punctuations symbols [...]
]]>
</document>
<document KEY="21d4af9021a174f61b884606c74d9e42" web="www.foo_bar_exmaple.com"><![CDATA[A large text with lots of strings and punctuations symbols [...]
]]>
</document>
<document KEY="28a45eb2460899763d709ca00ddbb665" web="www.foo_bar_exmaple.com"><![CDATA[A large text with lots of strings and punctuations symbols [...]
]]>
</document>
<document KEY="a0c0712a6a351f85d9f5757e9fff8946" web="www.foo_bar_exmaple.com"><![CDATA[A large text with lots of strings and punctuations symbols [...]
]]>
</document>
<document KEY="626726ba8d34d15d02b6d043c55fe691" web="www.foo_bar_exmaple.com"><![CDATA[A large text with lots of strings and punctuations symbols [...]
]]>
</document>
<document KEY="2cb473e0f102e2e4a40aa3006e412ae4" web="www.foo_bar_exmaple.com"><![CDATA[A large text with lots of strings and punctuations symbols [...] [...]
]]>
</document>
</documents>
</author>
'''

I then load it into a pandas DataFrame like this:

import pandas as pd
import xml.etree.ElementTree as ET

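def iter_docs(author):
    # Attributes of the root <author> element are shared by every row
    author_attr = author.attrib
    for doc in author.iterfind('.//document'):
        doc_dict = author_attr.copy()
        # <document> attributes override <author> attributes of the same name (e.g. "web")
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict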


etree = ET.fromstring(xml_data)  # parse the string; returns the root Element
doc_df = pd.DataFrame(list(iter_docs(etree)))

I would like to pass just the path of the file instead of creating the xml_data string variable. Any idea how to do that?

Best answer

From the documentation: https://docs.python.org/2/library/xml.etree.elementtree.html#parsing-xml

You can do it like this, where filename is the path to the XML file:

etree = ET.parse(filename)
root = etree.getroot()
doc_df = pd.DataFrame(list(iter_docs(root)))
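
Since the title asks about a folder rather than a single file, the same idea can be extended to every XML file in a directory. The following is only a minimal sketch, assuming all files share the <author>/<document> layout shown in the question; load_folder is a hypothetical helper name, not something from the original answer, and pandas is left out so the snippet runs with the standard library alone.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def iter_docs(author):
    """Yield one dict per <document>, merged with the <author> attributes."""
    author_attr = author.attrib
    for doc in author.iterfind('.//document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict


def load_folder(folder):
    """Hypothetical helper: parse every *.xml file in `folder` into row dicts."""
    rows = []
    # sorted() only makes the row order reproducible across runs
    for path in sorted(Path(folder).glob('*.xml')):
        root = ET.parse(str(path)).getroot()
        rows.extend(iter_docs(root))
    return rows
```

The returned list of dicts can be handed to pandas in the same way as in the answer above, for example doc_df = pd.DataFrame(load_folder('path/to/folder')), where the folder path is a placeholder.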

This post is based on a similar question found on Stack Overflow, "python - How do I read an XML file from a folder with Python?": https://stackoverflow.com/questions/28289187/
