
Reading a large XML file in Python and saving it to a CSV file

Reposted. Author: 行者123. Updated: 2023-12-01 08:22:10

I have a large XML file with the following structure:

<?xml version="1.0"?>
<products xmlns="http://data-vocabulary.org/product/">
<channel>
<title>Online Store</title>
<link>https://www.clienturl.com/</link>
<product>
<identifier>DI035AT12JNR</identifier>
<quantity>1</quantity>
<fn>Button Fastening Mid Rise Boyfriend Jeans</fn>
<description>Button Fastening Mid Rise Boyfriend Jeans</description>
<category>women-clothing &gt; women-clothing-jeans &gt; women-clothing-jeans-straight_jeans</category>
<currency>SAR</currency>
<photo>http://clienturl/product/78/6014/v1/1-zoom.jpg</photo>
<brand>Diesel</brand>
<url>https://eclient-product-url.html</url>
<price>1450</price>
<google_product_category>Apparel &amp; Accessories &gt; Clothing &gt; Pants</google_product_category>
</product>
<product>
<identifier>DI035AT12JNR</identifier>
<quantity>1</quantity>
<fn>Button Fastening Mid Rise Boyfriend Jeans</fn>
<description>Button Fastening Mid Rise Boyfriend Jeans</description>
<category>women-clothing &gt; women-clothing-jeans &gt; women-clothing-jeans-straight_jeans</category>
<currency>SAR</currency>
<photo>http://clienturl/product/78/6014/v1/1-zoom.jpg</photo>
<brand>Diesel</brand>
<url>https://eclient-product-url.html</url>
<price>1450</price>
<google_product_category>Apparel &amp; Accessories &gt; Clothing &gt; Pants</google_product_category>
</product>
</channel>
</products>

Here is my Python code:

import xml.etree.ElementTree as etree

xmlfile = 'en-sa.xml'

def iterate_xml(xmlfile):
    # Stream the file and yield each top-level child of the root element.
    doc = etree.iterparse(xmlfile, events=('start', 'end'))
    _, root = next(doc)  # the first event is the start of the root element
    start_tag = None
    for event, element in doc:
        if event == 'start' and start_tag is None:
            start_tag = element.tag
        if event == 'end' and element.tag == start_tag:
            yield element
            start_tag = None
            root.clear()

count = 0
for element in iterate_xml(xmlfile):
    for ele in element:
        print(ele)
    count = count + 1
    if count == 5:
        break

It prints the following:

<Element '{http://data-vocabulary.org/product/}title' at 0x7efd046f7a10>
<Element '{http://data-vocabulary.org/product/}link' at 0x7efd046f7ad0>
<Element '{http://data-vocabulary.org/product/}product' at 0x7efd046f7d10>
<Element '{http://data-vocabulary.org/product/}product' at 0x7efd04703050>

I want to write this XML to a CSV file with the following column headers:

identifier:quantity:fn:description:category:currency:photo:brand:url:price:google_product_category

But I have no idea how to proceed. Can anyone help? Thanks in advance.

Best answer

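I suggest using lxml.etree to extract all the text in this case. The XPath query below returns a list of strings containing every text and tail node.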

import lxml.etree
text = """<?xml version="1.0"?>
<products xmlns="http://data-vocabulary.org/product/">
<channel>
<title>Online Store</title>
<link>https://www.clienturl.com/</link>
<product>
<identifier>DI035AT12JNR</identifier>
<quantity>1</quantity>
<fn>Button Fastening Mid Rise Boyfriend Jeans</fn>
<description>Button Fastening Mid Rise Boyfriend Jeans</description>
<category>women-clothing &gt; women-clothing-jeans &gt; women-clothing-jeans-straight_jeans</category>
<currency>SAR</currency>
<photo>http://clienturl/product/78/6014/v1/1-zoom.jpg</photo>
<brand>Diesel</brand>
<url>https://eclient-product-url.html</url>
<price>1450</price>
<google_product_category>Apparel &amp; Accessories &gt; Clothing &gt; Pants</google_product_category>
</product>
<product>
<identifier>DI035AT12JNR</identifier>
<quantity>1</quantity>
<fn>Button Fastening Mid Rise Boyfriend Jeans</fn>
<description>Button Fastening Mid Rise Boyfriend Jeans</description>
<category>women-clothing &gt; women-clothing-jeans &gt; women-clothing-jeans-straight_jeans</category>
<currency>SAR</currency>
<photo>http://clienturl/product/78/6014/v1/1-zoom.jpg</photo>
<brand>Diesel</brand>
<url>https://eclient-product-url.html</url>
<price>1450</price>
<google_product_category>Apparel &amp; Accessories &gt; Clothing &gt; Pants</google_product_category>
</product>
</channel>
</products>""".encode('utf-8')# the library wants bytes so we encode
# Not needed if reading from a file
doc = lxml.etree.fromstring(text)
print(doc.xpath('//text()'))

This prints all the text in the XML as a list of strings:

['\n   ', '\n   ', 'Online Store', '\n   ', 'https://www.clienturl.com/', '   \n   ', '\n   ', 'DI035AT12JNR', '\n   ', '1', '\n   ', 'Button Fastening Mid Rise Boyfriend Jeans', '\n   ', 'Button Fastening Mid Rise Boyfriend Jeans', '\n  ', 'women-clothing > women-clothing-jeans > women-clothing-jeans-straight_jeans', '\n  ', 'SAR', '\n  ', 'http://clienturl/product/78/6014/v1/1-zoom.jpg', '\n  ', 'Diesel', '\n  ', 'https://eclient-product-url.html', '\n  ', '1450', '\n  ', 'Apparel & Accessories > Clothing > Pants', '\n', '\n', '\n  ', 'DI035AT12JNR', '\n  ', '1', '\n  ', 'Button Fastening Mid Rise Boyfriend Jeans', '\n  ', 'Button Fastening Mid Rise Boyfriend Jeans', '\n  ', 'women-clothing > women-clothing-jeans > women-clothing-jeans-straight_jeans', '\n  ', 'SAR', '\n  ', 'http://clienturl/product/78/6014/v1/1-zoom.jpg', '\n  ', 'Diesel', '\n  ', 'https://eclient-product-url.html', '\n  ', '1450', '\n  ', 'Apparel & Accessories > Clothing > Pants', '\n  ', '\n  ', '\n  ']
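The whitespace-only entries in that list are just the indentation between tags. Below is a minimal sketch of dropping them. It uses the standard library's `xml.etree.ElementTree` and `itertext()` as a stand-in for the lxml XPath query, and the shortened `sample` document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample in the same shape as the question's feed.
sample = b"""<?xml version="1.0"?>
<products xmlns="http://data-vocabulary.org/product/">
<channel>
<title>Online Store</title>
<product>
<identifier>DI035AT12JNR</identifier>
<price>1450</price>
</product>
</channel>
</products>"""

root = ET.fromstring(sample)

# itertext() yields every text and tail string in document order,
# much like the '//text()' XPath query; strip() removes the indentation
# and the filter drops strings that were whitespace only.
values = [t.strip() for t in root.itertext() if t.strip()]
print(values)
```

Note that the filtered list no longer says which field each value came from, so picking values by index only works if every product carries the same fields in the same order.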

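There is no guarantee this approach will hold up when iterating over the entire XML file, since you only provided a sample. However, if every product has the same set of fields, you can iterate product by product and pick the indices you need into another list. Once you have a list containing (identifier:quantity:fn:description:category:currency:photo:brand:url:price:google_product_category), creating a pandas DataFrame with pandas.DataFrame.append should be easy (note that DataFrame.append was removed in pandas 2.0; on current versions, pass the list of rows to the pandas.DataFrame constructor instead). Then export it to CSV with df.to_csv(r'Path where you want to store the exported CSV file\File Name.csv').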
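For a file too large to load at once, one alternative to index picking is to stream the products and look up each field by tag name. The sketch below is not the answer's method: it uses only the standard library (`xml.etree.ElementTree.iterparse` and `csv.DictWriter`), the function name `products_to_csv` and the second sample product are hypothetical, and it assumes a comma-delimited output is acceptable.

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = "{http://data-vocabulary.org/product/}"  # namespace from the question's feed
FIELDS = ["identifier", "quantity", "fn", "description", "category",
          "currency", "photo", "brand", "url", "price",
          "google_product_category"]


def products_to_csv(xml_source, csv_target):
    """Stream <product> elements from xml_source into csv_target.

    xml_source is a filename or a binary file object; csv_target is a
    text file object. Returns the number of rows written.
    """
    writer = csv.DictWriter(csv_target, fieldnames=FIELDS)
    writer.writeheader()
    rows = 0
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != NS + "product":
            continue
        # findtext returns None for a missing child; write it as ''.
        writer.writerow({f: elem.findtext(NS + f) or "" for f in FIELDS})
        rows += 1
        elem.clear()  # free the product's children once the row is written
    return rows


# Shortened, hypothetical sample; the second product is invented to show
# how missing fields are handled.
sample = b"""<?xml version="1.0"?>
<products xmlns="http://data-vocabulary.org/product/">
<channel>
<title>Online Store</title>
<product>
<identifier>DI035AT12JNR</identifier>
<brand>Diesel</brand>
<price>1450</price>
</product>
<product>
<identifier>HYPOTHETICAL-2</identifier>
<price>99</price>
</product>
</channel>
</products>"""

out = io.StringIO()
count = products_to_csv(io.BytesIO(sample), out)
print(count)
print(out.getvalue())
```

With a real file, the call would look something like `products_to_csv('en-sa.xml', f)` inside `with open('products.csv', 'w', newline='', encoding='utf-8') as f:`. The emptied `product` elements stay attached to their parent, so memory still grows slightly with the number of products.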

For more on reading a large XML file in Python and saving it to a CSV file, see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/54560979/
