
python - Reading tags from an XML file

Reposted — Author: 太空宇宙 — Updated: 2023-11-03 19:26:50

I want to read tag values, e.g. <title> and <title_id>, from an XML file. The value of <title> reads successfully. Is it possible to read both <title> and <title_id> in the same loop?
Please help me, I'm new to XML.

<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.5/ http://www.mediawiki.org/xml/export-0.5.xsd" version="0.5" xml:lang="en">
  <siteinfo>
    <sitename>Wiki</sitename>
    <case>first-letter</case>
    <namespaces>
      <namespace key="0" case="first-letter" />
    </namespaces>
  </siteinfo>
  <page>
    <title>Sex</title>
    <title_id>31239628</title_id>
    <revision>
      <id>437708703</id>
      <timestamp>2011-07-04T13:53:52Z</timestamp>
      <text xml:space="preserve" bytes="6830">{{ Hello}}
</text>
    </revision>
  </page>
</mediawiki>

I am using the following code to read all the titles from the file, and it works fine.

import xml.etree.ElementTree as etree

tree = etree.parse('find_title.xml')
for value in tree.iter(tag='title'):
    print(value.text)
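Note that because the sample file declares a default namespace, ElementTree only matches namespace-qualified tag names. Here is a minimal sketch (assuming the namespace URI from the sample above; the document is inlined so the snippet is self-contained) that reads <title> and <title_id> together by looping over each <page>:

```python
import xml.etree.ElementTree as etree

XML = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
<page>
<title>Sex</title>
<title_id>31239628</title_id>
</page>
</mediawiki>"""

# Default namespace from the sample file, in ElementTree's {uri}tag form.
NS = '{http://www.mediawiki.org/xml/export-0.5/}'

root = etree.fromstring(XML)
for page in root.iter(NS + 'page'):
    # findtext returns the child's text, or None if that child is missing,
    # so a page with no <title_id> still yields a (title, None) pair.
    print(page.findtext(NS + 'title'), page.findtext(NS + 'title_id'))
```

With `etree.parse('find_title.xml')` in place of `fromstring`, the same loop works on the file directly.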

Best Answer

If you're going to be working with XML regularly, I recommend getting familiar with XPath.

Here's a quick snippet using my preferred XML library, lxml.

from lxml import etree

doc = etree.XML("""
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.5/ http://www.mediawiki.org/xml/export-0.5.xsd" version="0.5" xml:lang="en">
<siteinfo>
<sitename>Wiki</sitename>
<case>first-letter</case>
<namespaces>
<namespace key="0" case="first-letter" />
</namespaces>
</siteinfo>
<page>
<title>Sex</title>
<title_id>31239628</title_id>
<revision>
<id>437708703</id>
<timestamp>2011-07-04T13:53:52Z</timestamp>
<text xml:space="preserve" bytes="6830">{{ Hello}}
</text>
</revision>
</page>
</mediawiki>
""")

def first(seq, default=None):
    for item in seq:
        return item
    return default

NSMAP=dict(mw="http://www.mediawiki.org/xml/export-0.5/")

print(first(doc.xpath('/mw:mediawiki/mw:page/mw:title/text()', namespaces=NSMAP)))
print(first(doc.xpath('/mw:mediawiki/mw:page/mw:title_id/text()', namespaces=NSMAP)))

Yields:

Sex
31239628

Update - supposing multiple page elements

XPATH queries mostly return node sequences (hence the first function).

You could use a single query that returned the values of both tags for all of the pages, but you would then have to group them together yourself, and if a subelement were missing from a page you'd be out of step. You could write the query to ensure the subelements exist, but you might also want to know that a record was only partial, and so on.
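For completeness, the single-query approach just described could be sketched like this (an illustrative assumption, not part of the original answer), using the XPath union operator. The values come back interleaved in document order, which is exactly why a missing subelement would silently shift the pairing:

```python
from lxml import etree

doc = etree.XML("""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
<page><title>Sex</title><title_id>31239628</title_id></page>
</mediawiki>""")

NSMAP = dict(mw="http://www.mediawiki.org/xml/export-0.5/")

# One union query over both tags; lxml returns the node-set in document order.
values = doc.xpath(
    '/mw:mediawiki/mw:page/mw:title/text()'
    ' | /mw:mediawiki/mw:page/mw:title_id/text()',
    namespaces=NSMAP)

# Pair consecutive results -- this breaks silently if any page lacks one tag,
# which is the fragility described above.
pages = list(zip(values[0::2], values[1::2]))
print(pages)  # [('Sex', '31239628')]
```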

So my first answer to this would be to loop through the pages like so:

for i, page in enumerate(doc.xpath('/mw:mediawiki/mw:page', namespaces=NSMAP)):
    title = first(page.xpath('./mw:title/text()', namespaces=NSMAP))
    title_id = first(page.xpath('./mw:title_id/text()', namespaces=NSMAP))
    print("Page %s: %s (%s)" % (i, title, title_id))

Yields:

Page 0: Sex (31239628)

Regarding python - reading tags from an XML file, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/7819655/
