Python XML parser

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:06:44

I have a complex XML document to parse. I already know how to parse some of the important tags.

XML data

<staff gid="2027930674">
<task>Director</task>
<person id="103045">Yōjirō Arai</person>
</staff>

Full XML data

<ann>
<anime id="16989" gid="1524403706" type="movie" name="Taifū no Noruda" precision="movie" generated-on="2015-04-27T08:05:39Z">
<info gid="1917137337" type="Picture" src="http://cdn.animenewsnetwork.com/thumbnails/fit200x200/encyc/A16989-1917137337.1429892764.jpg" width="141" height="200">
<img src="http://cdn.animenewsnetwork.com/thumbnails/hotlink-fit200x200/encyc/A16989-1917137337.1429892764.jpg" width="141" height="200"/>
<img src="http://cdn.animenewsnetwork.com/thumbnails/hotlink-max500x600/encyc/A16989-1917137337.1429892764.jpg" width="353" height="500"/>
</info>
<info gid="1994323462" type="Main title" lang="JA">Taifū no Noruda</info>
<info gid="1715491679" type="Alternative title" lang="JA">台風のノルダ</info>
<info gid="898837990" type="Plot Summary">
On a certain isolated island, at a certain middle school, on the eve of the culture festival, Shūichi Azuma quits baseball after playing his whole life. He has a fight with his best friend Kenta Saijō. Then they suddenly meet a mysterious, red-eyed girl named Noruda, and a huge typhoon hits the middle school.
</info>
<info type="Vintage">2015-06-05</info>
<info gid="2492283870" type="Premiere date">2015-06-05 (Japan)</info>
<info gid="2453949568" type="Ending Theme">
"Arashi no Ato de" (嵐のあとで; After the Storm) by Galileo Galilei
</info>
<info gid="3199882585" type="Official website" lang="JA" href="http://typhoon-noruda.com/">「台風のノルダ」公式サイト</info>
<news datetime="2015-04-09T17:20:00Z" href="http://www.animenewsnetwork.com:/news/2015-04-09/studio-colorido-unveils-typhoon-noruda-anime-film/.86937">
Studio Colorido Unveils <cite>Typhoon Noruda</cite> Anime Film
</news>
<news datetime="2015-04-24T08:00:00Z" href="http://www.animenewsnetwork.com:/news/2015-04-24/studio-colorido-taifu-no-noruda-film-unveils-cast-more-staff-theme-song-band/.87470">
Studio Colorido's <i>Taifū no Noruda</i> Film Unveils Cast, More Staff, Theme Song Band
</news>
<staff gid="2027930674">
<task>Director</task>
<person id="103045">Yōjirō Arai</person>
</staff>
<staff gid="3870106504">
<task>Music</task>
<person id="110581">Masashi Hamauzu</person>
</staff>
<staff gid="2732633345">
<task>Character Design</task>
<person id="135767">Hiroyasu Ishida</person>
</staff>
<staff gid="1532205853">
<task>Art Director</task>
<person id="52564">Mika Nishimura</person>
</staff>
<staff gid="1006708772">
<task>Animation Director</task>
<person id="135767">Hiroyasu Ishida</person>
</staff>
<staff gid="934584477">
<task>Sound Director</task>
<person id="8849">Satoshi Motoyama</person>
</staff>
<staff gid="1138447906">
<task>Cgi Director</task>
<person id="42135">Norihiko Miyoshi</person>
</staff>
<staff gid="3178797981">
<task>Director of Photography</task>
<person id="24382">Mitsuhiro Sato</person>
</staff>
<cast gid="2645091588" lang="JA">
<role>Shūichi Azuma</role>
<person id="135769">Shūhei Nomura</person>
</cast>
<cast gid="2397297323" lang="JA">
<role>Kenta Saijō</role>
<person id="135770">Daichi Kaneko</person>
</cast>
<cast gid="2417172290" lang="JA">
<role>Noruda</role>
<person id="135771">Kaya Kiyohara</person>
</cast>
<credit gid="2574178211">
<task>Animation Production</task>
<company id="13518">Studio Colorido</company>
</credit>
</anime>
</ann>

Python code

#! /usr/bin/Python

# Import xml parser.
import xml.etree.ElementTree as ElementTree

# Import url library.
from urllib.request import urlopen

# Import sys library.
import sys

# XML to parse.
sampleUrl = "http://cdn.animenewsnetwork.com/encyclopedia/api.xml?anime="

# Get the number of params we have in our application.
params = len (sys.argv)

# Check the number of params we have.
if (params == 1):
    print ("We need at least 1 anime identifier.")
else:
    for aid in range (1, params):
        # Read the xml as a file.
        content = urlopen (sampleUrl + sys.argv[aid])

        # XML content is stored here to start working on it.
        xmlData = content.readall().decode('utf-8')

        # Close the file.
        content.close()

        # Start parsing XML.
        root = ElementTree.fromstring (xmlData)

        # Extract classic data.
        for info in root.iter("anime"):
            print ("Id: " + info.get("id"))
            print ("Gid: " + info.get("gid"))
            print ("Name: " + info.get("name"))
            print ("Precision: " + info.get("precision"))
            print ("Type: " + info.get("type"))

        # Extract date and general poster.
        for info in root.iter ("info"):
            if ("Vintage" in info.get("type")):
                print ("Date: " + info.text)

            if ("Picture" in info.get("type")):
                print ("Poster: " + info.get("src"))

        # Extract additional posters.
        for img in root.iter ("img"):
            print ("Poster: " + img.get("src"))

        print ("")

        # Extract all the staff of this anime.
        result = {}
        for staff in root.getiterator ("staff"):
            # Initialize values.
            task = ""
            value = {}

            for elem in staff.getchildren():
                if elem.tag == "task" :
                    task = elem.text
                elif elem.tag == "person" :
                    tmp = elem.text

                    if "id" in tmp:
                        value["id"] = tmp["id"]
                    value["name"] = elem.text
            if task :
                result[task] = value
        print (result)

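I am using xml.etree.ElementTree to parse the whole XML, but I cannot parse the staff part shown above as a single element. I need to store all of that data as one field in another database.

I need all of this data to do that.

Example: { "Director": {"Name": "Yojiro Arai", "id": "103045"} }

I don't know how to do this with the ElementTree library.

Thanks for your help.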

Best answer

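  1. Parse the input XML with the xml.etree.ElementTree module.
  2. Iterate over every staff tag under the parsed root with iter() (getiterator() in older Python versions; it was removed in Python 3.9).
  3. Iterate over the children of each staff tag, either by looping over the element directly or with getchildren() in older Python versions (also removed in Python 3.9).
  4. Build the dictionary.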

Demo:

import xml.etree.ElementTree as PARSER

data = """
<xml>
<staff gid="2027930674">
<task>Director</task>
<person id="103045">ABC</person>
</staff>
<staff gid="2027930674">
<task>Director1</task>
<person id="1030452">XYZ</person>
</staff>
</xml>
"""

root = PARSER.fromstring(data)
result = {}
# iter() replaces getiterator(), which was removed in Python 3.9.
for i in root.iter("staff"):
    key = ""
    value = {}
    # Looping over an element yields its direct children
    # (this replaces getchildren(), also removed in Python 3.9).
    for j in i:
        if j.tag == "task":
            key = j.text
        elif j.tag == "person":
            # Attributes such as id live in .attrib, not in .text.
            tmp = j.attrib
            if "id" in tmp:
                value["id"] = tmp["id"]
            value["name"] = j.text

    if key:
        result[key] = value

print(result)

Output:

{'Director': {'id': '103045', 'name': 'ABC'}, 'Director1': {'id': '1030452', 'name': 'XYZ'}}
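
Since the question asks for the staff data to be stored as a single field in another database, here is a minimal sketch of one way to do that. It is not part of the original answer. It looks up the task and person children with findtext() and find() instead of looping over every child, then serializes the dictionary to a JSON string. The function name staff_by_task, the SAMPLE excerpt with its wrapping <anime> root, and the choice of JSON as the storage format are all assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Excerpt of the staff entries from the question's XML, wrapped in a
# bare <anime> root (an assumption) so the snippet is self-contained.
SAMPLE = """
<anime>
  <staff gid="2027930674">
    <task>Director</task>
    <person id="103045">Yōjirō Arai</person>
  </staff>
  <staff gid="3870106504">
    <task>Music</task>
    <person id="110581">Masashi Hamauzu</person>
  </staff>
</anime>
"""


def staff_by_task(root):
    """Map each <task> text to the id and name of its <person> (hypothetical helper)."""
    result = {}
    for staff in root.iter("staff"):
        task = staff.findtext("task")    # text of the first <task> child, or None
        person = staff.find("person")    # first <person> child, or None
        if not task or person is None:
            continue  # skip incomplete staff entries
        # A task that appears twice overwrites its earlier entry,
        # the same behaviour as the loop in the answer.
        result[task] = {"id": person.get("id"), "name": person.text}
    return result


staff = staff_by_task(ET.fromstring(SAMPLE))

# ensure_ascii=False keeps characters such as "ō" readable in the stored text.
as_text = json.dumps(staff, ensure_ascii=False)
print(as_text)
```

The resulting string can be written to a text column and turned back into a dictionary with json.loads(). If one task can have several people, storing a list of person dictionaries per task would avoid the overwrite noted in the comment.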

Regarding the Python XML parser, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/29894169/
