
python - Getting data from XML with Python

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:19:54

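I'm trying to understand how to extract specific data from an XML file using Python.

Right now I pull information from an API and get back an XML file, but I want to take specific values directly out of that XML.

From what I can tell, ElementTree seems to be the answer, but I find it hard to understand, and I'm not sure it's the right way to build a solution.

Below is the code I use to get the XML data, along with a shortened version of the XML it returns (I kept only the parts I need to extract).

Thanks.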

import requests

# Import routes
routes = []


class routesClass:
    def __init__(self, name, url):  # ,start,end,offset,rwe,al):
        self.n = name
        self.u = url
        # self.s = start
        # self.e = end
        # self.o = offset
        # self.r = rwe
        # self.a = al


# Add example route
testRoute1 = routesClass("EasternFwy-Hoddle/Johnston", "https://api.tomtom.com/routing/1/calculateRoute/-37.79205923474775,145.03010268799338:-37.798883995180496,145.03040309540322:-37.807106781970354,145.02895470253526:-37.80320743019992,145.01021142594075:-37.7999012967757,144.99318476311566:?routeType=shortest&key=SECRETKEY&computeTravelTimeFor=all")
routes.append(testRoute1)
# routes.append(testRoute2)

print(routes[0].u)

And here is the XML:

<summary>
<lengthInMeters>5144</lengthInMeters>
<travelTimeInSeconds>764</travelTimeInSeconds>
<trafficDelayInSeconds>0</trafficDelayInSeconds>
<departureTime>2017-12-28T14:42:14+11:00</departureTime>
<arrivalTime>2017-12-28T14:54:58+11:00</arrivalTime>
<noTrafficTravelTimeInSeconds>478</noTrafficTravelTimeInSeconds>
<historicTrafficTravelTimeInSeconds>764</historicTrafficTravelTimeInSeconds>
<liveTrafficIncidentsTravelTimeInSeconds>764</liveTrafficIncidentsTravelTimeInSeconds>
</summary>
<leg>
<summary>
<lengthInMeters>806</lengthInMeters>
<travelTimeInSeconds>67</travelTimeInSeconds>
<trafficDelayInSeconds>0</trafficDelayInSeconds>
<departureTime>2017-12-28T14:42:14+11:00</departureTime>
<arrivalTime>2017-12-28T14:43:21+11:00</arrivalTime>
<noTrafficTravelTimeInSeconds>59</noTrafficTravelTimeInSeconds>
<historicTrafficTravelTimeInSeconds>67</historicTrafficTravelTimeInSeconds>
<liveTrafficIncidentsTravelTimeInSeconds>67</liveTrafficIncidentsTravelTimeInSeconds>
</summary>
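The code above builds the route URL but never sends the request. The following minimal sketch shows that missing step. It assumes the URL returns an XML body shaped like the fragment above. It uses the standard library's `urllib.request` in place of `requests` only to avoid a third-party dependency; `requests.get(url, timeout=10).content` would return the same bytes. It then reads the first `<summary>` with `xml.etree.ElementTree`. The helper names `fetch_xml` and `first_summary` are hypothetical. If the real response declares an XML namespace, the tag names would need that namespace added.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_xml(url, timeout=10):
    """Download the response body for `url` and return it as bytes."""
    # Standard-library stand-in for requests.get(url, timeout=timeout).content
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def first_summary(xml_bytes):
    """Return the children of the first <summary> element as a {tag: text} dict."""
    root = ET.fromstring(xml_bytes)
    # ".//summary" searches all descendants of the root; find() returns the
    # first match in document order (the route-level summary, before any <leg>)
    summary = root.find(".//summary")
    if summary is None:
        return {}
    return {child.tag: child.text for child in summary}


# Hypothetical usage with the route defined above (needs a real API key):
# print(first_summary(fetch_xml(routes[0].u)))
```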

Best answer

I recommend lxml. In my opinion it is easier to navigate an XML tree with lxml than with ElementTree. Here is a demo of how to use the module.
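`lxml.etree` implements the same ElementTree API as the standard library's `xml.etree.ElementTree`, and it adds extras such as full XPath through `.xpath()`. Because the APIs match, a common pattern is to prefer lxml when it is installed and fall back to the standard library otherwise. This is a minimal sketch of that pattern. The helper name `travel_times` is hypothetical, and the sketch assumes the `<summary>` elements sit directly under the root element, as in the corrected example.xml in this answer.

```python
# Prefer lxml if it is installed; otherwise use the standard library.
# Both modules provide fromstring(), findall() and findtext().
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree


def travel_times(xml_bytes):
    """Return travelTimeInSeconds of each <summary> directly under the root, as ints."""
    root = etree.fromstring(xml_bytes)
    times = []
    # findall("summary") matches direct children of the root only
    for summary in root.findall("summary"):
        value = summary.findtext("travelTimeInSeconds")
        if value is not None:  # skip summaries that lack this field
            times.append(int(value))
    return times
```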

Example

Here is how I would parse your XML with lxml, assuming you save the XML as example.xml and the code as xmlparse.py.

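example.xml - The XML you provided is not well-formed:

  • It has no parent element wrapping the two summary sections.
  • There is a stray <leg> tag between the two summary sections, and it is never closed.

These two problems stop it from parsing, so I removed the <leg> tag and wrapped the two summary sections in a <parent> tag. Here is the XML.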

<parent>
  <summary>
    <lengthInMeters>5144</lengthInMeters>
    <travelTimeInSeconds>764</travelTimeInSeconds>
    <trafficDelayInSeconds>0</trafficDelayInSeconds>
    <departureTime>2017-12-28T14:42:14+11:00</departureTime>
    <arrivalTime>2017-12-28T14:54:58+11:00</arrivalTime>
    <noTrafficTravelTimeInSeconds>478</noTrafficTravelTimeInSeconds>
    <historicTrafficTravelTimeInSeconds>764</historicTrafficTravelTimeInSeconds>
    <liveTrafficIncidentsTravelTimeInSeconds>764</liveTrafficIncidentsTravelTimeInSeconds>
  </summary>
  <summary>
    <lengthInMeters>806</lengthInMeters>
    <travelTimeInSeconds>67</travelTimeInSeconds>
    <trafficDelayInSeconds>0</trafficDelayInSeconds>
    <departureTime>2017-12-28T14:42:14+11:00</departureTime>
    <arrivalTime>2017-12-28T14:43:21+11:00</arrivalTime>
    <noTrafficTravelTimeInSeconds>59</noTrafficTravelTimeInSeconds>
    <historicTrafficTravelTimeInSeconds>67</historicTrafficTravelTimeInSeconds>
    <liveTrafficIncidentsTravelTimeInSeconds>67</liveTrafficIncidentsTravelTimeInSeconds>
  </summary>
</parent>
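The wrapping above was done by hand. If the API really returns several top-level elements, you can also do the wrapping in code. This is a minimal sketch using the standard library's `xml.etree.ElementTree`. The helper `parse_fragment` and the default wrapper name are hypothetical, and the sketch assumes the input is a `str` with no `<?xml ...?>` declaration. Wrapping fixes only the missing-root problem. A tag that is opened and never closed, like the stray `<leg>`, still raises `ParseError`, so that case has to be fixed in the data itself.

```python
import xml.etree.ElementTree as ET


def parse_fragment(text, wrapper="parent"):
    """Parse `text` (a str); if that fails, retry with it wrapped in <wrapper>."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        # Several sibling top-level elements fail with "junk after document
        # element"; a single wrapping element fixes that case. Any other
        # error (e.g. an unclosed tag) makes this second parse raise too.
        return ET.fromstring(f"<{wrapper}>{text}</{wrapper}>")
```

For instance, `parse_fragment(open("example.xml").read())` returns the `<parent>` root for the corrected file. For the two bare `<summary>` sections it would return a new `<parent>` element containing both.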

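xmlparse.py - In this script I've given you a loop that prints each key (elem.tag) and its value (text). It also includes a conditional that checks whether a particular key is present and its value is greater than 700. This is only meant to show how you can add a trigger inside the loop.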

from lxml import etree


def parseXML(xmlFile):
    """
    Parse the XML file and print every tag => value pair inside each <summary>.
    """
    # etree.parse() reads the file itself; getroot() returns the <parent> element
    root = etree.parse(xmlFile).getroot()

    # Iterating over an element yields its children (getchildren() is deprecated)
    for appt in root:          # each <summary> under <parent>
        for elem in appt:      # each field inside that <summary>
            text = elem.text if elem.text else "None"

            # Do something with the XML based on its tag and value
            if elem.tag == 'travelTimeInSeconds' and int(text) > 700:
                print('******** Do something with', elem.tag, ':', text)
            print(elem.tag + " => " + text)


if __name__ == "__main__":
    parseXML("example.xml")

Output: if you save the code as xmlparse.py and the updated XML I provided as example.xml, running the script produces the following output:

lengthInMeters => 5144
******** Do something with travelTimeInSeconds : 764
travelTimeInSeconds => 764
trafficDelayInSeconds => 0
departureTime => 2017-12-28T14:42:14+11:00
arrivalTime => 2017-12-28T14:54:58+11:00
noTrafficTravelTimeInSeconds => 478
historicTrafficTravelTimeInSeconds => 764
liveTrafficIncidentsTravelTimeInSeconds => 764
lengthInMeters => 806
travelTimeInSeconds => 67
trafficDelayInSeconds => 0
departureTime => 2017-12-28T14:42:14+11:00
arrivalTime => 2017-12-28T14:43:21+11:00
noTrafficTravelTimeInSeconds => 59
historicTrafficTravelTimeInSeconds => 67
liveTrafficIncidentsTravelTimeInSeconds => 67
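The script prints every field. If you need specific values later in your program, you can collect each `<summary>` into a dict instead. This is a minimal sketch using the standard library's `xml.etree.ElementTree`; the same loop should work with lxml's `etree` for a file like example.xml. The helper name `summary_rows` is hypothetical, and the sketch assumes, as above, that the summaries sit directly under the root.

```python
import xml.etree.ElementTree as ET


def summary_rows(root):
    """Collect each <summary> directly under `root` into a {tag: value} dict."""
    rows = []
    for summary in root.findall("summary"):
        row = {}
        for elem in summary:
            text = elem.text or ""
            # Whole numbers (lengths, durations) become ints; timestamps stay strings
            row[elem.tag] = int(text) if text.isdecimal() else text
        rows.append(row)
    return rows
```

For example, `rows = summary_rows(ET.parse("example.xml").getroot())` gives one dict per summary. `[r for r in rows if r.get("travelTimeInSeconds", 0) > 700]` then applies the same trigger as the script to the collected data. Keeping values in dicts also lets you compare the route-level and leg-level summaries directly, for example through `rows[0]["travelTimeInSeconds"]`.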

Regarding "python - Getting data from XML with Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48001711/

Copyright 2021 - 2024 cfsdn All Rights Reserved. Sichuan ICP filing No. 2022000587