
python - Error when extracting XML node text with ElementTree in Python

Reposted. Author: 行者123. Updated: 2023-12-04 09:50:15

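I am trying to extract text from specific nodes. For every person I want the id value and the age. For person 10 the age would be 30, which is the text of the attribute element with name="age". However, I end up with an error (see my code and the resulting error below) saying that no text exists, and I don't understand why.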

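I have used the same code on an almost identical structure before, and it worked without problems. I would be glad if someone could give me a hint about what is causing this.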

Sample XML:

<population desc="Switzerland Baseline">
<person id="10">
<attributes>
<attribute name="age" class="java.lang.Integer" >30</attribute>
<attribute name="bikeAvailability" class="java.lang.String" >FOR_SOME</attribute>
<attribute name="carAvail" class="java.lang.String" >never</attribute>
<attribute name="employed" class="java.lang.Boolean" >true</attribute>
<attribute name="hasLicense" class="java.lang.String" >no</attribute>
<attribute name="home_x" class="java.lang.Double" >2679482.0</attribute>
<attribute name="home_y" class="java.lang.Double" >1237545.0</attribute>
<attribute name="isCarPassenger" class="java.lang.Boolean" >false</attribute>
<attribute name="isOutside" class="java.lang.Boolean" >true</attribute>
<attribute name="mzHeadId" class="java.lang.Long" >374775</attribute>
<attribute name="mzPersonId" class="java.lang.Long" >281604</attribute>
<attribute name="ptHasGA" class="java.lang.Boolean" >true</attribute>
<attribute name="ptHasHalbtax" class="java.lang.Boolean" >false</attribute>
<attribute name="ptHasStrecke" class="java.lang.Boolean" >false</attribute>
<attribute name="ptHasVerbund" class="java.lang.Boolean" >false</attribute>
<attribute name="sex" class="java.lang.String" >f</attribute>
<attribute name="spRegion" class="java.lang.Integer" >1</attribute>
<attribute name="statpopHouseholdId" class="java.lang.Long" >201200010000137</attribute>
<attribute name="statpopPersonId" class="java.lang.Long" >201240012081086</attribute>
</attributes>
<plan score="-9.025277777777776" selected="yes">
<activity type="home" link="270549" facility="home4" x="2679482.0" y="1237545.0" end_time="07:50:56" >
</activity>
</plan>

</person>

<!-- ====================================================================== -->

<person id="100">
<attributes>
<attribute name="age" class="java.lang.Integer" >3</attribute>
<attribute name="bikeAvailability" class="java.lang.String" >FOR_SOME</attribute>
<attribute name="carAvail" class="java.lang.String" >never</attribute>
<attribute name="employed" class="java.lang.Boolean" >false</attribute>
<attribute name="hasLicense" class="java.lang.String" >no</attribute>
<attribute name="isCarPassenger" class="java.lang.Boolean" >true</attribute>
<attribute name="isOutside" class="java.lang.Boolean" >false</attribute>
<attribute name="mzHeadId" class="java.lang.Long" >324961</attribute>
<attribute name="mzPersonId" class="java.lang.Long" >-1</attribute>
<attribute name="ptHasGA" class="java.lang.Boolean" >true</attribute>
<attribute name="ptHasHalbtax" class="java.lang.Boolean" >true</attribute>
<attribute name="ptHasStrecke" class="java.lang.Boolean" >true</attribute>
<attribute name="ptHasVerbund" class="java.lang.Boolean" >true</attribute>
<attribute name="sex" class="java.lang.String" >f</attribute>
<attribute name="spRegion" class="java.lang.Integer" >1</attribute>
<attribute name="statpopHouseholdId" class="java.lang.Long" >201200010000049</attribute>
<attribute name="statpopPersonId" class="java.lang.Long" >201240013385042</attribute>
</attributes>
<plan score="0.0" selected="no">
<activity type="home" link="362038" facility="home27" x="2678781.0" y="1237314.0" >
</activity>
</plan>

<plan score="0.0" selected="yes">
<activity type="home" link="362038" facility="home27" x="2678781.0" y="1237314.0" >
</activity>
</plan>

</person>

<!-- ====================================================================== -->

<person id="1000">
<attributes>
<attribute name="age" class="java.lang.Integer" >48</attribute>
<attribute name="bikeAvailability" class="java.lang.String" >FOR_SOME</attribute>
<attribute name="carAvail" class="java.lang.String" >never</attribute>
<attribute name="employed" class="java.lang.Boolean" >true</attribute>
<attribute name="hasLicense" class="java.lang.String" >yes</attribute>
<attribute name="home_x" class="java.lang.Double" >2678966.0</attribute>
<attribute name="home_y" class="java.lang.Double" >1235785.0</attribute>
<attribute name="isCarPassenger" class="java.lang.Boolean" >false</attribute>
<attribute name="isOutside" class="java.lang.Boolean" >true</attribute>
<attribute name="mzHeadId" class="java.lang.Long" >137604</attribute>
<attribute name="mzPersonId" class="java.lang.Long" >496052</attribute>
<attribute name="ptHasGA" class="java.lang.Boolean" >false</attribute>
<attribute name="ptHasHalbtax" class="java.lang.Boolean" >false</attribute>
<attribute name="ptHasStrecke" class="java.lang.Boolean" >false</attribute>
<attribute name="ptHasVerbund" class="java.lang.Boolean" >false</attribute>
<attribute name="sex" class="java.lang.String" >f</attribute>
<attribute name="spRegion" class="java.lang.Integer" >1</attribute>
<attribute name="statpopHouseholdId" class="java.lang.Long" >201200010000745</attribute>
<attribute name="statpopPersonId" class="java.lang.Long" >201240009138483</attribute>
</attributes>
<plan score="-437.00166666666667" selected="yes">
<activity type="outside" link="360294" facility="outside_3" x="2678575.5094664157" y="1237094.5796047896" end_time="05:33:00" >
</activity>
<leg mode="transit_walk" dep_time="07:15:00" trav_time="00:01:01">
<route type="generic" start_link="812194" end_link="588385" trav_time="00:01:01" distance="73.45759253010056"></route>
</leg>
<activity type="pt interaction" link="588385" x="2682500.5564242266" y="1246491.125064118" max_dur="00:00:00" >
</activity>
<leg mode="pt" dep_time="07:16:01" trav_time="00:13:58">
<route type="enriched_pt" start_link="588385" end_link="368678" trav_time="00:13:58" distance="8378.187255109851">{"inVehicleTime":420.0,"transferTime":418.7853395582497,"accessStopIndex":4,"egressStopindex":5,"transitRouteId":"18221_002","transitLineId":"SBB_S2_8503016-8503225","departureId":"05362"}</route>
</leg>
<activity type="pt interaction" link="368678" x="2685173.595399507" y="1238953.4179927576" max_dur="00:00:00" >
</activity>
<leg mode="egress_walk" dep_time="07:30:00" trav_time="00:01:10">
<route type="generic" start_link="368678" end_link="812077" trav_time="00:01:10" distance="82.96796919207021"></route>
</leg>
<activity type="outside" link="812077" facility="outside_6" x="2685153.844294359" y="1239014.106373788" end_time="15:52:43" >
</activity>
<leg mode="outside" dep_time="15:52:43" trav_time="00:00:00">
<route type="generic" start_link="812077" end_link="812077" trav_time="00:00:00" distance="0.0"></route>
</leg>
<activity type="outside" link="812077" facility="outside_6" x="2685153.844294359" y="1239014.106373788" end_time="16:59:00" >
</activity>
<leg mode="transit_walk" dep_time="16:59:00" trav_time="01:42:47">
<route type="generic" start_link="812077" end_link="555704" trav_time="01:42:47" distance="7401.037993401233"></route>
</leg>
<activity type="outside" link="555704" facility="outside_7" x="2690699.2533230074" y="1240302.4760125757" end_time="17:07:39" >
</activity>
<leg mode="access_walk" dep_time="17:07:39" trav_time="00:33:33">
<route type="generic" start_link="555704" end_link="348266" trav_time="00:33:33" distance="2415.2684761259893"></route>
</leg>
<activity type="pt interaction" link="348266" x="2688841.9870530544" y="1240253.9986282045" max_dur="00:00:00" >
</activity>
<leg mode="pt" dep_time="17:41:12" trav_time="00:10:48">
<route type="enriched_pt" start_link="348266" end_link="166875" trav_time="00:10:48" distance="3166.770768054601">{"inVehicleTime":420.0,"transferTime":228.0,"accessStopIndex":0,"egressStopindex":10,"transitRouteId":"02828_023","transitLineId":"VZO_line961","departureId":"125106"}</route>
</leg>
<activity type="pt interaction" link="166875" x="2687161.005729228" y="1240076.9559941967" max_dur="00:00:00" >
</activity>
<leg mode="transit_walk" dep_time="17:52:00" trav_time="00:00:21">
<route type="generic" start_link="166875" end_link="771010" trav_time="00:00:21" distance="25.959922652207396"></route>
</leg>
<activity type="pt interaction" link="771010" x="2687180.6471416447" y="1240073.3528400902" max_dur="00:00:00" >
</activity>
<leg mode="pt" dep_time="17:52:21" trav_time="00:19:38">
<route type="enriched_pt" start_link="771010" end_link="955474" trav_time="00:19:38" distance="9742.201043728513">{"inVehicleTime":960.0,"transferTime":218.36673112316203,"accessStopIndex":1,"egressStopindex":7,"transitRouteId":"19622_002","transitLineId":"SBB_S16_8503016-8503103","departureId":"06187"}</route>
</leg>
<activity type="pt interaction" link="955474" x="2683187.8521402166" y="1248065.21559948" max_dur="00:00:00" >
</activity>
<leg mode="transit_walk" dep_time="18:12:00" trav_time="00:00:00">
<route type="generic" start_link="955474" end_link="955504" trav_time="00:00:00" distance="0.0"></route>
</leg>
<activity type="pt interaction" link="955504" x="2683187.8521402166" y="1248065.21559948" max_dur="00:00:00" >
</activity>
<leg mode="pt" dep_time="18:12:00" trav_time="00:07:00">
<route type="enriched_pt" start_link="955504" end_link="4223" trav_time="00:07:00" distance="3304.5168456795577">{"inVehicleTime":120.0,"transferTime":300.0,"accessStopIndex":2,"egressStopindex":3,"transitRouteId":"18221_002","transitLineId":"SBB_S2_8503016-8503225","departureId":"05406"}</route>
</leg>
<activity type="pt interaction" link="4223" x="2681934.8161827456" y="1247302.7661533705" max_dur="00:00:00" >
</activity>
<leg mode="transit_walk" dep_time="18:19:00" trav_time="00:00:59">
<route type="generic" start_link="4223" end_link="586407" trav_time="00:00:59" distance="71.92245024668337"></route>
</leg>
<activity type="pt interaction" link="586407" x="2681990.0107938214" y="1247298.9705903793" max_dur="00:00:00" >
</activity>
<leg mode="pt" dep_time="18:19:59" trav_time="01:01:00">
<route type="enriched_pt" start_link="586407" end_link="617712" trav_time="01:01:00" distance="15771.43292404094">{"inVehicleTime":1920.0,"transferTime":1740.0646247944242,"accessStopIndex":0,"egressStopindex":19,"transitRouteId":"07744_004","transitLineId":"PAG_line236","departureId":"77196"}</route>
</leg>
<activity type="pt interaction" link="617712" x="2679299.97008475" y="1237575.0077440983" max_dur="00:00:00" >
</activity>
<leg mode="egress_walk" dep_time="19:21:00" trav_time="00:15:42">
<route type="generic" start_link="617712" end_link="360294" trav_time="00:15:42" distance="1130.0689845763227"></route>
</leg>
<activity type="outside" link="360294" facility="outside_3" x="2678575.5094664157" y="1237094.5796047896" end_time="17:53:00" >
</activity>
</plan>

</person>

<!-- ====================================================================== -->

<person id="1000157">
<attributes>
<attribute name="age" class="java.lang.Integer" >52</attribute>
<attribute name="bikeAvailability" class="java.lang.String" >FOR_ALL</attribute>
<attribute name="carAvail" class="java.lang.String" >always</attribute>
<attribute name="employed" class="java.lang.Boolean" >true</attribute>
<attribute name="hasLicense" class="java.lang.String" >yes</attribute>
<attribute name="home_x" class="java.lang.Double" >2695732.0</attribute>
<attribute name="home_y" class="java.lang.Double" >1259962.0</attribute>
<attribute name="isCarPassenger" class="java.lang.Boolean" >false</attribute>
<attribute name="isOutside" class="java.lang.Boolean" >true</attribute>
<attribute name="mzHeadId" class="java.lang.Long" >275258</attribute>
<attribute name="mzPersonId" class="java.lang.Long" >212563</attribute>
<attribute name="ptHasGA" class="java.lang.Boolean" >false</attribute>
<attribute name="ptHasHalbtax" class="java.lang.Boolean" >false</attribute>
<attribute name="ptHasStrecke" class="java.lang.Boolean" >false</attribute>
<attribute name="ptHasVerbund" class="java.lang.Boolean" >true</attribute>
<attribute name="sex" class="java.lang.String" >f</attribute>
<attribute name="spRegion" class="java.lang.Integer" >1</attribute>
<attribute name="statpopHouseholdId" class="java.lang.Long" >201202300043212</attribute>
<attribute name="statpopPersonId" class="java.lang.Long" >201240010759877</attribute>
</attributes>
<plan score="-1.7305555555555556" selected="yes">
<activity type="outside" link="557064" facility="outside_8" x="2691803.987049347" y="1253846.2689263367" end_time="07:04:33" >
</activity>
</plan>

</person>
</population>


My code:
import xml.etree.ElementTree as ET
import pandas as pd
import gzip


tree = ET.parse(gzip.open('STORAGE/500/1/output_plans.xml.gz', 'r'))

root = tree.getroot()
rows = []
for it in root.iter('person'):
    id = it.attrib['id']
    age = it.find('attributes/attribute[@name="age"]').text
    rows.append([id, age])

d = pd.DataFrame(rows, columns=['id', 'age'])

Error:
AttributeError                            Traceback (most recent call last)
<ipython-input-2-badcde9dbf74> in <module>
      8 for it in root.iter('person'):
      9     id = it.attrib['id']
---> 10     age = it.find('attributes/attribute[@name="age"]').text
     11     rows.append([id, age])
     12

AttributeError: 'NoneType' object has no attribute 'text'

Best answer

Consider extracting all of each person's attributes, not just the age!

rows = []
for it in root.iter('person'):
    attribute = it.find('attributes')

    id_dict = {'id': it.attrib['id']}
    attrs_dict = {a.attrib['name']: a.text for a in attribute.findall('attribute')}

    # Merge the two dictionaries (the {**a, **b} syntax requires Python 3.5+)
    rows.append({**id_dict, **attrs_dict})

d = pd.DataFrame(rows)

print(d)
#         id  age  bikeAvailability  carAvail  employed  ...  ptHasVerbund  sex  spRegion  statpopHouseholdId  statpopPersonId
# 0       10   30          FOR_SOME     never      true  ...         false    f         1     201200010000137  201240012081086
# 1      100    3          FOR_SOME     never     false  ...          true    f         1     201200010000049  201240013385042
# 2     1000   48          FOR_SOME     never      true  ...         false    f         1     201200010000745  201240009138483
# 3  1000157   52           FOR_ALL    always      true  ...          true    f         1     201202300043212  201240010759877
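
The loop above assumes every person has an attributes child. If one does not, it.find('attributes') returns None and the following .findall call raises the same kind of AttributeError as in the question. A minimal sketch of a guarded variant follows; the inline XML and the helper name person_rows are made up for illustration and are not taken from the original file.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the population file; the second person has no
# <attributes> block at all.
SAMPLE = """
<population>
  <person id="10">
    <attributes>
      <attribute name="age" class="java.lang.Integer">30</attribute>
      <attribute name="sex" class="java.lang.String">f</attribute>
    </attributes>
  </person>
  <person id="11"/>
</population>
"""


def person_rows(root):
    """Build one dict per person, tolerating a missing <attributes> element."""
    rows = []
    for person in root.iter('person'):
        row = {'id': person.attrib['id']}
        # findall with a two-level path returns an empty list when either
        # level is absent, so no explicit None check is needed.
        for a in person.findall('attributes/attribute'):
            row[a.attrib['name']] = a.text
        rows.append(row)
    return rows


rows = person_rows(ET.fromstring(SAMPLE))
print(rows)
# [{'id': '10', 'age': '30', 'sex': 'f'}, {'id': '11'}]
```

Passing such rows to pd.DataFrame(rows) should leave NaN in the columns a person lacks, since the DataFrame constructor fills keys missing from a dict.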

Or use a nested list/dictionary comprehension!
attrs_list = [{**{'id': it.attrib['id']},
               **{a.attrib['name']: a.text
                  for a in it.find('attributes').findall('attribute')}}
              for it in root.iter('person')]

d = pd.DataFrame(attrs_list)

print(d)
#         id  age  bikeAvailability  carAvail  employed  hasLicense  ...  ptHasStrecke  ptHasVerbund  sex  spRegion  statpopHouseholdId  statpopPersonId
# 0       10   30          FOR_SOME     never      true          no  ...         false         false    f         1     201200010000137  201240012081086
# 1      100    3          FOR_SOME     never     false          no  ...          true          true    f         1     201200010000049  201240013385042
# 2     1000   48          FOR_SOME     never      true         yes  ...         false         false    f         1     201200010000745  201240009138483
# 3  1000157   52           FOR_ALL    always      true         yes  ...         false          true    f         1     201202300043212  201240010759877
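
Neither snippet says why the original call failed. Element.find returns None when nothing matches the path, so the traceback suggests that at least one person in the full file has no attribute element with name="age". The four-person sample does not show such a person, so treat this as an assumption. Under that assumption, a sketch that keeps the original id/age extraction but uses findtext with a default, and also records which persons lack an age, could look like this (the inline XML is invented for the example):

```python
import xml.etree.ElementTree as ET

# Hypothetical input: person 20 has attributes, but none named "age".
SAMPLE = """
<population>
  <person id="10">
    <attributes>
      <attribute name="age" class="java.lang.Integer">30</attribute>
    </attributes>
  </person>
  <person id="20">
    <attributes>
      <attribute name="sex" class="java.lang.String">m</attribute>
    </attributes>
  </person>
</population>
"""

root = ET.fromstring(SAMPLE)

rows = []
missing_age = []
for person in root.iter('person'):
    pid = person.attrib['id']
    # findtext returns the default instead of raising when nothing matches.
    age = person.findtext('attributes/attribute[@name="age"]', default=None)
    if age is None:
        missing_age.append(pid)
    rows.append([pid, age])

print(rows)         # [['10', '30'], ['20', None]]
print(missing_age)  # ['20']
```

Printing missing_age against the real file would show whether the assumption holds. If the list comes back empty, the cause lies elsewhere, for example a different element name or nesting in that file.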

For more on python - Error when extracting XML node text with ElementTree in Python, see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/62024004/
