python - Records dropped while parsing through XML

Reposted. Author: 行者123. Updated: 2023-12-01 00:19:25
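I am parsing XML into a Pandas DataFrame, but I lose records while doing so. Not all records have all attributes. In those cases, I noticed that the records (rows in the DF) are dropped from the DF instead of being filled with "None".

Is there a way to mitigate this? I can't seem to find a solution.

I have pasted the code below for reference: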

import xml.etree.ElementTree as et
import pandas as pd

tree = et.parse('20191125_DMG_PI.xml')
root = tree.getroot()

df_cols = ["status",
"priref",
"full_name",
"achternaam",
"geboorteplaats",
"sterfplaats",
"detail",
"adres",
"zip",
"note",
"gender"]
rows = []

for record in root:
    for child in record:
        s_priref = ""
        s_priref = child.get('priref')
    for child in record:
        s_name_note = ""
        s_name_note = child.get('name.note')
    for child in record:
        s_surname = ""
        s_surname = child.find('surname')

        for field in child.findall('Address'):
            s_adress = ""
            s_address = field.find('address').text if field is not None else None
        for field in child.findall('Address'):
            s_zip = ""
            s_zip = field.find('address.postal_code').text if field is not None else None
        for field in child.findall('name'):
            s_full_name = ""
            s_full_name = field.find('value').text if field is not None else None
        for field in child.findall('name.status'):
            s_status = ""
            s_status = field.find('value').text if field is not None else None
        for field in child.findall('level_of_detail'):
            s_detail = ""
            s_detail = field.tag + ": " + field.find('value').text if field is not None else None
        for field in child.findall('gender'):
            s_gender = ""
            s_gender = field.find('value').text

        for field in child.findall('birth.place'):
            s_gbp = ""
            s_gbp = field.find('value').text if field is not None else None
        for field in child.findall('death.place'):
            s_pvo = ""
            if len(field.findall('death.place')) == 0:
                s_pvo = "NaN"
            else:
                s_pvo = field.find('value').text if field is not None else None

        rows.append({"status": s_status,
                     "priref": s_priref,
                     "full_name": s_full_name,
                     "achternaam": s_surname,
                     "geboorteplaats": s_gbp,
                     "sterfplaats": s_pvo,
                     "detail": s_detail,
                     "adres": s_address,
                     "zip": s_zip,
                     "note": s_name_note,
                     "gender": s_gender
                     })

out_df = pd.DataFrame(rows, columns=df_cols)
print(out_df)

The first three records look like this:

<recordList><record priref="530000001" creation="2014-06-23T11:36:18" modification="2019-09-13T09:07:12">
<name>
<value lang="">C.I.A.P.</value>
</name>
<name.type>
<value lang="neutral">ACQUISITIONSOURCE</value>
<value lang="0">acquisition source</value>
<value lang="1">verwervingsbron</value>
<value lang="2">source d'acquisition</value>
<value lang="3">Erwerbungsquelle</value>
<value lang="5">fonte di acquisizione</value>
<value lang="6">πηγή απόκτησης</value>
</name.type>
<name.type>
<value lang="neutral">INST</value>
<value lang="0">institution</value>
<value lang="1">instelling</value>
<value lang="2">institution</value>
<value lang="3">Institution</value>
<value lang="4">المؤسسة</value>
<value lang="5">istituto</value>
<value lang="6">οργανισμός</value>
</name.type>
<name.status>
<value lang="neutral">1</value>
<value lang="0">approved preferred term</value>
<value lang="1">descriptor</value>
<value lang="2">descripteur</value>
<value lang="3">Deskriptor</value>
<value lang="5">termine preferenziale approvato</value>
</name.status>
<Address>
<address>Lombaardstraat 23</address>
<address.country>
<value lang="">België</value>
</address.country>
<address.place>
<value lang="">Hasselt</value>
</address.place>
<address.postal_code>3500</address.postal_code>
<address.type />
</Address>
<level_of_detail>
<value lang="neutral">PARTIAL</value>
<value lang="0">partial</value>
<value lang="1">partieel</value>
<value lang="2">partiel</value>
<value lang="3">partiell</value>
<value lang="5">parziale</value>
</level_of_detail>
<birth.place>
<value lang="">Hasselt</value>
</birth.place>
<id_number>53</id_number>
<supplier.letter.processing>
<value lang="neutral">PRINT</value>
<value lang="0">Print to documents</value>
<value lang="1">Afdrukken naar documenten</value>
<value lang="2">Imprimer en documents</value>
<value lang="3">Ausdruck in Dokumenten</value>
<value lang="5">Stampa nei documenti</value>
</supplier.letter.processing>
<name.note>Centrum voor Informatie en Aktueel Prentenkabinet</name.note>
<Place_activity>
<place_activity.institution />
<place_activity.type />
<place_activity>
<value lang="">Hasselt</value>
</place_activity>
<place_activity.notes />
<place_activity.date.end />
<place_activity.date.start />
</Place_activity>
<Edit>
<edit.notes />
<edit.source>people&gt;people</edit.source>
<edit.date>2019-09-13</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>09:07:12</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people&gt;people</edit.source>
<edit.date>2019-09-12</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>13:15:16</edit.time>
</Edit>
</record><record priref="530000003" creation="2014-06-23T11:36:18" modification="2019-09-13T09:02:51">
<name>
<value lang="">Goossens, K.</value>
</name>
<name.type>
<value lang="neutral">ACQUISITIONSOURCE</value>
<value lang="0">acquisition source</value>
<value lang="1">verwervingsbron</value>
<value lang="2">source d'acquisition</value>
<value lang="3">Erwerbungsquelle</value>
<value lang="5">fonte di acquisizione</value>
<value lang="6">πηγή απόκτησης</value>
</name.type>
<name.type>
<value lang="neutral">PERSON</value>
<value lang="0">person</value>
<value lang="1">persoon</value>
<value lang="2">personne</value>
<value lang="3">Person</value>
<value lang="4">إسم شخص</value>
<value lang="5">persona</value>
<value lang="6">πρόσωπο</value>
</name.type>
<name.status>
<value lang="neutral">1</value>
<value lang="0">approved preferred term</value>
<value lang="1">descriptor</value>
<value lang="2">descripteur</value>
<value lang="3">Deskriptor</value>
<value lang="5">termine preferenziale approvato</value>
</name.status>
<surname>Goossens</surname>
<Address>
<address>Morckhovelei</address>
<address.country>
<value lang="">België</value>
</address.country>
<address.place>
<value lang="">Borgerhout</value>
</address.place>
<address.postal_code />
<address.type />
</Address>
<nationality>
<value lang="">Belgisch</value>
</nationality>
<level_of_detail>
<value lang="neutral">PARTIAL</value>
<value lang="0">partial</value>
<value lang="1">partieel</value>
<value lang="2">partiel</value>
<value lang="3">partiell</value>
<value lang="5">parziale</value>
</level_of_detail>
<forename>K.</forename>
<gender>
<value lang="neutral">FEMALE</value>
<value lang="0">female</value>
<value lang="1">vrouw</value>
<value lang="2">femme</value>
<value lang="3">weiblich</value>
<value lang="5">femmina</value>
<value lang="6">θηλυκό</value>
</gender>
<id_number>53</id_number>
<supplier.letter.processing>
<value lang="neutral">PRINT</value>
<value lang="0">Print to documents</value>
<value lang="1">Afdrukken naar documenten</value>
<value lang="2">Imprimer en documents</value>
<value lang="3">Ausdruck in Dokumenten</value>
<value lang="5">Stampa nei documenti</value>
</supplier.letter.processing>
<Edit>
<edit.notes />
<edit.source>people&gt;people</edit.source>
<edit.date>2019-09-13</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>09:02:51</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people&gt;people</edit.source>
<edit.date>2019-09-12</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>13:21:05</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people&gt;people</edit.source>
<edit.date>2019-09-12</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>13:20:03</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people&gt;people</edit.source>
<edit.date>2019-09-12</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>13:19:45</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people&gt;people</edit.source>
<edit.date>2019-09-12</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>13:19:16</edit.time>
</Edit>
</record><record priref="530000004" creation="2014-06-23T11:36:18" modification="2019-07-19T09:55:26">
<name>
<value lang="">De Bruyne, Pieter</value>
</name>
<name.type>
<value lang="neutral">MAKER</value>
<value lang="0">creator</value>
<value lang="1">vervaardiger</value>
<value lang="2">créateur</value>
<value lang="3">Hersteller</value>
<value lang="4">الصانع</value>
<value lang="5">creatore</value>
<value lang="6">δημιουργός</value>
</name.type>
<name.type>
<value lang="neutral">ACQUISITIONSOURCE</value>
<value lang="0">acquisition source</value>
<value lang="1">verwervingsbron</value>
<value lang="2">source d'acquisition</value>
<value lang="3">Erwerbungsquelle</value>
<value lang="5">fonte di acquisizione</value>
<value lang="6">πηγή απόκτησης</value>
</name.type>
<name.type>
<value lang="neutral">PERSON</value>
<value lang="0">person</value>
<value lang="1">persoon</value>
<value lang="2">personne</value>
<value lang="3">Person</value>
<value lang="4">إسم شخص</value>
<value lang="5">persona</value>
<value lang="6">πρόσωπο</value>
</name.type>
<name.type>
<value lang="neutral">AUTHOR</value>
<value lang="0">author</value>
<value lang="1">auteur</value>
<value lang="2">auteur</value>
<value lang="3">Verfasser</value>
<value lang="4">المؤلف</value>
<value lang="5">autore</value>
<value lang="6">συντάκτης</value>
</name.type>
<birth.date.start>1931</birth.date.start>
<death.date.start>1987</death.date.start>
<name.status>
<value lang="neutral">1</value>
<value lang="0">approved preferred term</value>
<value lang="1">descriptor</value>
<value lang="2">descripteur</value>
<value lang="3">Deskriptor</value>
<value lang="5">termine preferenziale approvato</value>
</name.status>
<surname>De Bruyne</surname>
<Address>
<address>Stationstraat 16</address>
<address.country>
<value lang="">België</value>
</address.country>
<address.place>
<value lang="">Aalst</value>
</address.place>
<address.postal_code>9300</address.postal_code>
<address.type>woning Pieter De Bruyne</address.type>
</Address>
<biography>Pieter De Bruyne is als pionier binnen het postmodern ontwerpen een internationaal geapprecieerde meubelontwerper. Hij wijdde zijn hele leven aan de vernieuwing van het meubilair. De Bruynes werk sluit aan bij de Memphis-stijl, hoewel hij nooit actief deel wilde uitmaken van dergelijke bewegingen. Elk meubel van zijn hand opent nieuwe perspectieven en is stimulans om andere denkrichtingen in te slaan.

Bibliotheek Design museum Gent:
(1) Pieter De Bruyne 1931- 1987. Pionier van het postmoderne. / Christian Kieckens, Eva Storgaard
(2) 25 jaar Pieter De Bruyne. / Christian Norberg-Schulz</biography>
<Source>
<source>http://vocab.getty.edu/page/ulan/</source>
<source.number>500009402</source.number>
</Source>
<Source>
<source>https://www.wikidata.org/wiki/</source>
<source.number>Q14101030</source.number>
</Source>
<death.date.end>1987</death.date.end>
<death.place>
<value lang="">Aalst</value>
</death.place>
<nationality>
<value lang="">Belgisch</value>
</nationality>
<level_of_detail>
<value lang="neutral">FULL</value>
<value lang="0">full</value>
<value lang="1">volledig</value>
<value lang="2">complet</value>
<value lang="3">vollständig</value>
<value lang="5">completo</value>
</level_of_detail>
<forename>Pieter</forename>
<birth.date.end>1931</birth.date.end>
<birth.place>
<value lang="">Aalst</value>
</birth.place>
<gender>
<value lang="neutral">MALE</value>
<value lang="0">male</value>
<value lang="1">man</value>
<value lang="2">homme</value>
<value lang="3">männlich</value>
<value lang="5">maschio</value>
<value lang="6">αρσενικό</value>
</gender>
<occupation>
<value lang="">ontwerper</value>
</occupation>
<Part_of>
<part_of>
<value lang="">Pieter De Bruyne N.V.</value>
</part_of>
<part_of.notes />
<part_of.category />
<part_of.date.end />
<part_of.date.start />
</Part_of>
<Equivalent>
<equivalent_name>
<value lang="">Pieter De Bruyne N.V.</value>
</equivalent_name>
<equivalent_name.category />
</Equivalent>
<id_number>53</id_number>
<supplier.letter.processing>
<value lang="neutral">PRINT</value>
<value lang="0">Print to documents</value>
<value lang="1">Afdrukken naar documenten</value>
<value lang="2">Imprimer en documents</value>
<value lang="3">Ausdruck in Dokumenten</value>
<value lang="5">Stampa nei documenti</value>
</supplier.letter.processing>
<school_style>
<value lang="">post-modernisme</value>
</school_style>
<language>
<value lang="">Nederlands</value>
</language>
<Edit>
<edit.notes />
<edit.source>people&gt;people</edit.source>
<edit.date>2019-07-19</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>09:55:26</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people&gt;people</edit.source>
<edit.date>2019-07-19</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>09:55:24</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people&gt;people</edit.source>
<edit.date>2019-07-17</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>11:24:24</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people&gt;people</edit.source>
<edit.date>2019-06-18</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>11:54:47</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people&gt;people</edit.source>
<edit.date>2019-06-12</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>11:44:02</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people&gt;people</edit.source>
<edit.date>2019-05-28</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>08:20:09</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people&gt;people</edit.source>
<edit.date>2019-05-27</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>10:44:41</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people&gt;people</edit.source>
<edit.date>2019-05-13</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>14:24:58</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people&gt;people</edit.source>
<edit.date>2019-05-13</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>14:23:25</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people&gt;people</edit.source>
<edit.date>2019-04-23</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>16:12:25</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>thesau&gt;thesau</edit.source>
<edit.date>2019-04-18</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>15:19:53</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>COLLECT&gt;intern</edit.source>
<edit.date>2016-09-26</edit.date>
<edit.name>rgoris</edit.name>
<edit.time>10:58:19</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>COLLECT&gt;intern</edit.source>
<edit.date>2016-09-26</edit.date>
<edit.name>rgoris</edit.name>
<edit.time>10:57:40</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>COLLECT&gt;intern</edit.source>
<edit.date>2016-09-26</edit.date>
<edit.name>rgoris</edit.name>
<edit.time>10:50:49</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>COLLECT&gt;intern</edit.source>
<edit.date>2016-09-26</edit.date>
<edit.name>rgoris</edit.name>
<edit.time>10:21:40</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>COLLECT&gt;intern</edit.source>
<edit.date>2016-09-26</edit.date>
<edit.name>rgoris</edit.name>
<edit.time>10:20:30</edit.time>
</Edit>

Best answer

By switching to XPath as the means of locating any given node, you can greatly simplify the part of the code that handles the XML. Consider:

import xml.etree.ElementTree as et

def node_text(node, default=''):
    return node.text if node is not None and node.text is not None else default

tree = et.parse('20191125_DMG_PI.xml')

rows = []
for record in tree.iterfind('./record'):
    rows.append({
        'status': node_text(record.find('./name.status/value')),
        'priref': record.get('priref'),
        'full_name': node_text(record.find('./name/value')),
        'achternaam': node_text(record.find('./surname')),
        'geboorteplaats': node_text(record.find('./birth.place/value')),
        'sterfplaats': node_text(record.find('./death.place/value')),
        'detail': node_text(record.find('./level_of_detail/value[@lang="neutral"]')),
        'adres': node_text(record.find('./Address/address')),
        'zip': node_text(record.find('./Address/address.postal_code')),
        'note': node_text(record.find('./name.note')),
        'gender': node_text(record.find('./gender/value'))
    })

print(rows)

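The node_text() helper function at the top handles the "node not found" case, as well as nodes that exist but contain no text. If you prefer None over an empty string, you can use None as the default, or pass an individual default for each value.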
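As a rough illustration of the per-value defaults, the sketch below runs node_text() against a trimmed-down record. The inline XML and the 'unknown' placeholder are assumptions made up for this example; they are not part of the original data or answer.

```python
import xml.etree.ElementTree as et

def node_text(node, default=''):
    # Return the element's text, or `default` when the element
    # is missing (None) or present but empty (text is None).
    return node.text if node is not None and node.text is not None else default

# Hypothetical record: it has no <surname> and an empty <address.postal_code />.
record = et.fromstring(
    '<record priref="530000001">'
    '<name><value lang="">C.I.A.P.</value></name>'
    '<Address>'
    '<address>Lombaardstraat 23</address>'
    '<address.postal_code />'
    '</Address>'
    '</record>'
)

row = {
    # Element exists and has text: the text is returned.
    'full_name': node_text(record.find('./name/value')),
    # Element is missing: find() returns None, so the default (None) is used.
    'achternaam': node_text(record.find('./surname'), default=None),
    # Element exists but is empty: its text is None, so the default is used.
    'zip': node_text(record.find('./Address/address.postal_code'), default='unknown'),
}
print(row)
```

Either way the record still yields a row, because every key receives a value even when the underlying element is absent.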

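XPath expressions in ElementTree must be relative paths (written here with a leading ./) and are limited to a subset of XPath 1.0 features, but that is sufficient for your use case.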
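For a feel of what that subset covers, here is a small sketch using a few path forms that ElementTree does support: child steps, attribute predicates, and the .// descendant shorthand. The inline XML is a shortened, hypothetical version of one record's level_of_detail element.

```python
import xml.etree.ElementTree as et

# Hypothetical, shortened record used only for this illustration.
record = et.fromstring(
    '<record>'
    '<level_of_detail>'
    '<value lang="neutral">PARTIAL</value>'
    '<value lang="0">partial</value>'
    '<value lang="1">partieel</value>'
    '</level_of_detail>'
    '</record>'
)

# Child steps combined with an attribute predicate.
neutral = record.find('./level_of_detail/value[@lang="neutral"]').text

# ".//" searches all descendants, not just direct children.
dutch = record.find('.//value[@lang="1"]').text

# findall() returns every match in document order.
all_values = [v.text for v in record.findall('./level_of_detail/value')]

# A path that matches nothing: find() gives None, findall() gives an empty list.
missing = record.find('./death.place/value')
missing_all = record.findall('./death.place')

print(neutral, dutch, all_values, missing, missing_all)
```

The last two calls are the reason a helper such as node_text() is needed: a missing element comes back as None rather than raising an error.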

Putting the result into a DataFrame afterwards should no longer be a problem.
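A minimal sketch of that last step, assuming rows is a list of dictionaries like the one built in the loop above and that None was chosen as the default for missing values. The two rows and the reduced column list are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical rows; the first record has no surname and no gender.
rows = [
    {'priref': '530000001', 'full_name': 'C.I.A.P.',
     'achternaam': None, 'gender': None},
    {'priref': '530000003', 'full_name': 'Goossens, K.',
     'achternaam': 'Goossens', 'gender': 'FEMALE'},
]

df_cols = ['priref', 'full_name', 'achternaam', 'gender']
out_df = pd.DataFrame(rows, columns=df_cols)

# Both records are kept; the gaps appear as missing values in their columns.
print(len(out_df))
print(out_df['achternaam'].isna().sum())
```

Because each record contributes exactly one dictionary, the number of rows in the DataFrame matches the number of records, regardless of how many fields are missing.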

Regarding "python - Records dropped while parsing through XML", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59055232/
