
Python: putting XML sibling data into a dictionary

Reposted. Author: 行者123. Updated: 2023-11-30 22:31:29

I have an XML file that looks like this:

    <root>
      <G>
        <G1>1</G1>
        <G2>some text</G2>
        <G3>some text</G3>
        <GP>
          <GP1>1</GP1>
          <GP2>a</GP2>
          <GP3>a</GP3>
        </GP>
        <GP>
          <GP1>2</GP1>
          <GP2>b</GP2>
          <GP3>b</GP3>
        </GP>
        <GP>
          <GP1>3</GP1>
          <GP2>c</GP2>
          <GP3>c</GP3>
        </GP>
      </G>
      <G>
        <G1>2</G1>
        <G2>some text</G2>
        <G3>some text</G3>
        <GP>
          <GP1>1</GP1>
          <GP2>aa</GP2>
          <GP3>aa</GP3>
        </GP>
        <GP>
          <GP1>2</GP1>
          <GP2>bb</GP2>
          <GP3>bb</GP3>
        </GP>
        <GP>
          <GP1>3</GP1>
          <GP2>cc</GP2>
          <GP3>cc</GP3>
        </GP>
      </G>
      <G>
        <G1>3</G1>
        <G2>some text</G2>
        <G3>some text</G3>
        <GP>
          <GP1>1</GP1>
          <GP2>aaa</GP2>
          <GP3>aaa</GP3>
        </GP>
        <GP>
          <GP1>2</GP1>
          <GP2>bbb</GP2>
          <GP3>bbb</GP3>
        </GP>
        <GP>
          <GP1>3</GP1>
          <GP2>ccc</GP2>
          <GP3>ccc</GP3>
        </GP>
      </G>
    </root>

I'm trying to convert this XML into a nested dictionary named "G":

    { 1: {G1: 1,
          G2: some text,
          G3: some text,
          GP: { 1: {GP1: 1,
                    GP2: a,
                    GP3: a},
                2: {GP1: 2,
                    GP2: b,
                    GP3: b},
                3: {GP1: 3,
                    GP2: c,
                    GP3: c}}
         },
      2: {G1: 2,
          G2: some text,
          G3: some text,
          GP: { 1: {GP1: 1,
                    GP2: aa,
                    GP3: aa},
                2: {GP1: 2,
                    GP2: bb,
                    GP3: bb},
                3: {GP1: 3,
                    GP2: cc,
                    GP3: cc}}
         },
      3: {G1: 3,
          G2: some text,
          G3: some text,
          GP: { 1: {GP1: 1,
                    GP2: aaa,
                    GP3: aaa},
                2: {GP1: 2,
                    GP2: bbb,
                    GP3: bbb},
                3: {GP1: 3,
                    GP2: ccc,
                    GP3: ccc}}
         }
    }

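My code works fine for getting all the elements directly under "G" (G1, G2, etc.), but for GP I either get only one record, or I get all of them with the same thing repeated; sometimes I end up with all 9 GP elements under a single GP in the dictionary. Here is my code: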

    import xml.etree.ElementTree as ET

    f = 'path to file'
    tree = ET.parse(f)
    root = tree.getroot()
    self.tree = tree
    self.root = root
    # getiterator() was removed in Python 3.9; list(tree.iter('G')) is the modern equivalent
    gs = len(self.tree.getiterator('G'))
    g = {}
    for i in range(0, gs):
        d = {}
        for elem in self.tree.getiterator('G')[i]:
            if elem.text == "\n " and elem.tag not in ['GP']:
                dd = {}
                for parent in elem:
                    if parent.text == "\n ":
                        ddd = {}
                        for child in parent:
                            ddd[child.tag] = child.text
                        dd[parent.tag] = ddd
                    else:
                        dd[parent.tag] = parent.text
                d[elem.tag] = dd
            else:
                d[elem.tag] = elem.text
        g[i+1] = d

    # Build GP
    count = 0
    gp = {}
    for elem in self.tree.getiterator('GP'):
        d = {}
        for parent in elem:
            if parent.text == "\n ":
                dd = {}
                for child in parent:
                    dd[child.tag] = child.text
                d[parent.tag] = dd
            else:
                d[parent.tag] = parent.text
        count += 1
        gp[count] = d
    g["GP"] = gp
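The symptom described above (all nine GP records ending up under one key) is consistent with the "# Build GP" loop: `self.tree.getiterator('GP')` walks the entire document, not just the current G, and the result is stored once at the top level of `g`. A minimal sketch of scoping the lookup to each G instead, assuming a trimmed, hypothetical `XML_TEXT` and a hypothetical helper `build_g`:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a trimmed-down version of the XML in the question.
XML_TEXT = """<root>
  <G>
    <G1>1</G1>
    <GP><GP1>1</GP1><GP2>a</GP2></GP>
    <GP><GP1>2</GP1><GP2>b</GP2></GP>
  </G>
  <G>
    <G1>2</G1>
    <GP><GP1>1</GP1><GP2>aa</GP2></GP>
  </G>
</root>"""


def build_g(root):
    g = {}
    # findall("G") returns only the direct G children of root.
    for g_index, g_elem in enumerate(root.findall("G"), start=1):
        record = {}
        gp = {}
        # Iterate this G's own children, so GP records never leak between Gs.
        for child in g_elem:
            if child.tag == "GP":
                # Number GP records per G: each G gets its own 1, 2, 3, ...
                gp[len(gp) + 1] = {leaf.tag: leaf.text for leaf in child}
            else:
                record[child.tag] = child.text
        record["GP"] = gp
        g[g_index] = record
    return g


g = build_g(ET.fromstring(XML_TEXT))
print(g)
```

Because the GP dict is created inside the per-G loop, the second G only sees its own single GP record.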

Best answer

code.py:

    #!/usr/bin/env python3

    import sys
    import xml.etree.ElementTree as ET
    from pprint import pprint as pp


    FILE_NAME = "data.xml"


    def convert_node(node, depth_level=0):
        #print(" " * depth_level + node.tag)
        child_nodes = list(node)
        if not child_nodes:
            return (node.text or "").strip()
        ret_dict = dict()
        child_node_tags = [item.tag for item in child_nodes]
        for child_node in child_nodes:
            tag = child_node.tag
            if child_node_tags.count(tag) > 1:
                # Repeated tag: collect under a sub-dict keyed "1", "2", ... (numbered per tag)
                sub_obj_dict = ret_dict.get(tag, dict())
                sub_obj_dict[str(len(sub_obj_dict) + 1)] = convert_node(child_node, depth_level=depth_level + 1)
                ret_dict[tag] = sub_obj_dict
            else:
                ret_dict[tag] = convert_node(child_node, depth_level=depth_level + 1)
        return ret_dict


    def main():
        tree = ET.parse(FILE_NAME)
        root_node = tree.getroot()
        converted_xml = convert_node(root_node)
        print("\nResulting dict(s):\n")
        for key in converted_xml:  # converted_xml should be a dictionary having only one key (in our case "G" - we only care about its value, to match the required output)
            pp(converted_xml[key])


    if __name__ == "__main__":
        print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
        main()

Notes:

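  • FILE_NAME holds the name of the file containing the input XML. Feel free to change it to match yours
  • The conversion happens in convert_node. It is a recursive function that is called on every XML node and returns a Python dictionary (or a string). The algorithm:
    • For each node, get the list of its (direct) children. If the node has none (it is a leaf node, such as the G# and GP# nodes), return its text
    • If the node has more than one child with a given tag (e.g. the G and GP nodes), each such child's content is stored under a key representing its index, in a sub-dictionary placed in the current dictionary under that child tag
    • Every child with a unique tag has its content stored directly in the current dictionary, under a key equal to its tag
    • depth_level is not used (it can be removed); I used it to print the XML node tags as a tree. It is the node's depth in the XML tree (root - 0, G - 1, G# and GP - 2, GP# - 3, ...)
  • The code is designed to be:
    • General: note that there are no hard-coded key names
    • Extensible: if at some point the XML becomes more complex (e.g. a GP node contains, say, a D node that also has children; basically the XML gains more depth levels), the code will handle it without changes
    • Compatible with both Python 3 and Python 2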
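The recursion described in these notes can be condensed into a short, self-contained sketch. It drops the unused depth_level parameter, uses `collections.Counter` (my choice, not the answer's) to find repeated tags, and numbers each repeated tag separately starting from 1. The `XML_TEXT` document below, with its D and N nodes, is hypothetical; it only exists to exercise the "extensible" claim with an extra depth level:

```python
import xml.etree.ElementTree as ET
from collections import Counter


def convert_node(node):
    """Leaf -> stripped text; otherwise a dict keyed by child tag.

    Tags occurring more than once under the same parent are collected in a
    sub-dict keyed by a 1-based index ("1", "2", ...), counted per tag.
    """
    children = list(node)
    if not children:
        return (node.text or "").strip()
    tag_counts = Counter(child.tag for child in children)
    result = {}
    for child in children:
        value = convert_node(child)
        if tag_counts[child.tag] > 1:
            # Repeated tag: append to this tag's own numbered sub-dict.
            bucket = result.setdefault(child.tag, {})
            bucket[str(len(bucket) + 1)] = value
        else:
            # Unique tag: store directly under the tag name.
            result[child.tag] = value
    return result


# Hypothetical deeper document: GP nodes contain repeated D nodes,
# and G also has a second repeated tag, N.
XML_TEXT = """<root>
  <G>
    <G1>1</G1>
    <GP><GP1>1</GP1><D><D1>x</D1></D><D><D1>y</D1></D></GP>
    <GP><GP1>2</GP1></GP>
    <N>p</N>
    <N>q</N>
  </G>
</root>"""

converted = convert_node(ET.fromstring(XML_TEXT))
print(converted)
```

Since root has a single G child here, G is stored directly under "G" rather than in a numbered sub-dict, which matches the "unique tag" rule above.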

Output:

    (py_064_03.05.04_test0) e:\Work\Dev\StackOverflow\q045799991>"e:\Work\Dev\VEnvs\py_064_03.05.04_test0\Scripts\python.exe" code.py
    Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32

    Resulting dict(s):

    {'1': {'G1': '1',
           'G2': 'some text',
           'G3': 'some text',
           'GP': {'1': {'GP1': '1', 'GP2': 'a', 'GP3': 'a'},
                  '2': {'GP1': '2', 'GP2': 'b', 'GP3': 'b'},
                  '3': {'GP1': '3', 'GP2': 'c', 'GP3': 'c'}}},
     '2': {'G1': '2',
           'G2': 'some text',
           'G3': 'some text',
           'GP': {'1': {'GP1': '1', 'GP2': 'aa', 'GP3': 'aa'},
                  '2': {'GP1': '2', 'GP2': 'bb', 'GP3': 'bb'},
                  '3': {'GP1': '3', 'GP2': 'cc', 'GP3': 'cc'}}},
     '3': {'G1': '3',
           'G2': 'some text',
           'G3': 'some text',
           'GP': {'1': {'GP1': '1', 'GP2': 'aaa', 'GP3': 'aaa'},
                  '2': {'GP1': '2', 'GP2': 'bbb', 'GP3': 'bbb'},
                  '3': {'GP1': '3', 'GP2': 'ccc', 'GP3': 'ccc'}}}}
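The question's target dictionary uses integer keys (1, 2, 3), while the output above uses strings ('1', '2', '3') because convert_node calls `str()` on the index. If integer keys are wanted, one option is a small post-processing pass. `int_keys` and `sample` are hypothetical names, and the sketch assumes that only all-digit string keys should become ints:

```python
def int_keys(obj):
    """Recursively turn all-digit string keys (e.g. "1") into ints."""
    if isinstance(obj, dict):
        return {
            # "G1" or "GP" stay as-is; "1", "2", ... become 1, 2, ...
            (int(key) if isinstance(key, str) and key.isdecimal() else key): int_keys(value)
            for key, value in obj.items()
        }
    # Leaf values (the element texts) are returned unchanged.
    return obj


# Hypothetical input shaped like the answer's output above.
sample = {"1": {"G1": "1", "GP": {"1": {"GP1": "1", "GP2": "a"}}}}
print(int_keys(sample))
```

Only keys are converted; element texts such as the value of G1 remain strings.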

A similar question about putting XML sibling data into a dictionary in Python can be found on Stack Overflow: https://stackoverflow.com/questions/45799991/
