
python - How to iterate over a GraphML file with lxml

Reposted. Author: 行者123. Updated: 2023-12-04 05:49:51

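I have the following GraphML file, "mygraph.gml", which I want to parse with a simple Python script.

It describes a simple graph with two nodes, named "node1" and "node2", and one edge between them: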

<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
         http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <key id="name" for="node" attr.name="name" attr.type="string"/>
  <key id="weight" for="edge" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="n0">
      <data key="name">node1</data>
    </node>
    <node id="n1">
      <data key="name">node2</data>
    </node>
    <edge source="n1" target="n0">
      <data key="weight">1</data>
    </edge>
  </graph>
</graphml>

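In other words, the graph has two nodes with ids n0 and n1, joined by an edge of weight 1.
I want to parse this structure with Python.

I wrote a script using lxml (I need lxml because the real dataset is much larger than this simple example, with more than 10^5 nodes, and Python's minidom is too slow):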
import lxml.etree as et

tree = et.parse('mygraph.gml')

root = tree.getroot()

graphml = {
"graph": "{http://graphml.graphdrawing.org/xmlns}graph",
"node": "{http://graphml.graphdrawing.org/xmlns}node",
"edge": "{http://graphml.graphdrawing.org/xmlns}edge",
"data": "{http://graphml.graphdrawing.org/xmlns}data",
"label": "{http://graphml.graphdrawing.org/xmlns}data[@key='label']",
"x": "{http://graphml.graphdrawing.org/xmlns}data[@key='x']",
"y": "{http://graphml.graphdrawing.org/xmlns}data[@key='y']",
"size": "{http://graphml.graphdrawing.org/xmlns}data[@key='size']",
"r": "{http://graphml.graphdrawing.org/xmlns}data[@key='r']",
"g": "{http://graphml.graphdrawing.org/xmlns}data[@key='g']",
"b": "{http://graphml.graphdrawing.org/xmlns}data[@key='b']",
"weight": "{http://graphml.graphdrawing.org/xmlns}data[@key='weight']",
"edgeid": "{http://graphml.graphdrawing.org/xmlns}data[@key='edgeid']"
}

graph = tree.find(graphml.get("graph"))
nodes = graph.findall(graphml.get("node"))
edges = graph.findall(graphml.get("edge"))

This script retrieves the nodes and edges correctly, so I can simply iterate over them:
for n in nodes:
    print(n.attrib)

or, similarly, over the edges:
for e in edges:
    print(e.attrib['source'], e.attrib['target'])

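But I can't work out how to get at the "data" tags of an edge or a node, so that I can print the edge weight and the node's "name" label.

This doesn't work for me: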
weights = graph.findall(graphml.get("weight"))

That last list is always empty. Why? I'm missing something, but I can't see what.

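Best answer

You can't get them in one go with that call: a plain tag path passed to findall only matches direct children of <graph>, and the <data> elements are nested inside <node> and <edge>. For each node or edge found, however, you can build a dictionary from the keys and values of its data elements: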

graph = tree.find(graphml.get("graph"))
nodes = graph.findall(graphml.get("node"))
edges = graph.findall(graphml.get("edge"))

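# "node" here stands for either a <node> or an <edge> element
for node in nodes + edges:
    attribs = {}
    # <data> elements are direct children of each node/edge
    for data in node.findall(graphml.get('data')):
        attribs[data.get('key')] = data.text
    print('Node', node, 'have', attribs)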

This gives the following result:
Node <Element {http://graphml.graphdrawing.org/xmlns}node at 0x7ff053d3e5a0> have {'name': 'node1'}
Node <Element {http://graphml.graphdrawing.org/xmlns}node at 0x7ff053d3e5f0> have {'name': 'node2'}
Node <Element {http://graphml.graphdrawing.org/xmlns}edge at 0x7ff053d3e640> have {'weight': '1'}
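
As a complement to the loop above, here is a minimal sketch of why the original findall came back empty, and of two path-based lookups that avoid the problem. It is only an illustration under a few assumptions: the sample file is inlined (and trimmed) as the hypothetical constant SAMPLE so the snippet runs on its own, the names NS, weight_path, node_names and edge_weights are made up for this example, and the standard-library ElementTree is used as a fallback when lxml is not installed, since both accept these particular paths.

```python
try:
    import lxml.etree as et  # the library used in the question
except ImportError:
    # Assumption: the stdlib API behaves the same for the calls below.
    import xml.etree.ElementTree as et

NS = "{http://graphml.graphdrawing.org/xmlns}"

# Trimmed copy of the sample file, inlined so the sketch is self-contained.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="name" for="node" attr.name="name" attr.type="string"/>
  <key id="weight" for="edge" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="n0"><data key="name">node1</data></node>
    <node id="n1"><data key="name">node2</data></node>
    <edge source="n1" target="n0"><data key="weight">1</data></edge>
  </graph>
</graphml>"""

root = et.fromstring(SAMPLE)
graph = root.find(NS + "graph")

weight_path = NS + "data[@key='weight']"

# A plain tag path is matched against the direct children of <graph> only,
# and <graph> has no <data> children of its own.
direct = graph.findall(weight_path)

# Prefixing ".//" widens the search to all descendants of <graph>.
nested = graph.findall(".//" + weight_path)

# Looking up <data> per element keeps each value tied to its node or edge.
node_names = {
    n.get("id"): n.findtext(NS + "data[@key='name']")
    for n in graph.findall(NS + "node")
}
edge_weights = [
    (e.get("source"), e.get("target"), float(e.findtext(weight_path)))
    for e in graph.findall(NS + "edge")
]

print(len(direct), len(nested))  # 0 1
print(node_names)                # {'n0': 'node1', 'n1': 'node2'}
print(edge_weights)              # [('n1', 'n0', 1.0)]
```

The descendant search returns the bare <data> elements, so it fits when only the values matter; the per-element lookup is the closer match to the answer's dictionary approach, because it keeps the weight next to the edge's source and target.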

Regarding "python - How to iterate over a GraphML file with lxml", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/10205811/
