
python - lxml etree: find the closest preceding element

Reposted. Author: 行者123. Updated: 2023-12-03 15:51:27

The XML file has the following structure:

<a>
  <b>
    <d/>
  </b>

  <c attr1="important"/>
  <b>
    <d/>
  </b>
  <c attr1="so important" />
  <b></b>
</a>

My parser first collects all the <d> elements:

from lxml import etree
xmltree = etree.parse(document)
elems = xmltree.xpath('//d')

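The task now is:

Get the attribute from the nearest <c> tag that precedes the current <d> tag, if there is one.

The naive approach is to do something like the following: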

for el in elems:
    it = el.getparent()
    # Step back through previous siblings, climbing to the parent when none are left
    while it is not None and it.tag != 'c':
        prev = it.getprevious()
        if prev is None:
            it = it.getparent()
        else:
            it = prev

    if it is not None:
        print(el, it.get("attr1"))

But this doesn't look simple to me. Am I missing something in the documentation? How can I solve this without implementing my own iterator?

Accepted answer

Use the preceding axis:

The preceding axis indicates all the nodes that precede the context node in the document except any ancestor, attribute and namespace nodes.

for el in elems:
    try:
        # xpath() returns matches in document order, so the last one is the nearest
        print(el.xpath("preceding::c[@attr1]")[-1].get("attr1"))
    except IndexError:
        print("No preceding 'c' element.")

Demo:

>>> from lxml import etree
>>>
>>> data = """
... <a>
... <b>
... <d/>
... </b>
...
... <c attr1="important"/>
... <b>
... <d/>
... </b>
... <c attr1="so important" />
... <b></b>
... </a>
... """
>>> xmltree = etree.fromstring(data)
>>> elems = xmltree.xpath('//d')
>>>
>>> for el in elems:
...     try:
...         print(el.xpath("preceding::c[@attr1]")[-1].get("attr1"))
...     except IndexError:
...         print("No preceding 'c' element.")
...
No preceding 'c' element.
important
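
The lookup above can also be written without indexing from the end of the list. The following is a minimal sketch, not part of the original answer: it relies on the XPath rule that positions on a reverse axis such as preceding:: are counted backwards from the context node, so a [1] predicate selects the nearest match. The helper name nearest_preceding_attr and its tag/attr parameters are hypothetical, introduced only for illustration.

```python
from lxml import etree


def nearest_preceding_attr(el, tag="c", attr="attr1"):
    """Return `attr` of the closest `tag` element before `el`, or None.

    Hypothetical helper, not an lxml API.
    """
    # preceding:: is a reverse axis, so position 1 is the match
    # closest to the context node.
    hits = el.xpath("preceding::%s[@%s][1]" % (tag, attr))
    return hits[0].get(attr) if hits else None


data = """
<a>
  <b><d/></b>
  <c attr1="important"/>
  <b><d/></b>
  <c attr1="so important"/>
  <b></b>
</a>
"""

root = etree.fromstring(data)
for d in root.xpath("//d"):
    print(nearest_preceding_attr(d))
```

For this document the loop prints None for the first <d> and important for the second. One difference from the naive loop in the question: as the quoted definition says, preceding:: skips ancestors, so a <c> that encloses the <d> would not be returned, whereas the hand-written loop would stop at it.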

Regarding "python - lxml etree: find the closest preceding element", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31009455/
