python - lxml: how Element addnext() and insert() differ in handling tail

Reposted by: 行者123 Updated: 2023-11-28 22:49:53

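Given an lxml element xml, I walk over all of its children c[0..n] by calling c.getnext(). I do this because I need to insert children on the fly where necessary, and I can't do that with an iterator. All elements have text and tail set.

Let me illustrate the different behavior of addnext() and insert() with the following example. Take a simple XML string, which I parse into an lxml tree and then inspect as a sanity check: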
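For illustration, the traversal described above might look roughly like the following sketch. The document, the tag names and the rule "add a <mark/> after every <a/>" are made up for this example; only getnext(), index() and insert() are real lxml calls.

```python
from lxml import etree

# Hypothetical input: no text or tails, so only the traversal matters here.
root = etree.fromstring("<r><a/><b/><a/></r>")

child = root[0] if len(root) else None
while child is not None:
    if child.tag == "a":
        # Insert a new sibling right after the current child.
        marker = etree.Element("mark")
        root.insert(root.index(child) + 1, marker)
        # Continue from the new element so it is not visited again.
        child = marker
    child = child.getnext()

tags = [c.tag for c in root]
print(tags)  # ['a', 'mark', 'b', 'a', 'mark']
```

Stepping with getnext() reads the next sibling only after any insertion has happened, which is why the loop tolerates the tree changing underneath it.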

>>> import lxml.etree
>>> s = "<p>This is <b>bold</b> and this is italic text.</p>"
# Create a new lxml element.
>>> xml = lxml.etree.fromstring(s)
# Let's look at the element, its child, and all the texts and tails.
>>> lxml.etree.tostring(xml)
b'<p>This is <b>bold</b> and this is italic text.</p>'
>>> xml.text
'This is '
>>> xml.tail
>>> xml[0].text
'bold'
>>> xml[0].tail
' and this is italic text.'

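So far so good, exactly what I expected (see here for more on the lxml representation).

Now I want to wrap the word "italic" in a tag, just as "bold" is wrapped in <b> tags. To do that, I first find the index where the "italic" substring starts: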

# Find the index of the "italic" substring.
>>> idx = xml[0].tail.find("italic")
>>> idx
13

Then I create a new lxml element:

# Create a new element and inspect it.
>>> new_c = lxml.etree.fromstring("<i>italic</i>")
>>> new_c.text
'italic'
>>> new_c.tail
>>>

To insert this new element into the xml tree correctly, I have to split the original xml[0].tail string into two substrings and remove "italic" from it:

>>> new_c.tail = xml[0].tail[idx+len("italic"):]
>>> xml[0].tail = xml[0].tail[:idx]

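Now everything is ready to insert the new element into the xml element, and this is where I get confused. Inserting the new child new_c after the given xml[0] gives different results depending on which call I use, and the Element API documentation doesn't tell me anything more: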

# Adds the element as a following sibling directly after this element.
# Note that tail text is automatically discarded when adding at the root level.
>>> xml[0].addnext(new_c)
>>> lxml.etree.tostring(xml)
b'<p>This is <b>bold</b><i>italic</i> text. and this is </p>'

and, with the tree set up the same way again:

# Inserts a subelement at the given position in this element
>>> xml.insert(1 + xml.index(xml[0]), new_c)
>>> lxml.etree.tostring(xml)
b'<p>This is <b>bold</b> and this is <i>italic</i> text.</p>'

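These two calls seem to handle the tail text differently (see the comment for addnext() regarding the tail). Even taking that comment into account, the text is not discarded from <b> but appended to the tail of <i>, and the root level is not handled any differently from levels further down (the exact same behavior can be observed by wrapping the original XML in s in an additional <foo> tag).

What am I missing here?

Edit: a related discussion on the lxml mailing list is here.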

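Best answer

elem.addnext(nextelem) operates at the XML level: it adds something directly after the element, which pushes any tail text behind the newly inserted element. It does this so that the new element becomes the immediately following sibling.

parent.insert(where, elem) works as if the parent were just a list of etree.Element objects. It places a new element in that list without making any changes to the etree.Element instances. parent.append(elem) works the same way, as does any other list-style operation.

So these functions take two different views of the element tree.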

>>> from lxml import etree as et
>>> 
>>> x = et.XML('<a>foo<b/>bar</a>')
>>> y = et.XML('<c>C!</c>')
>>> 
>>> et.dump(x)
<a>foo<b/>bar</a>
>>> x.find('b').addnext(y)
>>> et.dump(x)
<a>foo<b/><c>C!</c>bar</a>

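The tail moves from the b element to the c element, so that the XML document stays the same apart from the inserted element.

Now, if the inserted element already has a tail, addnext inserts the element together with its trailing text directly after the XML element, not after the etree Element-with-tail.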
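For comparison, here is a small sketch of the list-style view on the same input, using parent.insert() where the example above used addnext(). The variable names follow that example; position 1 is used here simply because <b/> is the only child, so index 1 is the slot right after it.

```python
from lxml import etree as et

x = et.XML('<a>foo<b/>bar</a>')
y = et.XML('<c>C!</c>')

# List-style insertion: slot 1 is the position right after <b/>
# in the parent's list of children.
x.insert(1, y)

# <b/> keeps its tail 'bar', so the new element ends up after that text.
result = et.tostring(x)
print(result)  # b'<a>foo<b/>bar<c>C!</c></a>'
```

Here no Element has its text or tail touched: <b/> still carries 'bar' as its tail and <c> has none.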

>>> x = et.XML('<a>foo<b/>bar</a>')
>>> y = et.XML('<c>C!</c>')
>>> y.tail = 'more...'
>>> 
>>> x.find('b').addnext(y)
>>> et.dump(x)
<a>foo<b/><c>C!</c>more...bar</a>
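Applied to the task in the question, the list-style view suggests splitting the tail by hand and then using insert(), since insert() leaves the tails exactly as they were set. The helper below is only a sketch of that idea; wrap_in_tail is a hypothetical name and not part of lxml, and it assumes the element has a parent and that the word is a non-empty string.

```python
from lxml import etree


def wrap_in_tail(elem, word, tag):
    """Hypothetical helper: wrap the first occurrence of `word` found in
    elem.tail in a new <tag> element placed right after elem."""
    tail = elem.tail or ""
    before, found, after = tail.partition(word)
    if not found:
        return None  # word does not occur in the tail; leave the tree alone
    new = etree.Element(tag)
    new.text = word
    new.tail = after    # text following the wrapped word
    elem.tail = before  # text between elem and the new element
    parent = elem.getparent()  # assumption: elem is not the root
    # List-style insertion keeps both tails as assigned above.
    parent.insert(parent.index(elem) + 1, new)
    return new


p = etree.fromstring("<p>This is <b>bold</b> and this is italic text.</p>")
wrap_in_tail(p[0], "italic", "i")
result = etree.tostring(p)
print(result)  # b'<p>This is <b>bold</b> and this is <i>italic</i> text.</p>'
```

The output matches the insert() result shown in the question, which is the form the asker was after.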

This article on python - lxml: how Element addnext() and insert() differ in handling tail is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/23282241/