
python - "lxml.etree.XPathEvalError: Invalid expression" with Unicode element names

Reposted. Author: 行者123. Updated: 2023-12-02 07:22:31

lxml handles Unicode element names well, since they are valid according to the XML specification. Using Unicode in an XPath expression, however, raises an error:

>>> import lxml.etree
>>> e = lxml.etree.fromstring('<?xml version="1.0" encoding="UTF-8"?><элемент>текст</элемент>'.encode('utf-8'))
>>> e.xpath('/элемент/text()')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "lxml.etree.pyx", line 1509, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:50702)
File "xpath.pxi", line 318, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:145954)
File "xpath.pxi", line 238, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:144962)
File "xpath.pxi", line 224, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:144817)
lxml.etree.XPathEvalError: Invalid expression

Is this an lxml limitation? I could not find it in the documentation, but maybe I missed it.

Can someone explain the reason behind this?

Update: the problem reproduces only when the second character of the XPath is a Cyrillic letter. The following all work:

  • paths that begin with //, e.g. //элемент

  • paths whose element name starts with a Latin letter, e.g. //qлемент

  • /./элемент instead of /элемент (the two are equivalent)

Moreover, this appears to be a libxml2 issue, not just an lxml one:

$ xmlstarlet sel -t -v "/элемент/text()" test.xml 
Invalid expression: /элемент/text()
compilation error: element with-param
XSLT-with-param: Failed to compile select expression '/элемент/text()'
$ xmlstarlet sel -t -v "/./элемент/text()" test.xml
текст
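
To check the same three forms from Python, a small probe like the one below may help. It is only a sketch: `probe` is a hypothetical helper name, the document is a trimmed version of the one in the question, and the outcome for the first expression may depend on the libxml2 version that lxml is linked against, so no expected output is shown.

```python
import lxml.etree

# Versions matter here: the behaviour may differ between libxml2 releases.
print("lxml:", lxml.etree.LXML_VERSION)
print("libxml2:", lxml.etree.LIBXML_VERSION)

doc = lxml.etree.fromstring('<элемент>текст</элемент>'.encode('utf-8'))

def probe(expr):
    """Return the XPath result, or the exception if evaluation fails."""
    try:
        return doc.xpath(expr)
    except lxml.etree.XPathError as exc:
        return exc

# Compare the failing absolute path with the two forms reported to work.
for expr in ('/элемент/text()', '/./элемент/text()', '//элемент/text()'):
    print(expr, '->', repr(probe(expr)))
```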

I gave up on tracking this down and now use /./ to write absolute XPaths for tags with Cyrillic names.
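
In Python, that workaround could be wrapped in a small helper. This is a minimal sketch, not part of lxml: `with_dot_step` is a hypothetical name, and it only handles a single location path whose second character is non-ASCII (the case described above), not unions or more complex expressions.

```python
import lxml.etree

def with_dot_step(path):
    """Rewrite '/name...' to '/./name...' when the name starts with a non-ASCII character."""
    if len(path) > 1 and path[0] == '/' and ord(path[1]) > 127:
        return '/.' + path
    # Anything else (//..., /ascii..., relative paths) is left untouched.
    return path

root = lxml.etree.fromstring('<элемент>текст</элемент>'.encode('utf-8'))

# '/элемент/text()' becomes '/./элемент/text()' before evaluation.
result = root.xpath(with_dot_step('/элемент/text()'))
print(result)
```

Since /./элемент and /элемент select the same nodes, the rewrite should not change the result of the query.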

Best answer

If you are referring to the root node, your XPath is missing a /:

>>> e.xpath('//элемент/text()')
['текст']

Or use two dots, .. (if you are referring to it relative to the parent node):

>>> e.xpath('../элемент/text()')
['текст']
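
One caveat, shown with a made-up nested document (the inner element and the `a` wrapper are assumptions added for illustration): // matches at any depth, so it is not a drop-in replacement for an absolute path once the same name occurs further down the tree.

```python
import lxml.etree

# Hypothetical document: the same Cyrillic element name at two depths.
xml = '<элемент>текст<a><элемент>вложенный</элемент></a></элемент>'
nested = lxml.etree.fromstring(xml.encode('utf-8'))

# '//' selects every matching element in the document, in document order.
everywhere = nested.xpath('//элемент/text()')

# '/./' anchors the match to the document element only.
root_only = nested.xpath('/./элемент/text()')

print(everywhere)
print(root_only)
```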

Regarding python - "lxml.etree.XPathEvalError: Invalid expression" with Unicode element names, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/29689078/
