python - Can etree.XMLParser in recover mode still throw a parse error?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 11:19:50

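I have a utility method that parses XML using a parser created as etree.XMLParser(recover=True). I would like to test failure scenarios in my unit tests. However, I cannot seem to break the parser, except with empty input, which throws lxml.etree.XMLSyntaxError.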

My question is: is it possible to construct a StringIO/BytesIO input for this parser such that the parser throws a parse error?

Here are some examples (tested with Python 3.5 and lxml 4.3.3):

from io import BytesIO
from lxml import etree


def parse(xml):
    parser = etree.XMLParser(recover=True)
    elem = etree.parse(BytesIO(xml), parser)
    print(etree.tostring(elem))


parse(b'<broken<') # prints b'<broken/>'
parse(b'</lf|\jf>') # prints None
parse('<?xml encoding="ascii"?><foo>æøå</foo>'.encode('utf-8')) # prints b'<foo/>'
parse(b'') # Throws lxml.etree.XMLSyntaxError

Best answer

I do get an error if I add a NULL character at the beginning of any of the malformed inputs you show that do not raise an error. For example:

parse(b'\0<broken<')

produces:

Traceback (most recent call last):
  File "test.py", line 13, in <module>
    parse(b'\0<broken<') # prints b'<broken/>'
  File "test.py", line 9, in parse
    elem = etree.parse(BytesIO(xml), parser)
  File "src/lxml/etree.pyx", line 3435, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1857, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1877, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1765, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1127, in lxml.etree._BaseParser._parseDoc
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
  File "<string>", line 1
lxml.etree.XMLSyntaxError: Document is empty, line 1, column 1
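
For the unit tests themselves, one possible approach is to wrap the parse call so that both failure modes seen above (an XMLSyntaxError being raised, or a tree coming back without a root element) collapse into a single result. The following is only a sketch: parse_root and the test class are hypothetical names, not part of lxml or of the code in the question, and the exact behavior may differ between lxml/libxml2 versions.

```python
from io import BytesIO
import unittest

from lxml import etree


def parse_root(xml):
    """Hypothetical helper: return the root element, or None if nothing usable was parsed."""
    parser = etree.XMLParser(recover=True)
    try:
        tree = etree.parse(BytesIO(xml), parser)
    except etree.XMLSyntaxError:
        # Raised for empty input (as noted in the question).
        return None
    # In recover mode the parser may also hand back a tree with no root,
    # in which case getroot() returns None.
    return tree.getroot()


class ParseRootTest(unittest.TestCase):
    def test_well_formed_input_yields_root(self):
        self.assertEqual(parse_root(b'<a><b/></a>').tag, 'a')

    def test_empty_input_is_reported_as_failure(self):
        self.assertIsNone(parse_root(b''))
```

The NUL-prefixed input from this answer (b'\0<broken<') could be added as a further test case. With the versions from the question it raises XMLSyntaxError, so the helper would return None, but it is worth confirming this against the lxml version actually in use.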

Regarding "python - Can etree.XMLParser in recover mode still throw a parse error?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56250685/
