Python lxml parsing error on Evernote XML

Reposted. Author: 太空宇宙. Updated: 2023-11-04 01:29:16

I am trying to parse Evernote Markup Language (ENML) with lxml in Python 2.7. ENML is a superset of XHTML.

from StringIO import StringIO
import lxml.etree as etree

if __name__ == '__main__':
    xml_str = StringIO('<?xml version="1.0" encoding="UTF-8"?>\r\n<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">\r\n\r\n<en-note style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">\nA really simple example. &nbsp;Another sentence.\n</en-note>')
    tree = etree.parse(xml_str)

The code above fails with:

XMLSyntaxError: Entity 'nbsp' not defined, line 5, column 32

How can I parse ENML successfully?

Best answer

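&nbsp; is an HTML named entity. XML itself predefines only &amp;, &lt;, &gt;, &apos; and &quot;, so &nbsp; is understood by an HTML parser but not by an XML parser: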

from StringIO import StringIO
import lxml.html as LH
if __name__ == '__main__':
    xml_str = StringIO('<?xml version="1.0" encoding="UTF-8"?>\r\n<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">\r\n\r\n<en-note style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">\nA really simple example. &nbsp;Another sentence.\n</en-note>')
    tree = LH.parse(xml_str)
    print(LH.tostring(tree))
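
If an XML tree is needed rather than an HTML one, one possible alternative (not part of the original answer) is to rewrite HTML named entities as numeric character references before handing the text to an XML parser. The sketch below is written for Python 3 and uses only the standard library (xml.etree.ElementTree and html.entities) instead of lxml. The helper name replace_html_entities is hypothetical, and the style attribute is shortened for readability. It also assumes the note contains no CDATA sections in which a literal &nbsp; must be preserved.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities that XML itself predefines; these are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}

# Matches a named entity reference such as &nbsp; or &eacute;
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_html_entities(markup):
    """Rewrite HTML named entities (e.g. &nbsp;) as numeric references.

    Hypothetical helper for illustration: unknown names and XML's own
    predefined entities are returned unchanged.
    """
    def substitute(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return ENTITY_RE.sub(substitute, markup)


enml = (
    '<?xml version="1.0" encoding="UTF-8"?>\r\n'
    '<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">\r\n\r\n'
    '<en-note style="word-wrap: break-word;">\n'
    'A really simple example. &nbsp;Another sentence.\n'
    '</en-note>'
)

# Encode to bytes so the parser honours the encoding declaration.
root = ET.fromstring(replace_html_entities(enml).encode("utf-8"))
print(root.tag)
print(repr(root.text))
```

Because numeric character references are valid in any XML document, the parser no longer needs the ENML DTD to resolve &nbsp;. The trade-off is that the entity names themselves are lost: the parsed text contains the U+00A0 character directly.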

Regarding the Python lxml parsing error on Evernote XML, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/15102954/
