
python - lxml cannot parse XML (whether or not it is encoded as UTF-8) [python]

Reposted. Author: 数据小太阳. Updated: 2023-10-29 01:56:27

My code:

import re
import requests
from lxml import etree

url = 'http://weixin.sogou.com/gzhjs?openid=oIWsFt__d2wSBKMfQtkFfeVq_u8I&ext=2JjmXOu9jMsFW8Sh4E_XmC0DOkcPpGX18Zm8qPG7F0L5ffrupfFtkDqSOm47Bv9U'

r = requests.get(url)

items = r.json()['items']
  1. Without encode('utf-8'):

etree.fromstring(items[0]) outputs:

ValueError                                
Traceback (most recent call last)
<ipython-input-69-cb8697498318> in <module>()
----> 1 etree.fromstring(items[0])

lxml.etree.pyx in lxml.etree.fromstring (src\lxml\lxml.etree.c:68121)()

parser.pxi in lxml.etree._parseMemoryDocument (src\lxml\lxml.etree.c:102435)()

ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
  2. With encode('utf-8'):

etree.fromstring(items[0].encode('utf-8')) outputs:

  File "<string>", line unknown
XMLSyntaxError: CData section not finished
鎶楀啺鎶㈤櫓鎹锋姤:闃冲寳I绾挎, line 1, column 281

I don't know how to parse this XML.

Best answer

As a workaround, you can remove the encoding attribute from the XML declaration before passing the string to etree.fromstring:

xml = re.sub(r'\bencoding="[-\w]+"', '', items[0], count=1)
root = etree.fromstring(xml)
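
To make the workaround easier to try without hitting the live URL, here is a minimal sketch. The `sample` string and the helper name `parse_without_declared_encoding` are made up for illustration; the real content of `items[0]` is not shown in the question, so the declared `gbk` encoding is only an assumption.

```python
import re
from lxml import etree

# Hypothetical stand-in for items[0]; the real payload is not shown in the question.
sample = ('<?xml version="1.0" encoding="gbk"?>'
          '<DOCUMENT><title><![CDATA[示例]]></title></DOCUMENT>')


def parse_without_declared_encoding(text):
    """Drop the first encoding="..." attribute, then parse the str."""
    cleaned = re.sub(r'\bencoding="[-\w]+"', '', text, count=1)
    return etree.fromstring(cleaned)


# The question shows lxml raising ValueError for str input that still
# carries an encoding declaration; catch it here so the sketch continues.
try:
    etree.fromstring(sample)
except ValueError as exc:
    print('rejected:', exc)

root = parse_without_declared_encoding(sample)
print(root.tag, root.findtext('title'))
```

Note that the pattern only matches a double-quoted value, so a declaration written with single quotes would pass through unchanged.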

Update, after seeing @Lea's comment on the question:

Specify a parser with an explicit encoding:

xml = r.json()['items'][0].encode('utf-8')
root = etree.fromstring(xml, parser=etree.XMLParser(encoding='utf-8'))
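
The same idea as a self-contained sketch, with no network call. The payload below is hypothetical: it assumes the XML declaration claims `gbk` while the text is actually encoded as UTF-8, which would be consistent with the garbled characters in the XMLSyntaxError above but is not confirmed by the question.

```python
from lxml import etree

# Hypothetical payload: the declaration says gbk, the bytes are UTF-8.
text = ('<?xml version="1.0" encoding="gbk"?>'
        '<DOCUMENT><title><![CDATA[抗冰抢险捷报]]></title></DOCUMENT>')
data = text.encode('utf-8')

# The encoding argument of XMLParser overrides the encoding
# declared inside the document.
parser = etree.XMLParser(encoding='utf-8')
root = etree.fromstring(data, parser=parser)
print(root.findtext('title'))
```

Unlike the regex workaround, this leaves the document text untouched and works on bytes input, which is what lxml asks for in the ValueError message.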

Regarding python - lxml cannot parse XML (whether or not it is encoded as UTF-8) [python], a similar question was found on Stack Overflow: https://stackoverflow.com/questions/34084760/
