
python - Encoding error when parsing RSS with lxml

Reposted. Author: 太空狗. Last updated: 2023-10-29 19:30:19

I want to parse a downloaded RSS feed with lxml, but I don't know how to handle the UnicodeDecodeError.

request = urllib2.Request('http://wiadomosci.onet.pl/kraj/rss.xml')
response = urllib2.urlopen(request)
response = response.read()
encd = chardet.detect(response)['encoding']
parser = etree.XMLParser(ns_clean=True,recover=True,encoding=encd)
tree = etree.parse(response, parser)

But I get an error:

tree   = etree.parse(response, parser)
File "lxml.etree.pyx", line 2692, in lxml.etree.parse (src/lxml/lxml.etree.c:49594)
File "parser.pxi", line 1500, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71364)
File "parser.pxi", line 1529, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:71647)
File "parser.pxi", line 1429, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:70742)
File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:67740)
File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:63824)
File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:64745)
File "parser.pxi", line 559, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64027)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 97: ordinal not in range(128)

Best answer

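I ran into a similar problem, and it turned out to have nothing to do with encoding. What is happening is that lxml is throwing a completely unrelated error at you. In this case, the real error is that the .parse function expects a filename, a URL, or a file-like object, not a string containing the content itself. However, when lxml tries to print that error, it chokes on the non-ASCII characters and shows a thoroughly confusing error message instead. It is very unfortunate, and others have commented on the issue here: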

https://mailman-mail5.webfaction.com/pipermail/lxml/2009-February/004393.html
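To make the distinction concrete, here is a minimal sketch of how the two functions treat the same bytes. The feed content and the `parse_outcome` helper are made up for illustration, and the exact exception type raised by `etree.parse` may vary between lxml versions, so the sketch only records that it fails.

```python
from lxml import etree

# Hypothetical stand-in for the downloaded feed body (not the real feed).
rss_bytes = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<rss version="2.0"><channel><title>Example</title></channel></rss>'
)


def parse_outcome(data):
    """Return the exception class name raised by etree.parse(data), or 'ok'."""
    try:
        etree.parse(data)
    except Exception as exc:  # broad on purpose: the exact type may vary
        return type(exc).__name__
    return "ok"


# A str/bytes argument is treated as a filename or URL, so this call fails.
parse_failed = parse_outcome(rss_bytes) != "ok"

# fromstring() treats the same bytes as the document content itself.
root = etree.fromstring(rss_bytes)
title = root.findtext("channel/title")
```

On a current lxml the failure shows up as a "could not read file" style error rather than a UnicodeDecodeError, which is presumably why the old message was so misleading.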

Fortunately, the fix is very simple. Just replace .parse with .fromstring:

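import urllib2  # Python 2 module, as in the question
import chardet
from lxml import etree

# Download the raw feed bytes.
request = urllib2.Request('http://wiadomosci.onet.pl/kraj/rss.xml')
response = urllib2.urlopen(request)
response = response.read()
# Guess the encoding and pass it to the parser.
encd = chardet.detect(response)['encoding']
parser = etree.XMLParser(ns_clean=True, recover=True, encoding=encd)

## lxml Y U NO MAKE SENSE!!!
# fromstring() parses in-memory content and returns the root element;
# parse() would treat the same string as a filename or URL.
tree = etree.fromstring(response, parser)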

I just tested this on my machine and it works fine. Hope that helps!
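If an ElementTree object is wanted rather than the root element, one alternative (not part of the original answer) is to keep `etree.parse` and wrap the downloaded bytes in a file-like object. The sketch below assumes a small made-up feed in place of the real download. It also leaves out `chardet`, on the assumption that the feed carries a correct XML encoding declaration that the parser can read on its own.

```python
import io

from lxml import etree

# Hypothetical feed body with a non-ASCII title, encoded as UTF-8 bytes.
rss_bytes = (
    u'<?xml version="1.0" encoding="utf-8"?>'
    u'<rss version="2.0"><channel>'
    u'<title>Wiadomo\u015bci</title>'
    u'</channel></rss>'
).encode("utf-8")

parser = etree.XMLParser(ns_clean=True, recover=True)

# parse() accepts file-like objects, so BytesIO lets it read in-memory bytes.
tree = etree.parse(io.BytesIO(rss_bytes), parser)
root = tree.getroot()
title = root.findtext("channel/title")
```

If the declared encoding turns out to be wrong or missing, passing an explicit `encoding=` to `XMLParser`, as the answer does, is still an option.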

Regarding python - Encoding error when parsing RSS with lxml, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/5812009/
