
python - 'charmap' codec can't decode byte 0x8d in position 33222: character maps to <undefined>

Reposted. Author: 行者123. Updated: 2023-12-02 16:49:06

I am trying to parse a very long HTML file with BeautifulSoup, using lxml as the parser. I know the file's character encoding is UTF-8 with BOM, but whenever I run contents = f.read() I get the following error:

'charmap' codec can't decode byte 0x8d in position 33222: character maps to <undefined>

This is the first part of my code (and the part that fails):

from bs4 import BeautifulSoup

with open("doc.html", "r") as f:

    contents = f.read()

soup = BeautifulSoup(contents, 'lxml')

print(soup.h2)
print(soup.head)
print(soup.li)

This is the error output:

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-1-4805460879e0> in <module>
3 with open("doc.html", "r") as f:
4
----> 5 contents = f.read()
6
7 soup = BeautifulSoup(contents, 'lxml')

~\Anaconda3\lib\encodings\cp1252.py in decode(self, input, final)
21 class IncrementalDecoder(codecs.IncrementalDecoder):
22 def decode(self, input, final=False):
---> 23 return codecs.charmap_decode(input,self.errors,decoding_table)[0]
24
25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 33222: character maps to <undefined>
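
The traceback passes through encodings\cp1252.py, which suggests that open() fell back to the platform's default encoding rather than UTF-8. The following is a minimal diagnostic sketch, not part of the original question. The variable names are illustrative, and the two-byte sequence is only an assumed example of where a 0x8d byte could come from in a UTF-8 file.

```python
import locale

# Encoding that open() uses in text mode when no encoding argument is given
# (in Python's default configuration; UTF-8 mode changes this).
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

# Byte 0x8d has no entry in Python's cp1252 decoding table, so decoding fails.
try:
    b"\x8d".decode("cp1252")
    undefined_in_cp1252 = False
except UnicodeDecodeError as exc:
    undefined_in_cp1252 = True
    print(exc)

# In UTF-8 the same byte is legal as a continuation byte of a multi-byte
# character. Assumed example: 0xC4 0x8D encodes U+010D.
decoded = b"\xc4\x8d".decode("utf-8")
print(repr(decoded))
```

If the first line prints cp1252 on the machine in question, that would be consistent with the error above.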

Best answer

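with open("doc.html", "r", encoding="UTF-8") as f should solve your problem. Without an explicit encoding, open() uses the platform default, which the traceback shows is cp1252 here.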
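
Because the question says the file is UTF-8 with BOM, it may also be worth considering the "utf-8-sig" codec, which decodes UTF-8 and drops a leading BOM. The sketch below is an illustration under that assumption, not the answerer's code. It writes a small throwaway file (the sample content and variable names are hypothetical) and compares the two codecs.

```python
import os
import tempfile

# Hypothetical stand-in for doc.html: UTF-8 content preceded by a BOM.
sample = "<h2>\u010daj</h2>"
raw = b"\xef\xbb\xbf" + sample.encode("utf-8")

fd, path = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# Plain "utf-8" decodes the file but keeps the BOM as U+FEFF at the start.
with open(path, "r", encoding="utf-8") as f:
    contents_utf8 = f.read()

# "utf-8-sig" decodes the same bytes and strips the leading BOM.
with open(path, "r", encoding="utf-8-sig") as f:
    contents_sig = f.read()

os.remove(path)

print(repr(contents_utf8[:1]))
print(repr(contents_sig[:1]))
```

Either string can then be passed to BeautifulSoup(contents, 'lxml') as in the question. The difference is only whether the text starts with the BOM character.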

Regarding python - 'charmap' codec can't decode byte 0x8d in position 33222: character maps to <undefined>, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59444702/
