
python - UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 34: invalid continuation byte

Reposted. Author: 行者123. Updated: 2023-12-01 01:48:10

I want to open some Persian text files in a Python script using the following code:

    import codecs
    lines = []
    for line in codecs.open('0001.txt', encoding='UTF-8'):
        lines.append(line)

But it gives me this error:

Traceback (most recent call last):
  File "/usr/lib/pycharm-community/helpers/pydev/pydevd.py", line 1596, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/usr/lib/pycharm-community/helpers/pydev/pydevd.py", line 974, in run
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "/usr/lib/pycharm-community/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/nlpuser/Documents/ms/Work/General_Dataset_creator/BijanKhanReader.py", line 24, in <module>
    for lin in codecs.open('corpuses/markaz/0001.txt',encoding='UTF-8'):
  File "/home/nlpuser/anaconda3/envs/tmpy36/lib/python3.6/codecs.py", line 713, in __next__
    return next(self.reader)
  File "/home/nlpuser/anaconda3/envs/tmpy36/lib/python3.6/codecs.py", line 644, in __next__
    line = self.readline()
  File "/home/nlpuser/anaconda3/envs/tmpy36/lib/python3.6/codecs.py", line 557, in readline
    data = self.read(readsize, firstline=True)
  File "/home/nlpuser/anaconda3/envs/tmpy36/lib/python3.6/codecs.py", line 503, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 0: invalid continuation byte

What is wrong with this code?

Here is the output of the file command:

0001.txt: Non-ISO extended-ASCII text, with CRLF line terminators

Best answer

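UTF-8 has a very specific format: each character is represented by anywhere from 1 to 4 bytes.

A single-byte character is a value in the range 0x00 to 0x7F. A character that takes two or more bytes starts with a lead byte in the range 0xC2 to 0xF4, followed by one to three continuation bytes, each in the range 0x80 to 0xBF.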
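The byte ranges above can be sketched as a small classifier. This is only an illustration of the ranges, not a full UTF-8 validator: the function name classify_utf8_byte is made up for this example, and it ignores the extra restrictions some lead bytes place on the byte that follows them.

```python
def classify_utf8_byte(b: int) -> str:
    """Classify a single byte value (0-255) by the role it can play in UTF-8."""
    if 0x00 <= b <= 0x7F:
        return "single-byte (ASCII)"
    if 0x80 <= b <= 0xBF:
        return "continuation"
    if 0xC2 <= b <= 0xDF:
        return "lead of 2-byte sequence"
    if 0xE0 <= b <= 0xEF:
        return "lead of 3-byte sequence"
    if 0xF0 <= b <= 0xF4:
        return "lead of 4-byte sequence"
    # 0xC0, 0xC1 and 0xF5-0xFF never appear in well-formed UTF-8.
    return "never valid in UTF-8"


# Walk the UTF-8 bytes of one Persian/Arabic letter and label each byte.
for byte in "م".encode("utf-8"):
    print(f"0x{byte:02X}: {classify_utf8_byte(byte)}")

# The byte from the error message is a lead byte, not a continuation byte.
print(f"0xE3: {classify_utf8_byte(0xE3)}")
```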

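In your case, Python hit the byte 0xE3 and could not build a valid sequence from it. 0xE3 is a legal lead byte for a three-byte character, so it must be followed by two continuation bytes in the range 0x80 to 0xBF; the "invalid continuation byte" message means that a byte following it falls outside that range. The problem is most likely in your text file rather than in your program: either the file is not encoded as UTF-8 at all, or its UTF-8 data is malformed.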
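The failure can be reproduced without the original file. The sketch below uses a made-up byte string (not the asker's data) in which 0xE3 is followed by an ASCII space, and reads the details from the UnicodeDecodeError attributes reason and start.

```python
# Hypothetical sample: 0xE3 announces a 3-byte character, but the next
# byte (an ASCII space, 0x20) is not in the continuation range 0x80-0xBF.
sample = b"\xe3 abc"

try:
    sample.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

if error is not None:
    print("reason:", error.reason)               # why decoding stopped
    print("offset:", error.start)                # index of the offending byte
    print("byte:  ", hex(sample[error.start]))   # the byte at that index
```

Note that the offset reported is that of the lead byte that starts the bad sequence, which is why the traceback in the question can name position 0.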

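Use hexdump -C <file> or xxd <file> to check the exact byte sequence you have, and file <file> to try to guess the encoding; with that information we may be able to say more.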
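The same inspection can be roughly approximated from Python. The sketch below is a guess at a workflow, not a diagnosis of the asker's file: the helper names are invented for this example, the sample bytes stand in for the real file contents, and cp1256 and iso-8859-6 are only assumed candidates for legacy Persian/Arabic text.

```python
def first_decode_error(raw: bytes, encoding: str = "utf-8"):
    """Return the offset of the first undecodable byte, or None if raw decodes."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def hex_window(raw: bytes, offset: int, radius: int = 8) -> str:
    """Hex-dump the bytes around offset, similar in spirit to xxd."""
    start = max(0, offset - radius)
    return " ".join(f"{b:02x}" for b in raw[start:offset + radius])


def try_encodings(raw: bytes, candidates=("utf-8", "cp1256", "iso-8859-6")):
    """Map each candidate encoding to the decoded text, or None on failure."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Stand-in for open("0001.txt", "rb").read(): Persian text saved as cp1256
# (an assumption made for this example, not a fact about the real file).
raw = "سلام\r\n".encode("cp1256")

offset = first_decode_error(raw)
if offset is not None:
    print("first bad offset:", offset)
    print("nearby bytes:    ", hex_window(raw, offset))

for name, text in try_encodings(raw).items():
    print(name, "->", "decodes" if text is not None else "fails")
```

An encoding that decodes without raising is not necessarily the right one, since single-byte encodings accept most byte values. The decoded text still has to be read and checked by someone who knows the language.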

Regarding python - UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 34: invalid continuation byte, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51021533/
