python - UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: illegal multibyte sequence

Reposted. Author: 太空宇宙. Updated: 2023-11-03 12:32:39

I am using Python 3.4 on 64-bit Windows 7. I ran the following code:

      6     """ load single batch of cifar """
      7     with open(filename, 'r') as f:
----> 8         datadict = pickle.load(f)
      9         X = datadict['data']

The error message is: UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: illegal multibyte sequence

I changed line 7 to:

      6     """ load single batch of cifar """
      7     with open(filename, 'r', encoding='utf-8') as f:
----> 8         datadict = pickle.load(f)
      9         X = datadict['data']

The error message became: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

The traceback ends in decode(self, input, final) in Python34\lib\codecs.py:

    311         # decode input (taking the buffer into account)
    312         data = self.buffer + input
--> 313         (result, consumed) = self._buffer_decode(data, self.errors, final)
    314         # keep undecoded input until the next call
    315         self.buffer = data[consumed:]

I then modified the code further:

      6     """ load single batch of cifar """
      7     with open(filename, 'rb') as f:
----> 8         datadict = pickle.load(f)
      9         X = datadict['data']
     10         Y = datadict['labels']

Well, this time it is UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128)

What is the problem, and how do I solve it?

Best answer

Pickle files are binary data files, so you always have to open the file in 'rb' mode when loading. Don't try to use text mode here.
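
To see why the two text-mode attempts failed at position 0, it may help to look at the first byte of a pickle stream. The following is a minimal sketch, assuming a small made-up dictionary rather than the CIFAR batch: a protocol 2 pickle begins with the PROTO opcode, byte 0x80, which is the byte both error messages complain about. The helper name decodes_as is hypothetical.

```python
import pickle

# Serialize a small stand-in dict with protocol 2, the highest protocol
# that Python 2 can write. (The dict contents are made up for illustration.)
payload = pickle.dumps({"data": [1, 2, 3]}, protocol=2)

# Protocol 2 and later streams start with the PROTO opcode, byte 0x80.
first_byte = payload[0]


def decodes_as(raw, codec):
    """Return True if the raw bytes decode cleanly with the given text codec."""
    try:
        raw.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


# Neither codec accepts 0x80 as the start of a character, so text mode
# fails before pickle ever sees the data.
utf8_ok = decodes_as(payload, "utf-8")
gbk_ok = decodes_as(payload, "gbk")
print(hex(first_byte), utf8_ok, gbk_ok)
```

Opening the file with 'rb' skips this decoding step entirely and hands the raw bytes to pickle.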

You are trying to load a Python 2 pickle that contains string data. You have to tell pickle.load() how to convert that data to Python 3 strings, or to leave it as bytes.

The default is to try to decode those strings as ASCII, and that decoding fails. See the pickle.load() documentation:

Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.

Setting the encoding to latin1 lets you import the data directly:

import pickle

with open(filename, 'rb') as f:
    datadict = pickle.load(f, encoding='latin1')

It appears to be the numpy array data that causes the problem here, since all the strings in the set use only ASCII characters.
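
The difference between the default and latin1 can be reproduced without the CIFAR files. The sketch below uses a hand-written byte string that imitates what Python 2 would emit for {'labels': 'caf\xe9'} with protocol 2. That byte string is an assumption made for illustration, not data from the original batch, and the non-ASCII byte simply stands in for the binary array data.

```python
import pickle

# Hand-written protocol 2 stream imitating a Python 2 pickle of
# {'labels': 'caf\xe9'}. This is an illustrative stand-in, not CIFAR data.
# 'U' is the SHORT_BINSTRING opcode that Python 2 uses for 8-bit str objects.
py2_pickle = b"\x80\x02}q\x00U\x06labelsq\x01U\x04caf\xe9q\x02s."

# The default encoding='ASCII' fails on the non-ASCII byte 0xe9.
try:
    pickle.loads(py2_pickle)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# latin1 maps every byte 0x00-0xff to a code point, so decoding cannot fail.
loaded = pickle.loads(py2_pickle, encoding="latin1")
print(default_failed, loaded)
```

Because latin1 never rejects a byte, it is a reasonable choice when the 8-bit strings are really binary payloads rather than text.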

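The alternative is to use encoding='bytes', but then all the filenames and the top-level dictionary keys are bytes objects, and you have to either decode them or prefix all your key literals with b.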
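
As a rough sketch of that alternative, again using a hand-written byte string that imitates a Python 2 pickle (an assumption for illustration, not the CIFAR file), the keys come back as bytes and can be decoded afterwards. The helper decode_keys is a hypothetical name, not part of the pickle module.

```python
import pickle

# Hand-written protocol 2 stream imitating a Python 2 pickle of
# {'labels': 'cat'}. This is an illustrative stand-in, not CIFAR data.
py2_pickle = b"\x80\x02}q\x00U\x06labelsq\x01U\x03catq\x02s."

# With encoding='bytes', Python 2 str objects are returned as bytes objects,
# so both the key and the value are bytes here.
raw = pickle.loads(py2_pickle, encoding="bytes")


def decode_keys(d, codec="ascii"):
    """Return a copy of d with top-level bytes keys decoded to str."""
    return {
        (k.decode(codec) if isinstance(k, bytes) else k): v
        for k, v in d.items()
    }


decoded = decode_keys(raw)
print(raw[b"labels"], decoded["labels"])
```

Only the top-level keys are converted here. Values such as filenames stay as bytes and would need their own decoding step.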

Regarding "python - UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: illegal multibyte sequence", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/28165639/
