
python - Porting pickle py2 to py3: strings become bytes

Reposted. Author: 太空狗. Updated: 2023-10-29 18:30:14

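I have a pickle file that was created with Python 2.7 and that I'm trying to port to Python 3.6. The file is saved in py 2.7 via pickle.dumps(self.saved_objects, -1)

and loaded in Python 3.6 via loads(data, encoding="bytes") (from a file opened in rb mode). If I try opening it in r mode and pass encoding=latin1 to loads, I get UnicodeDecode errors. When I open it as a byte stream it loads, but literally every string is now a byte string. Every object's __dict__ keys are b"a_variable_name", which then raises attribute errors when calling an_object.a_variable_name, because __getattr__ passes a string and __dict__ only contains bytes. I feel like I've tried every combination of arguments and pickle protocols already. Apart from forcibly converting all objects' __dict__ keys to strings, I'm at a loss. Any ideas?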

** Skip to the 4/28/2017 update for a better example

--------------------------------------------------

** Update 4/27/2017

This minimal example illustrates my problem:

From py 2.7.13

import pickle

class test(object):
    def __init__(self):
        self.x = u"test ¢"  # including a unicode str breaks things

t = test()
dumpstr = pickle.dumps(t)

>>> dumpstr
"ccopy_reg\n_reconstructor\np0\n(c__main__\ntest\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n(dp5\nS'x'\np6\nVtest \xa2\np7\nsb."

From py 3.6.1

import pickle

class test(object):
    def __init__(self):
        self.x = "xyz"

dumpstr = b"ccopy_reg\n_reconstructor\np0\n(c__main__\ntest\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n(dp5\nS'x'\np6\nVtest \xa2\np7\nsb."

t = pickle.loads(dumpstr, encoding="bytes")

>>> t
<__main__.test object at 0x040E3DF0>
>>> t.x
Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    t.x
AttributeError: 'test' object has no attribute 'x'
>>> t.__dict__
{b'x': 'test ¢'}
>>>
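The effect of the encoding argument can be seen in isolation with a small sketch. The byte string below is a hand-written protocol 0 pickle that is assumed to match what Python 2 emits for the dict {'x': u'test ¢'}; it is not taken from the question's data. The encoding argument only governs how Python 2 8-bit str values (the S opcode) are decoded, while unicode values (the V opcode) come back as str either way:

```python
import pickle
import pickletools

# Hand-written protocol-0 pickle, assumed equivalent to a Python 2 dict
# {'x': u'test \xa2'}: the key is a py2 byte str (S opcode), the value a
# py2 unicode (V opcode).
py2_data = b"(dp0\nS'x'\np1\nVtest \xa2\np2\ns."

# List the opcode names to see which values are STRING vs UNICODE.
opcodes = [op.name for op, arg, pos in pickletools.genops(py2_data)]
print(opcodes)

# encoding="bytes" leaves py2 str values as bytes objects.
as_bytes = pickle.loads(py2_data, encoding="bytes")
# encoding="latin1" decodes py2 str values to str.
as_text = pickle.loads(py2_data, encoding="latin1")

print(as_bytes)  # key is b'x'
print(as_text)   # key is 'x'
```

When such a dict is an instance's state, the bytes key is what ends up in __dict__, which is why attribute lookup by name fails.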

--------------------------------------------------

Update 4/28/2017

To replicate my issue I'm posting my actual raw pickle data here

The pickle file was created in Python 2.7.13 on Windows 10 using

with open("raw_data.pkl", "wb") as fileobj:
    pickle.dump(library, fileobj, protocol=0)

(protocol 0 so it's human readable)

To run it you'll need classes.py

# classes.py

class Library(object): pass


class Book(object): pass


class Student(object): pass


class RentalDetails(object): pass

And here is the test script:

# load_pickle.py
import pickle, sys, itertools, os

raw_pkl = "raw_data.pkl"
is_py3 = sys.version_info.major == 3

read_modes = ["rb"]
encodings = ["bytes", "utf-8", "latin-1"]
fix_imports_choices = [True, False]
files = ["raw_data_%s.pkl" % x for x in range(3)]


def py2_test():
    with open(raw_pkl, "rb") as fileobj:
        loaded_object = pickle.load(fileobj)
        print("library dict: %s" % (loaded_object.__dict__.keys()))
        return loaded_object


def py2_dumps():
    library = py2_test()
    for protocol, path in enumerate(files):
        print("dumping library to %s, protocol=%s" % (path, protocol))
        with open(path, "wb") as writeobj:
            pickle.dump(library, writeobj, protocol=protocol)


def py3_test():
    # this test iterates over the different options trying to load
    # the data pickled with py2 into a py3 environment
    print("starting py3 test")
    for (read_mode, encoding, fix_import, path) in itertools.product(read_modes, encodings, fix_imports_choices, files):
        py3_load(path, read_mode=read_mode, fix_imports=fix_import, encoding=encoding)


def py3_load(path, read_mode, fix_imports, encoding):
    from traceback import print_exc
    print("-" * 50)
    print("path=%s, read_mode = %s fix_imports = %s, encoding = %s" % (path, read_mode, fix_imports, encoding))
    if not os.path.exists(path):
        print("start this file with py2 first")
        return
    try:
        with open(path, read_mode) as fileobj:
            loaded_object = pickle.load(fileobj, fix_imports=fix_imports, encoding=encoding)
            # print the object's __dict__
            print("library dict: %s" % (loaded_object.__dict__.keys()))
            # consider the test a failure if any member attributes are saved as bytes
            test_passed = not any((isinstance(k, bytes) for k in loaded_object.__dict__.keys()))
            print("Test %s" % ("Passed!" if test_passed else "Failed"))
    except Exception:
        print_exc()
        print("Test Failed")
    input("Press Enter to continue...")
    print("-" * 50)


if is_py3:
    py3_test()
else:
    # py2_test()
    py2_dumps()

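Put all 3 in the same directory and run c:\python27\python load_pickle.py first, which will create one pickle file for each of the 3 protocols. Then run the same command with Python 3 and notice that this version converts the __dict__ keys to bytes. I had it working for about 6 hours, but for the life of me I can't figure out how I broke it again.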

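Best answer

In short, you're hitting bug 22005 with the datetime.date objects in the RentalDetails objects.

That can be worked around with the encoding='bytes' parameter, but that leaves your classes with a __dict__ containing bytes: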

>>> library = pickle.loads(pickle_data, encoding='bytes')
>>> dir(library)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'str' and 'bytes'

It's possible to fix that manually, based on your specific data:

def fix_object(obj):
    """Decode obj.__dict__ containing bytes keys"""
    obj.__dict__ = dict((k.decode("ascii"), v) for k, v in obj.__dict__.items())


def fix_library(library):
    """Walk all library objects and decode __dict__ keys"""
    fix_object(library)
    for student in library.students:
        fix_object(student)
    for book in library.books:
        fix_object(book)
        for rental in book.rentals:
            fix_object(rental)
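If the object graph is not known in advance, the same idea can be written as a generic walker. This is only a sketch under stated assumptions: decode_dict_keys, Shelf and Item are hypothetical names, the data is assumed to consist of plain instances, lists, tuples, sets and dicts, and all bytes keys are assumed to be ASCII attribute names. It tracks visited objects so that back-references between objects do not cause infinite recursion:

```python
def decode_dict_keys(obj, _seen=None):
    """Recursively decode bytes keys of instance __dict__s, in place."""
    if _seen is None:
        _seen = set()
    if id(obj) in _seen:  # already visited: guards against reference cycles
        return obj
    _seen.add(id(obj))

    if isinstance(obj, dict):
        children = list(obj.values())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        children = list(obj)
    elif hasattr(obj, "__dict__") and not isinstance(obj, type):
        # Assumes a plain instance whose __dict__ can be reassigned.
        obj.__dict__ = {
            (k.decode("ascii") if isinstance(k, bytes) else k): v
            for k, v in vars(obj).items()
        }
        children = list(vars(obj).values())
    else:
        children = []

    for child in children:
        decode_dict_keys(child, _seen)
    return obj


# Hypothetical stand-ins for objects loaded with encoding="bytes".
class Shelf(object):
    pass


class Item(object):
    pass


shelf, item = Shelf(), Item()
shelf.__dict__ = {b"items": [item]}
item.__dict__ = {b"name": "pickle", b"shelf": shelf}  # back-reference forms a cycle

decode_dict_keys(shelf)
print(shelf.items[0].name)
```

This only repairs attribute names. String values that were loaded as bytes stay bytes, and very deep object graphs could hit Python's recursion limit.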

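But that's fragile, and painful enough that you should be looking for a better option.

1) Implement __getstate__/__setstate__ to map datetime objects to a non-broken representation, for instance: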

import datetime


class Event(object):
    """Example class working around datetime pickling bug"""

    def __init__(self):
        self.date = datetime.date.today()

    def __getstate__(self):
        # Store the date as a plain integer so no datetime object is pickled.
        state = self.__dict__.copy()
        state["date"] = state["date"].toordinal()
        return state

    def __setstate__(self, state):
        # Restore the attributes, then rebuild the date from its ordinal.
        self.__dict__.update(state)
        self.date = datetime.date.fromordinal(self.date)

2) Don't use pickle at all. Along the lines of __getstate__/__setstate__, you can implement to_dict/from_dict methods or similar in your classes to save their content as json or some other plain format.
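A minimal sketch of the to_dict/from_dict approach could look like the following. RentalRecord and its fields are hypothetical and not taken from the question's classes; the date is stored as an ISO string so that the JSON text carries no Python-version-specific types:

```python
import datetime
import json


class RentalRecord(object):
    """Hypothetical class; the field names are illustrative only."""

    def __init__(self, book_title, due_date):
        self.book_title = book_title
        self.due_date = due_date

    def to_dict(self):
        # Convert the date to a plain string that json can serialize.
        return {
            "book_title": self.book_title,
            "due_date": self.due_date.isoformat(),
        }

    @classmethod
    def from_dict(cls, data):
        # strptime is used so the sketch also runs on Python 3.6.
        due = datetime.datetime.strptime(data["due_date"], "%Y-%m-%d").date()
        return cls(data["book_title"], due)


record = RentalRecord(u"Dune", datetime.date(2017, 4, 28))
text = json.dumps(record.to_dict())
restored = RentalRecord.from_dict(json.loads(text))
print(restored.book_title, restored.due_date)
```

Because the file holds only JSON strings and numbers, the str/bytes distinction between Python 2 and Python 3 does not arise when reading it back.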

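A final note: having a backreference to the library in each object shouldn't be required.

Regarding "python - Porting pickle py2 to py3: strings become bytes", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43648081/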
