python - Why are "pickle" and "multiprocessing picklability" so different in Python?

Reposted · Author: 太空宇宙 · Updated: 2023-11-04 04:07:53

Using Python's multiprocessing on Windows requires many of the arguments passed to child processes to be "picklable".

import multiprocessing

class Foobar:

    def __getstate__(self):
        print("I'm being pickled!")

def worker(foobar):
    print(foobar)

if __name__ == "__main__":
    # Uncomment this on Linux
    # multiprocessing.set_start_method("spawn")

    foobar = Foobar()
    process = multiprocessing.Process(target=worker, args=(foobar, ))
    process.start()
    process.join()

The documentation mentions this explicitly several times:

Picklability

Ensure that the arguments to the methods of proxies are picklable.

[...]

Better to inherit than pickle/unpickle

When using the spawn or forkserver start methods many types from multiprocessing need to be picklable so that child processes can use them. However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process.

[...]

More picklability

Ensure that all arguments to Process.__init__() are picklable. Also, if you subclass Process then make sure that instances will be picklable when the Process.start method is called.

However, I noticed two major differences between "multiprocessing pickling" and the standard pickle module, and I have a hard time making sense of all of this.


A multiprocessing.Queue() is not "picklable", yet it can be passed to a child process

import pickle
from multiprocessing import Queue, Process

def worker(queue):
    pass

if __name__ == "__main__":
    queue = Queue()

    # RuntimeError: Queue objects should only be shared between processes through inheritance
    pickle.dumps(queue)

    # Works fine
    process = Process(target=worker, args=(queue, ))
    process.start()
    process.join()

An object defined in "__main__" pickles fine, yet it cannot be passed to a child process

import pickle
from multiprocessing import Process

def worker(foo):
    pass

if __name__ == "__main__":
    class Foo:
        pass

    foo = Foo()

    # Works fine
    pickle.dumps(foo)

    # AttributeError: Can't get attribute 'Foo' on <module '__mp_main__' from 'C:\\Users\\Delgan\\test.py'>
    process = Process(target=worker, args=(foo, ))
    process.start()
    process.join()

If multiprocessing does not use pickle internally, what are the inherent differences between these two ways of serializing objects?

Also, what does "inheritance" mean in a multiprocessing context? Why would I prefer it over pickling?

Best answer

When a multiprocessing.Queue is passed to a child process, what is actually sent is a file descriptor (or handle) obtained from a pipe, which must have been created by the parent before creating the child. The error from pickle exists to prevent attempts at sending a Queue over another Queue (or similar channel), since by the time it arrived it would be too late to use it. (Unix systems do in fact support sending pipes over certain kinds of sockets, but multiprocessing does not use that feature.) It is expected to be "obvious" that certain multiprocessing types can be sent to child processes, since they would be useless otherwise, so the apparent contradiction is not mentioned.
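A minimal sketch of that contrast: calling plain pickle.dumps() on a Queue raises the RuntimeError mentioned above, while handing the same Queue to Process(args=...) works, because at start() time multiprocessing transfers the underlying pipe handle itself.

```python
import pickle
import multiprocessing

def worker(q):
    # The child writes through the pipe handle it received at creation time.
    q.put("sent over the inherited pipe")

if __name__ == "__main__":
    q = multiprocessing.Queue()

    # Outside of Process.start(), pickling the Queue is refused.
    try:
        pickle.dumps(q)
    except RuntimeError as e:
        print(e)  # Queue objects should only be shared between processes through inheritance

    # Passed at process creation, the same Queue works fine.
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    print(q.get())
    p.join()
```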

Because the "spawn" start method cannot create the new process with any already-created Python objects in place, it must re-import the main script to obtain the relevant function/class definitions. For obvious reasons it does not set __name__ to "__main__" the way the original run does, so anything whose definition depends on that setting will be unavailable in the child. (Here, it is the unpickling that fails, which is why your manual pickling works.)
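A sketch of the fix this implies: moving Foo to module level means the re-imported script (which the child imports under the name "__mp_main__") exposes the class, so unpickling the instance can find it. The worker body here is an illustrative assumption.

```python
import pickle
from multiprocessing import Process

class Foo:
    # Defined at module level, so it is visible when the child
    # re-imports this script as "__mp_main__".
    pass

def worker(foo):
    print(type(foo).__name__)

if __name__ == "__main__":
    foo = Foo()
    p = Process(target=worker, args=(foo,))
    p.start()
    p.join()
```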

The fork start method launches the child with all of the parent's objects (as of the moment of the fork) still in existence; that is what inheritance means.
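A sketch of such inheritance under the fork start method (an assumption: this runs on a POSIX system, where "fork" is available): the child receives a copy of the parent's memory at fork time, so the module-level dict is visible in the worker without ever being pickled.

```python
import multiprocessing

# Created in the parent; under fork, the child inherits it directly.
shared = {"created": "in the parent"}

def worker():
    print(shared["created"])  # read via inheritance, never pickled

if __name__ == "__main__":
    ctx = multiprocessing.get_context("fork")  # POSIX only
    p = ctx.Process(target=worker)
    p.start()
    p.join()
```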

Regarding python - Why are "pickle" and "multiprocessing picklability" so different in Python?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56912846/
