
python - Python multiprocessing Pool interaction with the namespace at creation time

Reposted. Author: 行者123. Updated: 2023-11-28 18:32:12

We know that a multiprocessing.Pool has to be initialized after the functions it runs are defined. But I find the following code baffling:

import os
from multiprocessing import Pool

def func(i): print('first')

pool1 = Pool(2)
pool1.map(func, range(2)) #map-1

def func(i): print('second')
func2 = func

print('------')
pool1.map(func, range(2)) #map-2
pool1.map(func2, range(2)) #map-3

pool2 = Pool(2)
print('------')
pool2.map(func, range(2)) #map-4
pool2.map(func2, range(2)) #map-5

The output (Python 2.7 and Python 3.4 on Linux) is:
first         #map-1
first
------
first #map-2
first
first #map-3
first
------
second #map-4
second
second #map-5
second

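map-2 prints 'first' as expected.
But how does map-3 find the name func2? I mean, pool1 was initialized before func2 first appeared. So func2 = func does get executed, while def func(i): print('second') does not. Why?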
And if I define func2 directly as
def func2(i): print('second')

then map-3 cannot find the name func2, as mentioned in many posts, such as this one. What is the difference between the two cases?
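As I understand it, arguments are passed to the worker processes by pickling, but
how does the pool pass the called function to the other processes? Or how do the child processes find the called function?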

Best answer

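tl;dr: map-3 calls the first func when you expect the second one because Pool.map() serializes the function with pickle, and pickle records only func.__name__. That name is func, even though the object was assigned to the reference func2. The name is sent to the child process, which looks up func in its own namespace.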
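A minimal sketch of that mechanism using pickle directly, without a pool. It assumes a throwaway module registered under the hypothetical name demo_mod as a stand-in for the worker's copy of __main__, and it simulates the fork by putting the first func back into that namespace before unpickling.

```python
import pickle
import sys
import types

# Hypothetical stand-in for __main__; registering it in sys.modules lets
# pickle import it by name on both the pickling and unpickling side.
mod = types.ModuleType("demo_mod")
sys.modules["demo_mod"] = mod

# The namespace as it was when the pool forked: only the first func exists.
exec("def func(i):\n    return 'first'\n", mod.__dict__)
first_func = mod.func

# The parent then redefines func and adds the alias func2.
exec("def func(i):\n    return 'second'\n\nfunc2 = func\n", mod.__dict__)

# Pickling func2 records only the module name and the function's own name.
payload = pickle.dumps(mod.func2)

# Simulate the worker: its namespace is the fork-time snapshot, without func2.
mod.func = first_func
del mod.func2

restored = pickle.loads(payload)
print(restored(0))  # first
```

The alias func2 never appears in the payload. Unpickling is a pure name lookup, so it returns whatever func means in the namespace doing the lookup.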
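OK, I can count four different questions, listed below. I think you've already been told about namespaces and forked processes, so let's jump straight into the fun part of your questions ☺
① But how does map-3 find the name func2?
② So func2 = func does get executed, while def func(i): print('second') does not. Why?
③ Then map-3 cannot find the name func2, as mentioned in many posts, such as this one. What is the difference between the two cases?
④ As I understand it, arguments are passed to the worker processes by pickling, but how does the pool pass the called function to the other processes? Or how do the child processes find the called function?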
So I added more code to expose more of the internals:

import os
from multiprocessing import Pool

print(os.getpid(), 'parent')

def func(i):
    print(os.getpid(), 'first', end=" | ")
    if 'func' in globals():
        print(globals()['func'], end=" | ")
    else:
        print("no func in globals", end=" | ")
    if 'func2' in globals():
        print(globals()['func2'])
    else:
        print("no func2 in globals")

print('------ map-1')
pool1 = Pool(2)
pool1.map(func, range(2)) #map-1

def func(i):
    print(os.getpid(), 'second', end=" | ")
    if 'func' in globals():
        print(globals()['func'], end=" | ")
    else:
        print("no func in globals", end=" | ")
    if 'func2' in globals():
        print(globals()['func2'])
    else:
        print("no func2 in globals")
func2 = func

print('------ map-2')
pool1.map(func, range(2)) #map-2
print('------ map-3')
pool1.map(func2, range(2)) #map-3

pool2 = Pool(2)
print('------ map-4')
pool2.map(func, range(2)) #map-4
print('------ map-5')
pool2.map(func2, range(2)) #map-5

The output on my system:
21512 parent
------ map-1
21513 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
21514 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
------ map-2
21513 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
21514 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
------ map-3
21513 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
21514 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
------ map-4
21518 second | <function func at 0x7f62d531bed8> | <function func at 0x7f62d531bed8>
21519 second | <function func at 0x7f62d531bed8> | <function func at 0x7f62d531bed8>
------ map-5
21518 second | <function func at 0x7f62d531bed8> | <function func at 0x7f62d531bed8>
21519 second | <function func at 0x7f62d531bed8> | <function func at 0x7f62d531bed8>

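So we can see that func2 is never added to the namespace of pool1's workers. Something fishy is going on, and it's too late now for me to dig through the multiprocessing source code with a debugger to understand what's happening.
So if I had to guess an answer, the pickle module somehow detects that func2 resolves to func, which already exists at the label 0x7f62d531bed8. It therefore passes the already known "label" func, which on the child side is resolved to 0x7f62d67f7cf8. That is: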
func2 → 0x7f62d531bed8 → func → [PICKLE] → globals()['func'] → 0x7f62d67f7cf8
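One way to test this guess without a debugger is to disassemble the pickle that would be sent for func2. This is a sketch that again assumes a throwaway module under the hypothetical name demo_mod in place of __main__. The listing contains only a module name and the name func, with no address and no code.

```python
import io
import pickle
import pickletools
import sys
import types

# Hypothetical module standing in for __main__.
mod = types.ModuleType("demo_mod")
sys.modules["demo_mod"] = mod
exec("def func(i):\n    return 'second'\n\nfunc2 = func\n", mod.__dict__)

# Disassemble the payload into a string instead of printing to stdout.
buf = io.StringIO()
pickletools.dis(pickle.dumps(mod.func2), out=buf)
listing = buf.getvalue()
print(listing)
```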

To test my theory, I changed your code a bit by renaming the second func to func2, and this is what I got:
------ map-3
Process PoolWorker-1:
Process PoolWorker-2:
Traceback (most recent call last):
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
task = get()
task = get()
File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
return recv()
return recv()
AttributeError: 'module' object has no attribute 'func2'
AttributeError: 'module' object has no attribute 'func2'
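The same failure can be reproduced without a pool. In this sketch the worker's namespace is simulated by deleting func2 from a throwaway module (hypothetical name demo_mod) after pickling. Unpickling then fails because the lookup is by name. The exception type and message differ between Python versions, so the sketch catches both candidates.

```python
import pickle
import sys
import types

# Hypothetical stand-in for __main__, registered so pickle can import it.
mod = types.ModuleType("demo_mod")
sys.modules["demo_mod"] = mod

# Parent side: func2 is defined under its own name after the workers exist.
exec("def func2(i):\n    return 'second'\n", mod.__dict__)
payload = pickle.dumps(mod.func2)  # records "demo_mod" and "func2"

# Worker side: the fork-time snapshot never contained func2.
del mod.func2

error = None
try:
    pickle.loads(payload)
except (AttributeError, pickle.UnpicklingError) as exc:
    error = exc  # the name lookup failed in the "worker" namespace
print(repr(error))
```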

And then changing func2 = func to func = func2:
------ map-2
Process PoolWorker-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Process PoolWorker-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
task = get()
task = get()
File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
return recv()
return recv()
AttributeError: 'module' object has no attribute 'func2'
AttributeError: 'module' object has no attribute 'func2'

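So I think I'm starting to get to the point. This also shows where to read the code on the child-process side to understand what is happening.
So, more clues for answering ② and ③!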
To dig further, I added a print statement in pool.py at line 114:
    job, i, func, args, kwds = task
    print("XXX", os.getpid(), job, i, func, args, kwds)

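to show what is going on. We can see that in map-3, func2 arrives in the child as <function func at 0x7f2d0238fcf8>, the same address as the first func that the child inherited from the parent: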
23432 parent
------ map-1
('XXX', 23433, 0, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (0,)),), {})
23433 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
('XXX', 23434, 0, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (1,)),), {})
23434 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
------ map-2
('XXX', 23433, 1, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (0,)),), {})
23433 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
('XXX', 23434, 1, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (1,)),), {})
23434 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
------ map-3
('XXX', 23433, 2, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (0,)),), {})
23433 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
('XXX', 23434, 2, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (1,)),), {})
23434 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
------ map-4
('XXX', 23438, 3, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x1092e60>, (0,)),), {})
23438 second | <function func at 0x1092e60> | <function func at 0x1092e60>
('XXX', 23439, 3, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x1092e60>, (1,)),), {})
23439 second | <function func at 0x1092e60> | <function func at 0x1092e60>
------ map-5
('XXX', 23438, 4, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x1092e60>, (0,)),), {})
('XXX', 23439, 4, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x1092e60>, (1,)),), {})
23438 second | <function func at 0x1092e60> | <function func at 0x1092e60>
23439 second | <function func at 0x1092e60> | <function func at 0x1092e60>

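So to answer ④, we would need to dig further into the multiprocessing source, and maybe even the pickle source.
But I think my hunch about the name resolution may be right…
The only remaining question is then why it resolves the label to an address and then back to a label before pushing it to the child process!
Edit: I think I know why! The reason popped into my head just as I was going to sleep, so I came back to my keyboard: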
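When pickling a function, pickle takes the argument holding the function and gets the name from the function object itself.
So even though you did create a new function object, which does have a different address in memory: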
>>> print(func)
<function func at 0x7fc6174e3ed8>

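pickle doesn't care, because if the function is not already accessible to the child, it can never be accessed there. So pickle only resolves func.__name__: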
>>> print("func.__name__:", func.__name__)
func.__name__: func
>>> print("func2.__name__:", func2.__name__)
func2.__name__: func

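So even though you changed the function's body in the parent and created a new reference to it, what actually gets pickled is the function's internal name, which is set when the function is defined. Assigning the function to another variable does not change it. (A lambda always gets the name <lambda>, which is why lambdas cannot be pickled this way.)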
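A quick check that the name pickle uses comes from the definition, not from the variable holding the function. This is a sketch run at module level. __qualname__ is Python 3 only.

```python
def func(i):
    return "second"

func2 = func               # a second reference, not a renamed function
anon = lambda i: "third"   # every lambda is named '<lambda>'

print(func2 is func)                        # True
print(func2.__name__, func2.__qualname__)   # func func
print(anon.__name__)                        # <lambda>
```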
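That explains why you get the old func function when func2 is resolved to func in the map-3 phase.
So, in conclusion, for ①: map-3 does not find the name func2. It finds the name func in the function object that func2 refers to. That also answers ② and ③, because the func that is found runs the original func function. The mechanism is that func.__name__ is used to pickle and resolve the function between the two processes, which answers ④.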
Last update, from you:
In pickle._Pickler.save_global, it uses
if name is None: name = getattr(obj, '__qualname__', None)

and then again
if name is None: name = obj.__name__

So if obj has no __qualname__, __name__ is used.
However, it checks whether the object passed in is the same as the object that name resolves to in the module:
if obj2 is not obj: raise PicklingError(...) 
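That identity check runs on the pickling (parent) side and is easy to trigger. This sketch assumes the hypothetical module name demo_mod. It keeps a reference to the first func, redefines func, and then tries to pickle the old object. The old object's name still says func, but demo_mod.func is now a different object, so pickle refuses.

```python
import pickle
import sys
import types

# Hypothetical stand-in for __main__.
mod = types.ModuleType("demo_mod")
sys.modules["demo_mod"] = mod

exec("def func(i):\n    return 'first'\n", mod.__dict__)
old_func = mod.func
exec("def func(i):\n    return 'second'\n", mod.__dict__)  # rebinds demo_mod.func

error = None
try:
    pickle.dumps(old_func)  # "func" now resolves to the new object: obj2 is not obj
except pickle.PicklingError as exc:
    error = exc
print(error)
```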

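where obj2, parent = _getattribute(module, name)
Yes, but remember that what gets passed is only the function's (internal) name, not the function itself. The child process has no way to tell whether its obj2 is the same in memory as the parent's obj.
Edit by @SyrtisMajor:
OK, let's modify the first code above: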
import os
from multiprocessing import Pool

print(os.getpid(), 'parent')

def func(i):
    print(os.getpid(), 'first', end=" | ")
    if 'func' in globals():
        print(globals()['func'], end=" | ")
    else:
        print("no func in globals", end=" | ")
    if 'func2' in globals():
        print(globals()['func2'])
    else:
        print("no func2 in globals")

print('------ map-1')
pool1 = Pool(2)
pool1.map(func, range(2)) #map-1

def func2(i):
    print(os.getpid(), 'second', end=" | ")
    if 'func' in globals():
        print(globals()['func'], end=" | ")
    else:
        print("no func in globals", end=" | ")
    if 'func2' in globals():
        print(globals()['func2'])
    else:
        print("no func2 in globals")

func2.__qualname__ = func.__qualname__

func = func2

print('------ map-2')
pool1.map(func, range(2)) #map-2
print('------ map-3')
pool1.map(func2, range(2)) #map-3

pool2 = Pool(2)
print('------ map-4')
pool2.map(func, range(2)) #map-4
print('------ map-5')
pool2.map(func2, range(2)) #map-5

The output is as follows:
38130 parent
------ map-1
38131 first | <function func at 0x101856f28> | no func2 in globals
38132 first | <function func at 0x101856f28> | no func2 in globals
------ map-2
38131 first | <function func at 0x101856f28> | no func2 in globals
38132 first | <function func at 0x101856f28> | no func2 in globals
------ map-3
38131 first | <function func at 0x101856f28> | no func2 in globals
38132 first | <function func at 0x101856f28> | no func2 in globals
------ map-4
38133 second | <function func at 0x10339b510> | <function func at 0x10339b510>
38134 second | <function func at 0x10339b510> | <function func at 0x10339b510>
------ map-5
38133 second | <function func at 0x10339b510> | <function func at 0x10339b510>
38134 second | <function func at 0x10339b510> | <function func at 0x10339b510>

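It is exactly the same as our first output. Note that the func = func2 after the definition of func2() is the key. Pickle checks whether func2 (named func through its __qualname__) is the same object as the func it finds in the module. If it isn't, pickling fails.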
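The same trick can be shown at the pickle level. This sketch again assumes a throwaway module under the hypothetical name demo_mod and simulates a pool1 worker by restoring the first func before unpickling. Without the rebinding, pickling func2 fails the identity check. With it, the payload says func and the stale namespace resolves that name to the first function.

```python
import pickle
import sys
import types

# Hypothetical stand-in for __main__.
mod = types.ModuleType("demo_mod")
sys.modules["demo_mod"] = mod
exec("def func(i):\n    return 'first'\n", mod.__dict__)
first_func = mod.func

# As in the edited code: define func2 and give it func's qualified name.
exec(
    "def func2(i):\n    return 'second'\n\n"
    "func2.__qualname__ = func.__qualname__\n",
    mod.__dict__,
)

# Before rebinding, demo_mod.func is still the first function: check fails.
error_before = None
try:
    pickle.dumps(mod.func2)
except pickle.PicklingError as exc:
    error_before = exc

mod.func = mod.func2               # the key line from the edit: func = func2
payload = pickle.dumps(mod.func2)  # demo_mod.func is now func2 itself

mod.func = first_func              # simulate a worker forked before the change
restored = pickle.loads(payload)
print(restored(0))                 # first
```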

For a similar question about python - Python multiprocessing Pool interaction with the namespace at creation time, see Stack Overflow: https://stackoverflow.com/questions/35886272/
