python - How to use joblib parallelization with an in-class method that doesn't return anything

Reposted. Author: 行者123. Updated: 2023-12-04 08:27:10

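I'm currently trying to implement a parallel for loop using joblib in Python 3.8.3.
Inside the for loop, I want to call a method on each instance of one class, and the loop itself runs inside a method of another class.
Here is an MWE I wrote to check whether my idea works, but it doesn't. Does anyone know how to make it work?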

```python
from joblib import Parallel, delayed

class A():
    def __init__(self):
        self.val = 0
    def add5(self):
        self.val += 5

class B():
    def __init__(self):
        self.obj = [A() for _ in range(10)]
    def apply(self):
        """ this is where I'm trying to use joblib:
            for a in self.obj:
                a.add5()"""

        def f(x):
            x.add5()
        Parallel(n_jobs=-1)(delayed(f)(x) for x in self.obj)
    def prnt(self):
        print([a.val for a in self.obj])

b = B()
b.prnt() # returns [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
b.apply()
b.prnt() # returns [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] but
         # I expect [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
```
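To see why the values stay at 0: joblib's default backend (loky) runs each call in a separate worker process, so every argument is serialized, sent to the worker and rebuilt there, and `f` ends up mutating a copy. The sketch below imitates that boundary with a plain `pickle` round trip inside a single process; `run_in_fake_worker` is a hypothetical helper, and joblib's real transport (cloudpickle over loky worker processes) is more involved.

```python
import pickle


class A:
    def __init__(self):
        self.val = 0

    def add5(self):
        self.val += 5


def run_in_fake_worker(obj):
    """Hypothetical stand-in for a process boundary: the 'worker' only ever
    sees a pickled-and-rebuilt copy of the caller's object."""
    worker_copy = pickle.loads(pickle.dumps(obj))
    worker_copy.add5()  # mutates the copy, not the caller's object
    return worker_copy


original = A()
returned = run_in_fake_worker(original)
print(original.val)  # 0: the caller's object is untouched
print(returned.val)  # 5: only the copy was updated
```

This is the same situation as the MWE: `f` returns nothing, so the updated copies are simply discarded when the workers finish.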


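More context for my problem: I'm implementing a boosting algorithm with scikit-learn, and before applying the algorithm I generate the weak learners. Fitting and predicting are done in a for loop and can take a while, so I want to add parallelization to try to speed things up. Basically, `class A` is a classifier and `class B` is my algorithm, in which I want to fit all the classifiers I generated.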

Best answer

As you can read in the joblib documentation (source):

> The default backend of joblib will run each function call in isolated Python processes, therefore they cannot mutate a common Python object defined in the main program.

> However if the parallel function really needs to rely on the shared memory semantics of threads, it should be made explicit with `require='sharedmem'`, for instance:


So you have two options. 1) Add `require='sharedmem'` to your `Parallel` call:

```python
Parallel(n_jobs=-1, require='sharedmem')(delayed(f)(x) for x in self.obj)
```
However, the source also points out:

> Keep in mind that relying on the shared-memory semantics is probably suboptimal from a performance point of view as concurrent access to a shared Python object will suffer from lock contention.
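Option 1 works because `require='sharedmem'` makes joblib select a thread-based backend, and threads see the very same objects as the caller. As a rough, stdlib-only illustration (the standard library's `ThreadPoolExecutor` stands in for joblib's threading backend here; it is not what joblib runs internally), in-place updates made in threads remain visible afterwards:

```python
from concurrent.futures import ThreadPoolExecutor


class A:
    def __init__(self):
        self.val = 0

    def add5(self):
        self.val += 5


objs = [A() for _ in range(10)]

# Threads share the parent's memory, so each call mutates the original object.
with ThreadPoolExecutor() as pool:
    # list() drains the iterator so any exception raised in a task surfaces here
    list(pool.map(A.add5, objs))

print([a.val for a in objs])  # [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
```

Each task here touches a different object, so contention is limited to the GIL; the lock-contention warning matters most when many tasks update the same shared object, and pure-Python CPU-bound work generally gains little from threads under CPython's GIL.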


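With option 2), you have to change two things in your code.
First, change the `f` function from:

```python
def f(x):
    x.add5()
```

so that it returns the object:

```python
def f(x):
    x.add5()
    return x
```

and in the `Parallel` loop, change:

```python
Parallel(n_jobs=-1)(delayed(f)(x) for x in self.obj)
```

into:

```python
self.obj = Parallel(n_jobs=-1)(delayed(f)(x) for x in self.obj)
```

This way `self.obj` is set to the list returned by the parallel loop. Each worker modifies its own copy of an `A` object and sends that copy back, so assigning the returned list replaces the unmodified originals with the updated copies.

Final code: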
```python
from joblib import Parallel, delayed


class A:
    def __init__(self):
        self.val = 0

    def add5(self):
        self.val += 5


class B:
    def __init__(self):
        self.obj = [A() for _ in range(10)]

    def apply(self):
        """ this is where I'm trying to use joblib:
            for a in self.obj:
                a.add5()"""

        def f(x):
            # Runs in a worker on a copy of x; return the copy so the change survives.
            x.add5()
            return x

        # Replace the originals with the updated copies returned by the workers.
        self.obj = Parallel(n_jobs=-1)(delayed(f)(x) for x in self.obj)

    def prnt(self):
        print([a.val for a in self.obj])


b = B()
b.prnt()  # prints [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
b.apply()
b.prnt()  # prints [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
```
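One consequence of option 2 worth knowing: `self.obj` now holds the copies that came back from the workers, not the original instances, so any other reference to an old `A` object still sees `val == 0`. If object identity matters (for example, other code keeps handles to the weak learners), one alternative is to return only the changed state and write it back onto the parent's objects. The sketch below is stdlib-only: the hypothetical `fake_parallel` and `add5_and_report` imitate a process backend with pickle round trips rather than calling joblib. With joblib the equivalent call would be `new_vals = Parallel(n_jobs=-1)(delayed(add5_and_report)(a) for a in self.obj)`, which works because `Parallel` returns results in input order.

```python
import pickle


class A:
    def __init__(self):
        self.val = 0

    def add5(self):
        self.val += 5


def fake_parallel(func, items):
    """Hypothetical stand-in for Parallel(...)(delayed(func)(x) for x in items).

    As with a process-based backend, every argument and every result crosses
    a pickle round trip, and results come back in input order.
    """
    results = []
    for x in items:
        worker_copy = pickle.loads(pickle.dumps(x))
        results.append(pickle.loads(pickle.dumps(func(worker_copy))))
    return results


def add5_and_report(a):
    # Mutate the worker's copy, then send back only the new state.
    a.add5()
    return a.val


objs = [A() for _ in range(10)]
first = objs[0]  # an outside reference to one of the original instances

new_vals = fake_parallel(add5_and_report, objs)
for a, v in zip(objs, new_vals):
    a.val = v  # write the returned state onto the parent's own objects

print([a.val for a in objs])  # [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
print(first.val)  # 5: the outside reference sees the update too
```

Returning small state also means less data to serialize on the way back, which can matter when the objects are large.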

For more on "python - How to use joblib parallelization with an in-class method that doesn't return anything", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65206662/
