
python - My Python multiprocessing code is slower than serial code

Reposted · Author: 太空宇宙 · Updated: 2023-11-03 20:38:51

I am trying to implement the image-processing technique "local thickness" in Python with OpenCV. It has already been implemented in the image-analysis software ImageJ. Essentially, for a binary image, the algorithm will:

  1. Skeletonize any white object (to create a skeleton, or ridge)
  2. For each skeleton/ridge point, find the distance to the nearest edge
  3. For every point within that distance, assign that distance as its thickness value, or update the thickness if the distance is larger than the existing value

The part I want to implement with multiprocessing is step 3. The original code is here. In Python, I split all the skeleton/ridge points into chunks and pass each chunk to a process. All processes communicate through a shared array that stores the thickness values. However, my multiprocessing code is slower than the serial code, even for any single process that handles only part of the data.
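For reference, the range-based chunk splitting described above can be sketched with made-up sizes (10 points, 3 workers; the real code uses `nR` nonzero ridge points and 10 processes):

```python
# Illustrative chunking of nR ridge points across nproc workers,
# mirroring the range arithmetic used in the question's code.
nR, nproc = 10, 3
chunks = [range(i * nR // nproc, min((i + 1) * nR // nproc, nR))
          for i in range(nproc)]
print([list(c) for c in chunks])  # [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```

Integer division makes the chunks roughly equal; the last worker absorbs the remainder.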

import numpy as np
import cv2 as cv
import matplotlib.pylab as plt
from skimage.morphology import medial_axis
from scipy.sparse import coo_matrix
import multiprocessing as mp
import time

def worker(sRidge_shared, iRidge, jRidge, rRidge, w, h, iR_worker, worker):
    print('Job starting for worker', worker)
    start = time.time()
    for iR in iR_worker:
        i = iRidge[iR]
        j = jRidge[iR]
        r = rRidge[iR]
        rSquared = int(r * r + 0.5)
        rInt = int(r)
        if (rInt < r): rInt += 1
        iStart = i - rInt
        if (iStart < 0): iStart = 0
        iStop = i + rInt
        if (iStop >= w): iStop = w - 1
        jStart = j - rInt
        if (jStart < 0): jStart = 0
        jStop = j + rInt
        if (jStop >= h): jStop = h - 1
        for j1 in range(jStart, jStop):
            r1SquaredJ = (j1 - j) * (j1 - j)
            if (r1SquaredJ <= rSquared):
                for i1 in range(iStart, iStop):
                    r1Squared = r1SquaredJ + (i1 - i) * (i1 - i)
                    if (r1Squared <= rSquared):
                        if (rSquared > sRidge_shared[i1 + j1 * w]):
                            sRidge_shared[i1 + j1 * w] = rSquared
    print('Worker', worker, ' finished job in ', time.time() - start, 's')


def Ridge_to_localthickness_parallel(ridgeimg):
    w, h = ridgeimg.shape
    M = coo_matrix(ridgeimg)
    nR = M.count_nonzero()
    iRidge = M.row
    jRidge = M.col
    rRidge = M.data
    sRidge = np.zeros((w * h,))
    sRidge_shared = mp.Array('d', sRidge)

    nproc = 10

    p = [mp.Process(target=worker,
                    args=(sRidge_shared, iRidge, jRidge, rRidge, w, h,
                          range(i * nR // nproc, min((i + 1) * nR // nproc, nR)), i))
         for i in range(nproc)]
    for pc in p:
        pc.start()
    for pc in p:
        pc.join()

    a = np.frombuffer(sRidge_shared.get_obj())
    b = a.reshape((h, w))

    return 2 * np.sqrt(b)

if __name__ == '__main__':
    mp.freeze_support()
    size = 1024

    img = np.zeros((size, size), np.uint8)
    cv.ellipse(img, (size // 2, size // 2), (size // 3, size // 5), 0, 0, 360, 255, -1)

    skel, distance = medial_axis(img, return_distance=True)
    dist_on_skel = distance * skel

    start = time.time()
    LT1 = Ridge_to_localthickness_parallel(dist_on_skel)
    print('Multiprocessing elapsed time: ', time.time() - start, 's')

Here are the results:

Serial elapsed time:  71.07010626792908 s
Job starting for worker 0
Job starting for worker 1
Job starting for worker 2
Job starting for worker 3
Job starting for worker 4
Job starting for worker 5
Job starting for worker 7
Job starting for worker 6
Job starting for worker 8
Job starting for worker 9
Worker 0 finished job in 167.6777663230896 s
Worker 9 finished job in 181.82518076896667 s
Worker 1 finished job in 211.21311926841736 s
Worker 8 finished job in 211.43014097213745 s
Worker 7 finished job in 235.29852747917175 s
Worker 2 finished job in 241.1481122970581 s
Worker 6 finished job in 242.3452320098877 s
Worker 3 finished job in 247.0727047920227 s
Worker 5 finished job in 245.52154970169067 s
Worker 4 finished job in 246.9776954650879 s
Multiprocessing elapsed time: 256.9716944694519 s
>>>

I am running this on a Windows machine. I have not tried multithreading, because I don't know how to access a shared array from multiple threads.
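For what it's worth, threads in one process share the interpreter's memory, so a plain NumPy array can be read and written by every thread without any `mp.Array` wrapper. A minimal sketch (the toy `k * k` update stands in for the real thickness update; note the GIL caveat in the comments):

```python
import threading
import numpy as np

# Threads share the process's memory, so a plain NumPy array needs no
# special wrapper (unlike multiprocessing, which needs mp.Array).
sRidge = np.zeros(16)

def thread_worker(indices):
    for k in indices:
        v = k * k
        # This in-place update is immediately visible to all threads.
        # Note that the GIL prevents pure-Python loops like this from
        # actually running in parallel, so this won't be faster either.
        if v > sRidge[k]:
            sRidge[k] = v

threads = [threading.Thread(target=thread_worker, args=(range(i, 16, 2),))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sRidge[3], sRidge[4])  # 9.0 16.0
```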

Edit:

I used sharedmem and Thread/ThreadPoolExecutor. The results are better than with multiprocessing, but still worse than serial.

Serial elapsed time:  67.51724791526794 s
Job starting for worker 0
Job starting for worker 1
Job starting for worker 2
Job starting for worker 3
Job starting for worker 4
Job starting for worker 6
Job starting for worker 5
Job starting for worker 7
Job starting for worker 8
Job starting for worker 9
Job starting for worker 10
Job starting for worker 11
Job starting for worker 12
Job starting for worker 13
Job starting for worker 14
Job starting for worker 15
Job starting for worker 16
Job starting for worker 17
Job starting for worker 18
Job starting for worker 19
Worker 2 finished job in 60.84959959983826 s
Worker 3 finished job in 63.856611013412476 s
Worker 4 finished job in 67.02961277961731 s
Worker 16 finished job in 68.00975942611694 s
Worker 15 finished job in 70.39874267578125 s
Worker 1 finished job in 75.65659618377686 s
Worker 14 finished job in 76.97173047065735 s
Worker 9 finished job in 78.4876492023468 s
Worker 0 finished job in 87.56459546089172 s
Worker 7 finished job in 89.86062669754028 s
Worker 17 finished job in 91.72178316116333 s
Worker 8 finished job in 94.22166323661804 s
Worker 19 finished job in 93.27084946632385 s
Worker 13 finished job in 95.02370047569275 s
Worker 5 finished job in 98.98063397407532 s
Worker 18 finished job in 97.57283663749695 s
Worker 10 finished job in 103.78466653823853 s
Worker 11 finished job in 105.19767212867737 s
Worker 6 finished job in 105.96561932563782 s
Worker 12 finished job in 105.5306978225708 s
Threading elapsed time: 106.97455644607544 s
>>>

Best Answer

Sharing an array between multiple processes carries a huge cost.

Basically, this is how to estimate the multiprocessing time:

  • time to share all the data with the workers
  • computation time (which should be less than the serial computation, since each worker computes less)
  • time to aggregate the results

Here, I strongly suspect the first step carries the huge cost (it is a large array).

In general, you can easily multiprocess/multithread code that separates cleanly (i.e. does not need the full array in every worker).

Regarding "python - My Python multiprocessing code is slower than serial code", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56982499/
