python - numba CUDA in Python is very slow

Reposted. Author: 行者123. Updated: 2023-11-30 23:01:00

I am running this simple code with numba CUDA and finding it very slow. Any idea what the bottleneck is?

import numpy as np
from numba import cuda

@cuda.jit('int32(float64, float64, int32)', device=True)
def mandelbrot_numbagpu(creal, cimag, maxiter):
    real = creal
    imag = cimag
    for n in range(maxiter):
        real2 = real*real
        imag2 = imag*imag
        if real2 + imag2 > 4.0:
            return n
        imag = 2*real*imag + cimag
        real = real2 - imag2 + creal
    return 0


@cuda.jit
def mandelbrot_set_numbagpu(xmin,xmax,ymin,ymax,width,height,maxiter,n3,r1,r2):
    for i in range(width):
        for j in range(height):
            n3[i,j] = mandelbrot_numbagpu(r1[i],r2[j],maxiter)


# np.float has been removed from recent NumPy releases; np.float64 is the same type
r1 = np.linspace(-2.0,0.5,1000, dtype=np.float64)
r2 = np.linspace(-1.25,1.25,1000, dtype=np.float64)
n3 = np.zeros((1000,1000), dtype=np.uint8)

# called without an explicit launch configuration, as in the original question
%timeit mandelbrot_set_numbagpu(-2.0,0.5,-1.25,1.25,1000,1000,80,n3,r1,r2)
#1 loops, best of 3: 4.84 s per loop

If I run the same code with @jit instead, it is 10 times faster!

Best answer

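In general, with Numba/CUDA (and, I think, with CUDA generally), your functions should not iterate over arrays. Instead, each function call should handle a single array element, and the Numba/CUDA machinery assigns a whole bunch of array elements to a whole bunch of GPU cores, so that everything happens quickly and in parallel. In your kernel, every GPU thread loops over the entire 1000x1000 array by itself, so nothing is parallelized. This is all documented.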

Unfortunately, this means you cannot simply change @jit to @cuda.jit; you have to adapt the code.

The following works:

# mandelbrot_numbagpu as before...

# I've removed some of the useless arguments for simplicity
@cuda.jit
def mandelbrot_set_numbagpu(n3,r1,r2,maxiter):
    # numba provides this function for working out which element you're
    # supposed to be accessing
    i,j = cuda.grid(2)
    if i<n3.shape[0] and j<n3.shape[1]: # check we're in range
        # do work on a single element
        n3[i,j] = mandelbrot_numbagpu(r1[i],r2[j],maxiter)
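
To make the one-thread-per-element idea concrete, here is a minimal pure-Python sketch that emulates the index each thread would get from cuda.grid(2), namely block index times block size plus thread index in each dimension. It needs no GPU. The helper name emulate_grid and the small grid sizes are assumptions chosen for illustration; they are not part of Numba or of the original answer.

```python
# Pure-Python emulation of the per-thread index computed by cuda.grid(2).
# emulate_grid is a hypothetical helper for illustration only.

def emulate_grid(blockspergrid, threadsperblock, shape):
    """Count how often each in-range element (i, j) is visited,
    and how many threads fall outside the array."""
    visits = {}
    skipped = 0
    for bx in range(blockspergrid[0]):
        for by in range(blockspergrid[1]):
            for tx in range(threadsperblock[0]):
                for ty in range(threadsperblock[1]):
                    # the (i, j) pair this thread would see from cuda.grid(2)
                    i = bx * threadsperblock[0] + tx
                    j = by * threadsperblock[1] + ty
                    if i < shape[0] and j < shape[1]:  # same bounds check as the kernel
                        visits[(i, j)] = visits.get((i, j), 0) + 1
                    else:
                        skipped += 1
    return visits, skipped

# 2x2 blocks of 4x4 threads (64 threads) covering a 6x5 array (30 elements)
visits, skipped = emulate_grid((2, 2), (4, 4), (6, 5))
print(len(visits), set(visits.values()), skipped)
# 30 {1} 34
```

Every element is handled by exactly one thread, and the threads that land outside the array do nothing, which is why the kernel needs the range check.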

You then call it like this:

# you assign a number of threads, and split it into blocks
# this is all in the documentation!
import math
threadsperblock = (16,16)
blockspergrid_x = math.ceil(n3.shape[0] / threadsperblock[0])
blockspergrid_y = math.ceil(n3.shape[1] / threadsperblock[1])
blockspergrid = (blockspergrid_x, blockspergrid_y)

mandelbrot_set_numbagpu[blockspergrid,threadsperblock](n3,r1,r2,80)
# n3, r1 and r2 are defined as before
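
As a worked check of the launch configuration above, the following sketch computes how many blocks and threads the 1000x1000 array ends up with. The helper name launch_config is hypothetical and only wraps the same math.ceil arithmetic used in the answer.

```python
import math

def launch_config(shape, threadsperblock=(16, 16)):
    """Blocks per grid needed to cover `shape`, rounding up in each dimension."""
    return tuple(math.ceil(n / t) for n, t in zip(shape, threadsperblock))

shape = (1000, 1000)
threadsperblock = (16, 16)
blockspergrid = launch_config(shape, threadsperblock)

# total threads launched, and how many of them fall outside the array
total_threads = (blockspergrid[0] * threadsperblock[0]) * (blockspergrid[1] * threadsperblock[1])
idle_threads = total_threads - shape[0] * shape[1]
print(blockspergrid, total_threads, idle_threads)
# (63, 63) 1016064 16064
```

Rounding up means slightly more threads are launched than there are elements; the surplus threads are the ones filtered out by the range check inside the kernel.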

On my PC this is about 3800 times faster. I don't know how it compares with an equivalent CPU program.
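
For checking the kernel's output against the CPU, a plain-Python reference can help. The sketch below repeats the escape-time loop of mandelbrot_numbagpu without Numba; the names mandelbrot_cpu, mandelbrot_set_cpu and linspace, and the 50x50 grid size, are assumptions made for illustration. It is meant as a correctness reference on a small grid, not as a benchmark.

```python
# Plain-Python reference of the same escape-time computation (no Numba, no GPU).
# All function names here are hypothetical helpers for illustration.

def mandelbrot_cpu(creal, cimag, maxiter):
    real, imag = creal, cimag
    for n in range(maxiter):
        real2 = real * real
        imag2 = imag * imag
        if real2 + imag2 > 4.0:
            return n
        imag = 2 * real * imag + cimag
        real = real2 - imag2 + creal
    # as in the original code, points that never escape also return 0
    return 0

def linspace(start, stop, num):
    """Evenly spaced values from start to stop inclusive."""
    step = (stop - start) / (num - 1)
    return [start + k * step for k in range(num)]

def mandelbrot_set_cpu(r1, r2, maxiter):
    # one row per value of r1, one column per value of r2, like n3[i, j]
    return [[mandelbrot_cpu(cr, ci, maxiter) for ci in r2] for cr in r1]

r1_small = linspace(-2.0, 0.5, 50)
r2_small = linspace(-1.25, 1.25, 50)
ref = mandelbrot_set_cpu(r1_small, r2_small, 80)
print(mandelbrot_cpu(1.0, 0.0, 80))
# 2
```

The result could then be compared element by element with the n3 array produced by the kernel for the same r1 and r2; small differences at boundary points are possible because of floating-point rounding.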

Regarding "python - numba CUDA in Python is very slow", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35051998/
