
python - CUDA GPU processing: TypeError: compile_kernel() got an unexpected keyword argument 'boundscheck'

Reposted. Author: 行者123. Updated: 2023-12-03 16:45:06

Today I started working with CUDA and GPU processing. I found this tutorial:
https://www.geeksforgeeks.org/running-python-script-on-gpu/

Unfortunately, my first attempt at running the GPU code failed:

from numba import jit, cuda
import numpy as np
# to measure exec time
from timeit import default_timer as timer

# normal function to run on cpu
def func(a):
    for i in range(10000000):
        a[i] += 1

# function optimized to run on gpu
@jit(target="cuda")
def func2(a):
    for i in range(10000000):
        a[i] += 1
if __name__ == "__main__":
    n = 10000000
    a = np.ones(n, dtype=np.float64)
    b = np.ones(n, dtype=np.float32)

    start = timer()
    func(a)
    print("without GPU:", timer() - start)

    start = timer()
    func2(a)
    print("with GPU:", timer() - start)

Output:
/home/amu/anaconda3/bin/python /home/amu/PycharmProjects/gpu_processing_base/gpu_base_1.py
without GPU: 4.89985659904778
Traceback (most recent call last):
  File "/home/amu/PycharmProjects/gpu_processing_base/gpu_base_1.py", line 30, in <module>
    func2(a)
  File "/home/amu/anaconda3/lib/python3.7/site-packages/numba/cuda/dispatcher.py", line 40, in __call__
    return self.compiled(*args, **kws)
  File "/home/amu/anaconda3/lib/python3.7/site-packages/numba/cuda/compiler.py", line 758, in __call__
    kernel = self.specialize(*args)
  File "/home/amu/anaconda3/lib/python3.7/site-packages/numba/cuda/compiler.py", line 769, in specialize
    kernel = self.compile(argtypes)
  File "/home/amu/anaconda3/lib/python3.7/site-packages/numba/cuda/compiler.py", line 785, in compile
    **self.targetoptions)
  File "/home/amu/anaconda3/lib/python3.7/site-packages/numba/core/compiler_lock.py", line 32, in _acquire_compile_lock
    return func(*args, **kwargs)
TypeError: compile_kernel() got an unexpected keyword argument 'boundscheck'

Process finished with exit code 1

I have installed numba and cudatoolkit, as mentioned in the tutorial, in an Anaconda environment in PyCharm.

Best answer

Adding an answer to get this off the unanswered queue.

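The code in that example is broken. There is nothing wrong with your numba or CUDA installation. The code in your question (or in the blog you copied it from) cannot produce the result the blog post claims.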
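
The traceback in the question shows how the failure surfaces: a set of target options is unpacked with `**self.targetoptions` into `compile_kernel()`, which does not accept a `boundscheck` keyword. The plain-Python sketch below reproduces only that mechanism. The `compile_kernel` here is a hypothetical stand-in with an invented signature, and the contents of `target_options` are assumed for illustration; neither is taken from Numba's source.

```python
def compile_kernel(pyfunc, argtypes, debug=False):
    """Hypothetical stand-in: accepts only the keyword options it declares."""
    return f"compiled {pyfunc.__name__} for {argtypes} (debug={debug})"


def add_one(x):
    return x + 1


# Options collected elsewhere and forwarded with ** unpacking, in the same
# way as `**self.targetoptions` in the traceback. The contents are assumed.
target_options = {"debug": False, "boundscheck": None}

message = None
try:
    compile_kernel(add_one, ("float64",), **target_options)
except TypeError as exc:
    # Python rejects the call because 'boundscheck' is not a declared parameter.
    message = str(exc)
    print(message)
```

The point of the sketch is that the error is raised by ordinary Python argument checking inside the library, which is consistent with the answer's statement that the installation itself is fine.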

There are many ways it could be modified to work. One would be like this:

from numba import vectorize, jit, cuda
import numpy as np
# to measure exec time
from timeit import default_timer as timer

# normal function to run on cpu
def func(a):
    for i in range(10000000):
        a[i] += 1

# function optimized to run on gpu
@vectorize(['float64(float64)'], target="cuda")
def func2(x):
    return x + 1

if __name__ == "__main__":
    n = 10000000
    a = np.ones(n, dtype=np.float64)

    start = timer()
    func(a)
    print("without GPU:", timer() - start)

    start = timer()
    func2(a)
    print("with GPU:", timer() - start)

Here, func2 becomes a ufunc that is compiled for the device. It is then run over the whole input array on the GPU. Running it gives this:
$ python bogoexample.py 
without GPU: 4.314514834433794
with GPU: 0.21419800259172916

So it is faster, but keep in mind that the GPU time includes the time taken to compile the GPU ufunc.
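
One way to keep compilation time out of the comparison is to time several calls and treat the first one separately. The following is a minimal sketch using only the standard library. The helpers `time_calls` and `split_warmup` are hypothetical names, and `fake_jit_add_one` is a made-up stand-in that imitates a one-off compilation cost with `time.sleep`, so that the sketch runs without a GPU.

```python
import time
from timeit import default_timer as timer


def time_calls(fn, *args, repeats=5):
    """Call fn(*args) `repeats` times and return the elapsed time of each call."""
    timings = []
    for _ in range(repeats):
        start = timer()
        fn(*args)
        timings.append(timer() - start)
    return timings


def split_warmup(timings):
    """Separate the first (compile-inclusive) call from the later calls."""
    first, rest = timings[0], timings[1:]
    steady = sum(rest) / len(rest) if rest else None
    return first, steady


# Hypothetical stand-in for a JIT-compiled function: the first call pays a
# one-off "compilation" cost, later calls reuse the stored result.
_compiled = {}


def fake_jit_add_one(values):
    if "kernel" not in _compiled:
        time.sleep(0.05)  # pretend to compile
        _compiled["kernel"] = lambda xs: [x + 1 for x in xs]
    return _compiled["kernel"](values)


timings = time_calls(fake_jit_add_one, [1.0, 2.0, 3.0])
first, steady = split_warmup(timings)
print(f"first call: {first:.4f}s, mean of later calls: {steady:.6f}s")
```

With a real Numba function, the same helpers could be called as `time_calls(func2, a)`; the first timing would then include compilation and the remaining ones would not.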

Another alternative is to actually write a GPU kernel. Like this:
from numba import vectorize, jit, cuda
import numpy as np
# to measure exec time
from timeit import default_timer as timer

# normal function to run on cpu
def func(a):
    for i in range(10000000):
        a[i] += 1

# function optimized to run on gpu
@vectorize(['float64(float64)'], target="cuda")
def func2(x):
    return x + 1

# kernel to run on gpu
@cuda.jit
def func3(a, N):
    tid = cuda.grid(1)
    if tid < N:
        a[tid] += 1


if __name__ == "__main__":
    n = 10000000
    a = np.ones(n, dtype=np.float64)

    for i in range(0, 5):
        start = timer()
        func(a)
        print(i, " without GPU:", timer() - start)

    for i in range(0, 5):
        start = timer()
        func2(a)
        print(i, " with GPU ufunc:", timer() - start)

    threadsperblock = 1024
    blockspergrid = (a.size + (threadsperblock - 1)) // threadsperblock
    for i in range(0, 5):
        start = timer()
        func3[blockspergrid, threadsperblock](a, n)
        print(i, " with GPU kernel:", timer() - start)

Which runs like this:
$ python bogoexample.py 
0 without GPU: 4.885275377891958
1 without GPU: 4.748716968111694
2 without GPU: 4.902181145735085
3 without GPU: 4.889955999329686
4 without GPU: 4.881594380363822
0 with GPU ufunc: 0.16726416163146496
1 with GPU ufunc: 0.03758022002875805
2 with GPU ufunc: 0.03580896370112896
3 with GPU ufunc: 0.03530424740165472
4 with GPU ufunc: 0.03579768259078264
0 with GPU kernel: 0.1421878095716238
1 with GPU kernel: 0.04386183246970177
2 with GPU kernel: 0.029975440353155136
3 with GPU kernel: 0.029602501541376114
4 with GPU kernel: 0.029780613258481026

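Here you can see that the kernel runs slightly faster than the ufunc, and that caching (this is caching of the JIT-compiled functions, not memoization of the calls) significantly speeds up the calls on the GPU.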
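
The launch configuration in the kernel version rounds the block count up so that every element gets a thread, and the `if tid < N` guard keeps the surplus threads in the last block from writing past the end of the array. The sketch below models that arithmetic on the CPU in plain Python. `blocks_per_grid` and `simulate_launch` are hypothetical helper names, and the nested loops only imitate the indexing of a 1D launch; they say nothing about how a GPU actually schedules threads.

```python
def blocks_per_grid(n_elements, threads_per_block):
    """Smallest number of blocks whose threads cover n_elements (ceiling division)."""
    return (n_elements + threads_per_block - 1) // threads_per_block


def simulate_launch(data, threads_per_block):
    """CPU model of a 1D launch in which each 'thread' handles at most one element."""
    n = len(data)
    n_blocks = blocks_per_grid(n, threads_per_block)
    idle_threads = 0
    for block_idx in range(n_blocks):
        for thread_idx in range(threads_per_block):
            # Global 1D index, the value cuda.grid(1) yields inside a kernel.
            tid = block_idx * threads_per_block + thread_idx
            if tid < n:            # same guard as in func3
                data[tid] += 1
            else:
                idle_threads += 1  # thread beyond the array does nothing
    return n_blocks, idle_threads


data = [1.0] * 10
n_blocks, idle = simulate_launch(data, threads_per_block=4)
print(n_blocks, idle, data)
```

For the sizes used in the answer, 10000000 elements with 1024 threads per block, the same formula gives 9766 blocks, which leaves 384 threads in the last block with nothing to do.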

Regarding "python - CUDA GPU processing: TypeError: compile_kernel() got an unexpected keyword argument 'boundscheck'", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61982672/
