python - Converting a function to NumbaPro CUDA

Reposted. Author: 太空宇宙. Updated: 2023-11-04 10:43:00

I am comparing several Python modules/extensions and approaches for implementing the following:

import numpy as np

def fdtd(input_grid, steps):
    grid = input_grid.copy()
    old_grid = np.zeros_like(input_grid)
    previous_grid = np.zeros_like(input_grid)

    l_x = grid.shape[0]
    l_y = grid.shape[1]

    for i in range(steps):
        np.copyto(previous_grid, old_grid)
        np.copyto(old_grid, grid)

        for x in range(l_x):
            for y in range(l_y):
                grid[x,y] = 0.0
                if 0 < x+1 < l_x:
                    grid[x,y] += old_grid[x+1,y]
                if 0 < x-1 < l_x:
                    grid[x,y] += old_grid[x-1,y]
                if 0 < y+1 < l_y:
                    grid[x,y] += old_grid[x,y+1]
                if 0 < y-1 < l_y:
                    grid[x,y] += old_grid[x,y-1]

                grid[x,y] /= 2.0
                grid[x,y] -= previous_grid[x,y]

    return grid

This function is a very basic implementation of the finite-difference time-domain (FDTD) method. I have already implemented it in several ways:

  • using more NumPy routines
  • in Cython
  • using Numba's (auto)jit.
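The asker's own "more NumPy routines" version is not shown. As a rough illustration only, the inner x/y loops can be replaced by shifted array slices. The function name `fdtd_numpy` is hypothetical, and the sketch assumes a floating-point 2D input. It deliberately mirrors the boundary tests of the loop version, including `0 < x-1`, which skips the neighbour at index 0.

```python
import numpy as np

def fdtd_numpy(input_grid, steps):
    """Slice-based sketch of the loop version of fdtd() shown above."""
    grid = input_grid.copy()
    old_grid = np.zeros_like(input_grid)
    previous_grid = np.zeros_like(input_grid)

    for _ in range(steps):
        # Rotate the time levels: previous <- old <- current.
        previous_grid, old_grid = old_grid, grid

        # Accumulate the four neighbours of every cell in a fresh array.
        grid = np.zeros_like(old_grid)
        grid[:-1, :] += old_grid[1:, :]    # neighbour at x+1, for x < l_x-1
        grid[2:, :] += old_grid[1:-1, :]   # neighbour at x-1, only where x-1 > 0
        grid[:, :-1] += old_grid[:, 1:]    # neighbour at y+1, for y < l_y-1
        grid[:, 2:] += old_grid[:, 1:-1]   # neighbour at y-1, only where y-1 > 0

        grid /= 2.0
        grid -= previous_grid

    return grid

# Usage: a single impulse in the middle of a small grid.
demo = np.zeros((5, 5))
demo[2, 2] = 1.0
print(fdtd_numpy(demo, 1))
```

Swapping array references instead of calling `np.copyto` avoids two full copies per step; that is a design choice of this sketch, not something taken from the question.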

Now I would like to compare the performance of NumbaPro CUDA as well.

This is the first time I have written code for CUDA, and I came up with the code below.

from numbapro import cuda, float32, int16
import numpy as np

@cuda.jit(argtypes=(float32[:,:], float32[:,:], float32[:,:], int16, int16, int16))
def kernel(grid, old_grid, previous_grid, steps, l_x, l_y):

    x,y = cuda.grid(2)

    for i in range(steps):
        previous_grid[x,y] = old_grid[x,y]
        old_grid[x,y] = grid[x,y]

    for i in range(steps):

        grid[x,y] = 0.0

        if 0 < x+1 and x+1 < l_x:
            grid[x,y] += old_grid[x+1,y]
        if 0 < x-1 and x-1 < l_x:
            grid[x,y] += old_grid[x-1,y]
        if 0 < y+1 and y+1 < l_x:
            grid[x,y] += old_grid[x,y+1]
        if 0 < y-1 and y-1 < l_x:
            grid[x,y] += old_grid[x,y-1]

        grid[x,y] /= 2.0
        grid[x,y] -= previous_grid[x,y]


def fdtd(input_grid, steps):

    grid = cuda.to_device(input_grid)
    old_grid = cuda.to_device(np.zeros_like(input_grid))
    previous_grid = cuda.to_device(np.zeros_like(input_grid))

    l_x = input_grid.shape[0]
    l_y = input_grid.shape[1]

    kernel[(16,16),(32,8)](grid, old_grid, previous_grid, steps, l_x, l_y)

    return grid.copy_to_host()

Unfortunately, I get the following error:

  File ".../fdtd_numbapro.py", line 98, in fdtd
    return grid.copy_to_host()
  File "/opt/anaconda1anaconda2anaconda3/lib/python2.7/site-packages/numbapro/cudadrv/devicearray.py", line 142, in copy_to_host
  File "/opt/anaconda1anaconda2anaconda3/lib/python2.7/site-packages/numbapro/cudadrv/driver.py", line 1702, in device_to_host
  File "/opt/anaconda1anaconda2anaconda3/lib/python2.7/site-packages/numbapro/cudadrv/driver.py", line 772, in check_error
numbapro.cudadrv.error.CudaDriverError: CUDA_ERROR_LAUNCH_FAILED
Failed to copy memory D->H

I have also tried grid.to_host(), but that does not work either. CUDA itself definitely works with NumbaPro on this system.

Best answer

The problem was solved by the user. For this question, I am cross-referencing the discussion on the Anaconda mailing list: https://groups.google.com/a/continuum.io/forum/#!searchin/anaconda/fdtd/anaconda/VgiN4h37UrA/18tAc60EIkcJ
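The mailing-list thread is not reproduced here, so the actual fix is not stated. The following is general CUDA background, not a claim about how this question was resolved. The launch `kernel[(16,16),(32,8)]` always starts the same number of threads regardless of the input shape, and the kernel indexes the arrays without first checking that `x < l_x` and `y < l_y`. Out-of-range accesses of this kind are a common cause of a launch failure that only surfaces at the next device-to-host copy. A minimal sketch of sizing the launch from the array shape, using the hypothetical helpers `launch_config` and `total_threads`:

```python
import math

def launch_config(shape, threads_per_block=(32, 8)):
    """Return (blocks_per_grid, threads_per_block) that cover a 2D array.

    Hypothetical helper for illustration; rounds the block count up so
    that every cell of the array is assigned at least one thread.
    """
    blocks = tuple(math.ceil(n / t) for n, t in zip(shape, threads_per_block))
    return blocks, threads_per_block

def total_threads(blocks, threads):
    """Number of threads launched along each axis."""
    return tuple(b * t for b, t in zip(blocks, threads))

# The hard-coded launch from the question: 16x16 blocks of 32x8 threads.
print(total_threads((16, 16), (32, 8)))  # (512, 128)

# A launch sized for a 100x60 array instead.
blocks, threads = launch_config((100, 60))
print(blocks, total_threads(blocks, threads))
```

Because the block count is rounded up, some threads can still fall outside the array, so a kernel launched this way would normally also return early for any thread whose `x` or `y` is past the array bounds.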

Source: a similar question, "python - Converting a function to NumbaPro CUDA", on Stack Overflow: https://stackoverflow.com/questions/19367488/
