python - PyOpenCL 2D array kernel get_global_id(1) error

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:39:43

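I am really new to OpenCL. I took the sample code from this site: http://www.drdobbs.com/open-source/easy-opencl-with-python/240162614?pgno=2 and customized it a little. My goal is to send a 4x4 matrix filled with 1s to the kernel and then read it back from the kernel. I know this is trivial code, but I need to do it to understand how OpenCL works. The input matrix is: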

[[ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]]

However, the output I get back from the kernel is the following, when it should be identical to the input:

[[ 1.  1.  1.  1.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]

Here is my complete code:

import pyopencl as cl
from pyopencl import array
import numpy as np

## Step #1. Obtain an OpenCL platform.
platform = cl.get_platforms()[0]

## It would be necessary to add some code to check the support for
## the necessary platform extensions with platform.extensions

## Step #2. Obtain a device id for at least one device (accelerator).
device = platform.get_devices()[1]

## It would be necessary to add some code to check the support for
## the necessary device extensions with device.extensions

## Step #3. Create a context for the selected device.
context = cl.Context([device])

## Step #4. Create the accelerator program from source code.
## Step #5. Build the program.
## Step #6. Create one or more kernels from the program functions.
program = cl.Program(context, """
__kernel void matrix_dot_vector(const unsigned int size, __global const float *matrix, __global float *result)
{
int x = get_global_id(0);
int y = get_global_id(1);
result[x + size * y] = matrix[x + size * y];
}
""").build()

matrix = np.ones((4,4), np.float32)

## Step #7. Create a command queue for the target device.
queue = cl.CommandQueue(context)

## Step #8. Allocate device memory and move input data from the host to the device memory.
mem_flags = cl.mem_flags
#matrix_buf = cl.Buffer(context, mem_flags.READ_ONLY | mem_flags.COPY_HOST_PTR, hostbuf=matrix)
matrix_buf = cl.Buffer(context, mem_flags.READ_ONLY | mem_flags.COPY_HOST_PTR, hostbuf=matrix)
destination_buf = cl.Buffer(context, mem_flags.WRITE_ONLY, matrix.nbytes)

## Step #9. Associate the arguments to the kernel with kernel object.
## Step #10. Deploy the kernel for device execution.
program.matrix_dot_vector(queue, matrix.shape, None, np.int32(matrix.size), matrix_buf, destination_buf)

## Step #11. Move the kernels output data to host memory.
matrix_dot_vector = np.ones((4,4), np.float32)
cl.enqueue_copy(queue, matrix_dot_vector, destination_buf)

## Step #12. Release context, program, kernels and memory.
## PyOpenCL performs this step for you, and therefore,
## you don't need to worry about cleanup code

print(matrix_dot_vector)

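As far as I can tell, the value of int y = get_global_id(1); is always 0. That is what causes the error, and I don't understand why it is always 0, because I pass the correct shape to the kernel: program.matrix_dot_vector(queue, matrix.shape, None, np.int32(matrix.size), matrix_buf, destination_buf). The shape is the second argument, matrix.shape, which equals (4,4).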

Does anyone have an idea what is going wrong?

Thanks!

Best answer

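The first kernel argument is passed the wrong value: size should be the length of one row, not the total number of elements in the matrix. Change np.int32(matrix.size) to np.int32(matrix.shape[0]). get_global_id(1) is not stuck at 0; with size equal to 16, the index x + size * y stays inside the 16-element buffer only when y is 0, so only the first row is ever copied.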
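To see why the stride matters, the kernel's index arithmetic can be replayed on the host in plain Python. This is only a rough sketch, not PyOpenCL code: the function name simulate_copy_kernel and its parameters are hypothetical, and it assumes out-of-range writes are simply dropped, whereas on a real device they are undefined behavior (the zeros in the question are consistent with those rows never being written).

```python
def simulate_copy_kernel(rows, cols, stride):
    """Replay result[x + stride * y] = matrix[x + stride * y] on the host.

    Hypothetical helper for illustration; it is not part of PyOpenCL.
    """
    matrix = [1.0] * (rows * cols)   # flattened input, all ones
    result = [0.0] * (rows * cols)   # stand-in for the output buffer
    for y in range(cols):            # plays the role of get_global_id(1)
        for x in range(rows):        # plays the role of get_global_id(0)
            index = x + stride * y   # same expression as in the kernel
            # Assumption: writes past the end of the buffer are ignored.
            # A real device gives no such guarantee.
            if index < len(result):
                result[index] = matrix[index]
    # Reshape the flat buffer into rows for printing.
    return [result[r * cols:(r + 1) * cols] for r in range(rows)]


# Stride 16 (matrix.size): only the work-items with y == 0 land in bounds.
for row in simulate_copy_kernel(4, 4, 16):
    print(row)

# Stride 4 (matrix.shape[0]): every work-item maps to a distinct valid index.
for row in simulate_copy_kernel(4, 4, 4):
    print(row)
```

With a stride of 16 the sketch reproduces the output from the question (one row of ones followed by three rows of zeros), and with a stride of 4 it returns the full matrix of ones.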

Regarding python - PyOpenCL 2D array kernel get_global_id(1) error, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46858668/
