
c++ - How do I get the maximum global work size using the OpenCL C++ bindings?


I want to get the maximum global work size. I don't want the one OpenCL tries to choose for you, which may or may not be the maximum size.

To do this, I want to specify the size when calling clEnqueueNDRangeKernel. For example:

clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL, 0, NULL, NULL);
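(For reference, the equivalent call through the C++ bindings would look roughly like the sketch below; queue, kernel and global_size are assumed to be set up elsewhere, so this is an illustration rather than a complete program.)

// Sketch only: queue (cl::CommandQueue), kernel (cl::Kernel) and global_size (size_t)
// are assumed to exist. Passing cl::NullRange as the local size lets the runtime
// choose the work-group size, while the global size is given explicitly.
queue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(global_size), cl::NullRange);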

The clGetKernelWorkGroupInfo documentation says:

CL_KERNEL_GLOBAL_WORK_SIZE: This provides a mechanism for the application to query the maximum global size that can be used to execute a kernel (i.e. global_work_size argument to clEnqueueNDRangeKernel) on a custom device given by device or a built-in kernel on an OpenCL device given by device.

How do I get CL_KERNEL_GLOBAL_WORK_SIZE using the OpenCL C++ bindings?

I tried this:

cl::array<size_t, 3> kernel_global_work_size = my_kernel.getWorkGroupInfo<CL_KERNEL_GLOBAL_WORK_SIZE>(my_device);

but I get this error:

cl2.hpp:5771:12: note: candidate: template<class T> cl_int cl::Kernel::getWorkGroupInfo(const cl::Device&, cl_kernel_work_group_info, T*) const
cl_int getWorkGroupInfo(
^~~~~~~~~~~~~~~~
cl2.hpp:5771:12: note: template argument deduction/substitution failed:
cl2.hpp:5782:9: note: candidate: template<int name> typename cl::detail::param_traits<cl::detail::cl_kernel_work_group_info, name>::param_type cl::Kernel::getWorkGroupInfo(const cl::Device&, cl_int*) const
getWorkGroupInfo(const Device& device, cl_int* err = NULL) const

With this code:

cl::array<size_t, 3> kernel_global_work_size;
my_kernel.getWorkGroupInfo<cl::array<size_t, 3>>(my_device, CL_KERNEL_GLOBAL_WORK_SIZE, &kernel_global_work_size);

I get OpenCL error -30 (CL_INVALID_VALUE).

my_kernel is not a built-in kernel, e.g.:

cl::Kernel my_kernel = cl::Kernel(program, "my_kernel");

my_device is not a custom device, e.g.:

cl::Device device = myDevices[0];

Best Answer

Yes, that is because your call matches this signature:

https://github.khronos.org/OpenCL-CLHPP/classcl_1_1_kernel.html

template <cl_int name>
typename detail::param_traits<detail::cl_kernel_work_group_info, name>::param_type
getWorkGroupInfo(const Device& device, cl_int* err = NULL) const;

It looks like param_traits, which is generated via macros, is not declared for CL_KERNEL_GLOBAL_WORK_SIZE. That would be a bug in the header. (GitHub issue created by OP)

There are entries for some of these here, but the one for CL_KERNEL_GLOBAL_WORK_SIZE is missing here.

Alternatively, you can use the version that returns an error code and provides the information through an output parameter, which should work around the problem:

template<typename T>
cl_int getWorkGroupInfo(const Device &device, cl_kernel_work_group_info name, T *param) const;

The call could look like this:

cl::array<size_t, 3> result;
kernel.getWorkGroupInfo<decltype(result)>(device, CL_KERNEL_GLOBAL_WORK_SIZE, &result);
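For completeness, here is a self-contained sketch of that call with the error code checked; it assumes the cl2.hpp bindings and an already-built kernel and device, and the helper function name is made up for the example:

#define CL_HPP_MINIMUM_OPENCL_VERSION 120
#define CL_HPP_TARGET_OPENCL_VERSION 120
#include <CL/cl2.hpp>

#include <iostream>

// Illustrative helper (not part of the bindings): queries CL_KERNEL_GLOBAL_WORK_SIZE
// through the overload that reports errors via its return value.
void print_global_work_size(const cl::Kernel &kernel, const cl::Device &device)
{
    cl::array<size_t, 3> global_size{};
    cl_int err = kernel.getWorkGroupInfo(device, CL_KERNEL_GLOBAL_WORK_SIZE, &global_size);
    if (err == CL_SUCCESS) {
        std::cout << global_size[0] << " x " << global_size[1]
                  << " x " << global_size[2] << std::endl;
    } else {
        // For a regular (non-built-in) kernel on a regular (non-custom) device the
        // spec mandates CL_INVALID_VALUE (-30) here, as explained below.
        std::cerr << "getWorkGroupInfo failed with error " << err << std::endl;
    }
}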

My question would be: did you try this yourself? Did the result not match your expectations?


You are getting CL_INVALID_VALUE?

[...] on a custom device given by device or a built-in kernel on an OpenCL device given by device.

If device is not a custom device or kernel is not a built-in kernel, clGetKernelArgInfo returns the error CL_INVALID_VALUE.

See the OpenCL 1.2 spec, pages 14 and 15:

Built-in Kernel: A built-in kernel is a kernel that is executed on an OpenCL device or custom device by fixed-function hardware or in firmware. Applications can query the built-in kernels supported by a device or custom device. A program object can only contain kernels written in OpenCL C or built-in kernels but not both. See also Kernel and Program.

Custom Device: An OpenCL device that fully implements the OpenCL Runtime but does not support programs written in OpenCL C. A custom device may be specialized non- programmable hardware that is very power efficient and performant for directed tasks or hardware with limited programmable capabilities such as specialized DSPs. Custom devices are not OpenCL conformant. Custom devices may support an online compiler. Programs for custom devices can be created using the OpenCL runtime APIs that allow OpenCL programs to be created from source (if an online compiler is supported) and/or binary, or from built-in kernels supported by the device. See also Device.

For regular kernels and devices, the standard limits the work-group size (a device property), while the global size is only limited by the range of size_t. See clEnqueueNDRangeKernel.
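To illustrate, here is a sketch (again assuming cl2.hpp and an already-built kernel and device; the helper name is made up) of the limits you can actually query for regular kernels and devices:

#define CL_HPP_MINIMUM_OPENCL_VERSION 120
#define CL_HPP_TARGET_OPENCL_VERSION 120
#include <CL/cl2.hpp>

#include <iostream>
#include <vector>

// Illustrative helper: prints the queryable work-group limits for a regular kernel
// on a regular device. The global size itself has no queryable maximum; it is only
// bounded by the range of size_t.
void print_work_size_limits(const cl::Kernel &kernel, const cl::Device &device)
{
    size_t device_max = device.getInfo<CL_DEVICE_MAX_WORK_GROUP_SIZE>();
    std::vector<size_t> item_sizes = device.getInfo<CL_DEVICE_MAX_WORK_ITEM_SIZES>();
    size_t kernel_max = kernel.getWorkGroupInfo<CL_KERNEL_WORK_GROUP_SIZE>(device);

    std::cout << "CL_DEVICE_MAX_WORK_GROUP_SIZE: " << device_max << std::endl;
    std::cout << "CL_KERNEL_WORK_GROUP_SIZE:     " << kernel_max << std::endl;
    std::cout << "CL_DEVICE_MAX_WORK_ITEM_SIZES:";
    for (size_t s : item_sizes)
        std::cout << " " << s;
    std::cout << std::endl;
}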

Regarding c++ - How do I get the maximum global work size using the OpenCL C++ bindings?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50044493/
