opencv - OpenCL无法使用OpenCV检测我的AMD GPU-6ren

opencv - OpenCL无法使用OpenCV检测我的AMD GPU

转载作者：行者123 更新时间：2023-12-02 17:37:27

我正在使用AMD Radeon R9 M375。我尝试按照此答案https://stackoverflow.com/a/34250412/8731839进行操作，但对我不起作用。

我遵循此:http://answers.opencv.org/question/108646/opencl-can-not-detect-my-nvidia-gpu-via-opencv/?answer=108784#post-id-108784

这是我的clinfo.exe输出

  Platform Name:                 AMD Accelerated Parallel Processing
Number of devices:               2
  Device Type:                   CL_DEVICE_TYPE_GPU
  Vendor ID:                     1002h
  Board name:                    AMD Radeon (TM) R9 M375
  Device Topology:               PCI[ B#4, D#0, F#0 ]
  Max compute units:                 10
  Max work items dimensions:             3
    Max work items[0]:               256
    Max work items[1]:               256
    Max work items[2]:               256
  Max work group size:               256
  Preferred vector width char:           4
  Preferred vector width short:          2
  Preferred vector width int:            1
  Preferred vector width long:           1
  Preferred vector width float:          1
  Preferred vector width double:         1
  Native vector width char:          4
  Native vector width short:             2
  Native vector width int:           1
  Native vector width long:          1
  Native vector width float:             1
  Native vector width double:            1
  Max clock frequency:               1015Mhz
  Address bits:                  32
  Max memory allocation:             3019898880
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                16384
  Max image 2D height:               16384
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            16
  Max size of kernel argument:           1024
  Alignment (bits) of base address:      2048
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     No
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    Read/Write
  Cache line size:               64
  Cache size:                    16384
  Global memory size:                3221225472
  Constant buffer size:              65536
  Max number of constant args:           8
  Local memory type:                 Scratchpad
  Local memory size:                 32768
  Max pipe arguments:                0
  Max pipe active reservations:          0
  Max pipe packet size:              0
  Max global variable size:          0
  Max global variable preferred total size:  0
  Max read/write image args:             0
  Max on device events:              0
  Queue on device max size:          0
  Max on device queues:              0
  Queue on device preferred size:        0
  SVM capabilities:              
    Coarse grain buffer:             No
    Fine grain buffer:               No
    Fine grain system:               No
    Atomics:                     No
  Preferred platform atomic alignment:       0
  Preferred global atomic alignment:         0
  Preferred local atomic alignment:      0
  Kernel Preferred work group size multiple:     64
  Error correction support:          0
  Unified memory for Host and Device:        0
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             No
  Queue on Host properties:              
    Out-of-Order:                No
    Profiling :                  Yes
  Queue on Device properties:                
    Out-of-Order:                No
    Profiling :                  No
  Platform ID:                   00007FFF209D0188
  Name:                      Capeverde
  Vendor:                    Advanced Micro Devices, Inc.
  Device OpenCL C version:           OpenCL C 1.2 
  Driver version:                2348.3
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.2 AMD-APP (2348.3)
  Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics 

cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing 

cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing 

cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event cl_amd_liquid_flash 


      Device Type:                   CL_DEVICE_TYPE_CPU
      Vendor ID:                     1002h
      Board name:                    
      Max compute units:                 4
      Max work items dimensions:             3
        Max work items[0]:               1024
        Max work items[1]:               1024
        Max work items[2]:               1024
      Max work group size:               1024
      Preferred vector width char:           16
      Preferred vector width short:          8
      Preferred vector width int:            4
      Preferred vector width long:           2
      Preferred vector width float:          8
      Preferred vector width double:         4
      Native vector width char:          16
      Native vector width short:             8
      Native vector width int:           4
      Native vector width long:          2
      Native vector width float:             8
      Native vector width double:            4
      Max clock frequency:               2200Mhz
      Address bits:                  64
      Max memory allocation:             2147483648
      Image support:                 Yes
      Max number of images read arguments:       128
      Max number of images write arguments:      64
      Max image 2D width:                8192
      Max image 2D height:               8192
      Max image 3D width:                2048
      Max image 3D height:               2048
      Max image 3D depth:                2048
      Max samplers within kernel:            16
      Max size of kernel argument:           4096
      Alignment (bits) of base address:      1024
      Minimum alignment (bytes) for any datatype:    128
      Single precision floating point capability
        Denorms:                     Yes
        Quiet NaNs:                  Yes
        Round to nearest even:           Yes
        Round to zero:               Yes
        Round to +ve and infinity:           Yes
        IEEE754-2008 fused multiply-add:         Yes
      Cache type:                    Read/Write
      Cache line size:               64
      Cache size:                    32768
      Global memory size:                8499593216
      Constant buffer size:              65536
      Max number of constant args:           8
      Local memory type:                 Global
      Local memory size:                 32768
      Max pipe arguments:                16
      Max pipe active reservations:          16
      Max pipe packet size:              2147483648
      Max global variable size:          1879048192
      Max global variable preferred total size:  1879048192
      Max read/write image args:             64
      Max on device events:              0
      Queue on device max size:          0
      Max on device queues:              0
      Queue on device preferred size:        0
      SVM capabilities:              
        Coarse grain buffer:             No
        Fine grain buffer:               No
        Fine grain system:               No
        Atomics:                     No
      Preferred platform atomic alignment:       0
      Preferred global atomic alignment:         0
      Preferred local atomic alignment:      0
      Kernel Preferred work group size multiple:     1
      Error correction support:          0
      Unified memory for Host and Device:        1
      Profiling timer resolution:            465
      Device endianess:              Little
      Available:                     Yes
      Compiler available:                Yes
      Execution capabilities:                
        Execute OpenCL kernels:          Yes
        Execute native function:             Yes
      Queue on Host properties:              
        Out-of-Order:                No
        Profiling :                  Yes
      Queue on Device properties:                
        Out-of-Order:                No
        Profiling :                  No
      Platform ID:                   00007FFF209D0188
      Name:                      Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
      Vendor:                    GenuineIntel
      Device OpenCL C version:           OpenCL C 1.2 
      Driver version:                2348.3 (sse2,avx)
      Profile:                   FULL_PROFILE
      Version:                   OpenCL 1.2 AMD-APP (2348.3)

什么有效:

std::vector<cv::ocl::PlatformInfo> platforms;
cv::ocl::getPlatfomsInfo(platforms);

//OpenCL Platforms
for (size_t i = 0; i < platforms.size(); i++)
{

    //Access to Platform
    const cv::ocl::PlatformInfo* platform = &platforms[i];

    //Platform Name
    std::cout << "Platform Name: " << platform->name().c_str() << "\n";
    //Access Device within Platform
    cv::ocl::Device current_device;
    for (int j = 0; j < platform->deviceNumber(); j++)
    {
        //Access Device
        platform->getDevice(current_device, j);
        //Device Type
        int deviceType = current_device.type();
        cout << "Device Number: " << platform->deviceNumber() << endl;
        cout << "Device Type: " << deviceType << endl;
    }
}

上面的代码显示

 Platform Name: Intel(R) OpenCL
 Device Number: 2
 Device Type: 2
 Device Number: 2
 Device Type: 4 
 Platform Name: AMD Accelerated Parallel Processing
 Device Number: 2
 Device Type: 4 
 Device Number: 2
 Device Type: 2

如何使用AMD作为我的GPU从此处创建上下文？链接的文章说使用 initializeContextFromHandler方法，但是有关OpenCV的文档还不够。 Documentation Link

最佳答案

问题已解决。我不知道自己做了什么，但是AMD现在正在工作。

当前设置(在Windows上):

环境变量:

Name: OPENCV_OPENCL_DEVICE

Value: AMD:GPU:Capeverde

使用setUseOpenCL(bool foo)中存在的ocl.hpp选择使用GPU还是CPU。

最可能出现的问题:在我的实际代码中，我没有进行任何计算，但是当我编写了一个用于将两个矩阵相减的简单代码时，AMD开始工作。

码:

#include <opencv2/core/ocl.hpp>
#include <opencv2/opencv.hpp>

int main() {
    cv::UMat mat1 = cv::UMat::ones(10, 10, CV_32F);
    cv::UMat mat2 = cv::UMat::zeros(10, 10, CV_32F);
    cv::UMat output = cv::UMat(10, 10, CV_32F);
    cv::subtract(mat1, mat2, output);
    std::cout << output << "\n";
    std::getchar();
}

关于opencv - OpenCL无法使用OpenCV检测我的AMD GPU，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/48998571/

文章推荐： opencv - OpenCV简短描述符少于关键点

文章推荐： opencv - 从CV_8UC1到OpenCV ConvertTo CV_32SC1

文章推荐： python - 计算图像对象的中心点以确定方向

opencl - 英特尔 OpenCL 与。 Khronos OpenCL
Intel、AMD 和 Khronos OpenCL 之间有什么区别。我对 OpenCL 完全陌生，想从它开始。我不知道在我的操作系统上安装哪个更好。最佳答案 OpenCL 是 C 和 C++ 语言
opencl - 从另一个 OpenCL 内核调用 OpenCL 内核
我在这里的一篇文章中看到，我们可以从 OpenCL 内核调用函数。但是在我的情况下，我还需要并行化该复杂函数(由所有可用线程运行)，所以我是否必须将该函数也设为内核并像从主内核中调用函数一样直接调
opencl - OpenCL 和 OpenCL Embedded 配置文件之间的主要区别
最近我看到一些开发板支持 OpenCL EP，例如 odroid XU。我知道的一件事是 OpenCL EP 适用于 ARM 处理器，但它与基于主要桌面的 OpenCL 在哪些特性上有所不同。最佳答
opencl - OpenCL 中内核参数数量的限制
我想知道在 OpenCL 中设置为内核函数的参数数量是否有任何限制。设置参数时出现 INVALID_ARG_INDEX 错误。我在内核函数中设置了 9 个参数。请在这方面帮助我。最佳答案您可以尝试
opencl - OpenCL 中零拷贝的访问路径
我对零拷贝的工作原理有点困惑。 1-要确认以下内容对应于opencl中的零拷贝。 ....................... . . . .
opencl - OpenCL 中的重叠传输和设备计算
我是 OpenCL 的初学者，我很难理解某些东西。我想改进主机和设备之间的图像传输。我制定了一个计划以更好地了解我。顶部:我现在拥有的 |底部:我想要的 HtD(主机到设备)和 DtH(设备到主
opencl - OpenCL 本地内存有限制吗？
今天我又加了四个 __local变量到我的内核以转储中间结果。但是只需将另外四个变量添加到内核的签名并添加相应的内核参数就会将内核的所有输出呈现为“0”。没有一个 cl 函数返回错误代码。我进一步尝
opencl - OpenCL 工作项是否并行执行？
我知道工作项被分组到工作组中，并且您不能在工作组之外进行同步。这是否意味着工作项是并行执行的？如果是这样，使用 128 个工作项创建 1 个工作组是否可能/有效？最佳答案组内的工作项将一起安排
opencl - OpenCL 上下文中的扭曲是什么？
我相当确定经纱仅在 CUDA 中定义。但也许我错了。就 OpenCL 而言，什么是扭曲？它与工作组不一样，是吗？任何相关的反馈都受到高度赞赏。谢谢! 最佳答案它没有在 OpenCL 标准中定义。
opencl - OpenCL 调试器
已结束。此问题正在寻求书籍、工具、软件库等的推荐。它不满足Stack Overflow guidelines 。目前不接受答案。我们不允许提出寻求书籍、工具、软件库等推荐的问题。您可以编辑问题，以便
opencl - OpenCL 中的障碍
在OpenCL中，我的理解是可以使用barrier()函数来同步工作组中的线程。我(通常)确实了解它们的用途以及何时使用它们。我还知道工作组中的所有线程都必须遇到障碍，否则会出现问题。然而，到目前为止
opencl - OpenCL 中的平台
我的主板上有 Nvidia 显卡 (GeForce GT 640)。我已经在我的盒子上安装了 OpenCL。当我使用“clGetPlatformInfo(参数)”查询平台时，我看到以下输出:-#可用平
opencl - OpenCL 内核执行时间过长导致崩溃
我目前正在构建一个 ray marcher 来查看像 mandelbox 等东西。它工作得很好。但是，在我当前的程序中，它使用每个 worker 作为从眼睛转换的光线。这意味着每个 worker 有大
opencl - OpenCl 寄存器的神奇数字
我编写了两个不同的 openCl 内核，使用 nvidia profiler 获取了有关它们的一些信息，发现两者每个工作项都使用 63 个寄存器。我尝试了一切我能想到的方法来降低这个数字(用 ush
opencl - OpenCL 中的平台
我的主板上有 Nvidia 显卡 (GeForce GT 640)。我已经在我的盒子上安装了 OpenCL。当我使用“clGetPlatformInfo(参数)”查询平台时，我看到以下输出:-#可用平
opencl - OpenCL 内核执行时间过长导致崩溃
我目前正在构建一个 ray marcher 来查看像 mandelbox 等东西。它工作得很好。但是，在我当前的程序中，它使用每个 worker 作为从眼睛转换的光线。这意味着每个 worker 有大
opencl - OpenCL 中的矩阵求逆
我正在尝试使用 OpenCL 加速一些计算，算法的一部分包括矩阵求逆。是否有任何开源库或免费可用的代码来计算用 OpenCL 或 CUDA 编写的矩阵的 lu 分解(lapack dgetrf 和 d
opencl - OpenCL 支持动态并行性...？
我正在尝试在 OpenCL 内核中使用递归。编译成功，但运行时出现编译错误，所以我想知道，由于 CUDA 现在支持动态并行，OpenCL 是否支持动态并行？最佳答案 OpenCL 不支持递归。请参阅
opencl - OpenCL 中主机和设备之间的内存传输？
考虑以下代码，它从大小为 size 的 double 组创建缓冲区内存对象: coef_mem = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM
opencl - OpenCL 中目标平台的示例是什么？
OpenCL 中目标平台的示例是什么？例如，它是 Windows、Android、Mac 等操作系统，还是设备中的实际芯片？最佳答案 OpenCL 平台本质上是一个 OpenCL 实现。它与操作系统

行者123

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

opencv - OpenCL无法使用OpenCV检测我的AMD GPU