
multithreading - Does the official OpenCL 2.2 standard support wavefronts?

Reposted. Author: 行者123. Updated: 2023-12-03 13:03:44

As is well known, AMD's OpenCL implementation supports wavefronts (August 2015): http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_OpenCL_Programming_Optimization_Guide2.pdf

The AMD Radeon HD 7770 GPU, for example, supports more than 25,000 in-flight work-items and can switch to a new wavefront (containing up to 64 work-items) in a single cycle.



But why are wavefronts not mentioned in the OpenCL 1.0/2.0/2.2 standards?

Not one of the PDFs contains the word "wavefront": https://www.khronos.org/registry/OpenCL/specs/

I also found:
  • 2013: https://community.amd.com/thread/160658

  • OpenCL is a open standard. It still does not support this swizzling concept. It does not even support wavefront/warp yet.


  • 2013: https://stackoverflow.com/a/19874984/1558037

  • That's why the concept is not on the OpenCL specification itself.


  • 2011: https://forums.khronos.org/showthread.php/7211-How-can-i-split-my-work-load-in-a-GPU-with-OpenCL

  • Standard OpenCL doesn't have the notion of a "wavefront"


  • 2011: https://www.cvg.ethz.ch/teaching/2011spring/gpgpu/GPU-Optimization.pdf


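    So is it true that the official OpenCL 2.2 standard still does not support wavefronts?

    Conclusion:

    The OpenCL standard has no wavefronts, but OpenCL 2.0 has sub-groups, an abstraction of the SIMD execution model that is similar to wavefronts.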
  • Page 100: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_OpenCL_Programming_User_Guide2.pdf

  • 6.4.2 Workgroup/subgroup-level functions

    OpenCL 2.0 introduces a Khronos sub-group extension. Sub-groups are a logical abstraction of the hardware SIMD execution model akin to wavefronts, warps, or vectors and permit programming closer to the hardware in a vendor-independent manner. This extension includes a set of cross-sub-group built-in functions that match the set of the cross-work-group built-in functions specified above.

    Best answer

    They appear to have adopted a more dynamic approach, called a sub-group: https://www.khronos.org/registry/OpenCL/specs/opencl-2.2.pdf

    Sub-group: Sub-groups are an implementation-dependent grouping of work-items within a
    work-group. The size and number of sub-groups is implementation-defined.


    Work-groups are further divided into sub-groups,
    which provide an additional level of control over execution.


    The mapping of work-items to
    sub-groups is implementation-defined and may be queried at runtime.
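The division of a work-group into sub-groups can be pictured with a small host-side C sketch. This is not OpenCL code and not how any particular vendor does it: the linear mapping (consecutive work-items fill one sub-group before the next starts) is an assumption made only for illustration, since the specification leaves the mapping implementation-defined. The names `sub_group_pos`, `num_sub_groups` and `map_work_item` are hypothetical helpers, not part of any OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type, not part of any OpenCL API. */
typedef struct {
    size_t sub_group_id;       /* which sub-group the work-item lands in */
    size_t sub_group_local_id; /* its lane index inside that sub-group   */
} sub_group_pos;

/* Number of sub-groups needed to cover a work-group.
 * The last sub-group may be only partially filled. */
size_t num_sub_groups(size_t work_group_size, size_t max_sub_group_size)
{
    assert(max_sub_group_size > 0);
    return (work_group_size + max_sub_group_size - 1) / max_sub_group_size;
}

/* ASSUMPTION: a linear mapping of work-items to sub-groups.
 * A real implementation is free to choose a different mapping. */
sub_group_pos map_work_item(size_t local_linear_id, size_t max_sub_group_size)
{
    assert(max_sub_group_size > 0);
    sub_group_pos p;
    p.sub_group_id = local_linear_id / max_sub_group_size;
    p.sub_group_local_id = local_linear_id % max_sub_group_size;
    return p;
}
```

Under this assumed mapping, a work-group of 256 work-items splits into 4 sub-groups at width 64 and 8 sub-groups at width 32, which is why code that hard-codes one width is not portable.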

    So even though it is not called a wavefront, it can now be queried at runtime, and

    In the absence of synchronization functions (e.g. a barrier), work-items within a sub-group may be serialized. In the presence of sub-group functions, work-items within a sub-group may be serialized before any given sub-group function, between dynamically encountered pairs of sub-group functions and between a work-group function and the end of the kernel.



    so the lock-step behavior may even be lost at times.

    On top of these,
     sub_group_all() and
    sub_group_broadcast() and are described in OpenCL C++ kernel language and IL specifications.
    The use of these sub-group functions implies sequenced-before relationships between statements
    within the execution of a single work-item in order to satisfy data dependencies.
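To make the two named functions concrete, here is a minimal host-side C model of what `sub_group_all()` and `sub_group_broadcast()` compute across the lanes of one sub-group. It is a sketch of the semantics only: real kernels call the built-ins per work-item, whereas this model takes the values of all lanes as an array. The names `model_sub_group_all` and `model_sub_group_broadcast` are hypothetical and exist only in this example.

```c
#include <assert.h>
#include <stddef.h>

/* Model of sub_group_all(): the result is non-zero only if the predicate
 * is non-zero for every work-item (lane) in the sub-group. */
int model_sub_group_all(const int *predicates, size_t sub_group_size)
{
    for (size_t lane = 0; lane < sub_group_size; ++lane) {
        if (predicates[lane] == 0) {
            return 0;
        }
    }
    return 1;
}

/* Model of sub_group_broadcast(): every lane receives the value held by
 * the lane whose sub-group local id is src_lane. */
void model_sub_group_broadcast(const int *values, int *results,
                               size_t sub_group_size, size_t src_lane)
{
    assert(src_lane < sub_group_size);
    for (size_t lane = 0; lane < sub_group_size; ++lane) {
        results[lane] = values[src_lane];
    }
}
```

Both operations need every lane's value before any lane can continue, which is the kind of data dependency the quoted sequenced-before wording refers to.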

    The quoted passage indicates that some form of communication within a sub-group exists. OpenCL now also defines child kernels:
    Device-side enqueue: A mechanism whereby a kernel-instance is enqueued by a kernel-instance
    running on a device without direct involvement by the host program. This produces nested
    parallelism; i.e. additional levels of concurrency are nested inside a running kernel-instance.
    The kernel-instance executing on a device (the parent kernel) enqueues a kernel-instance (the
    child kernel) to a device-side command queue. Child and parent kernels execute asynchronously
    though a parent kernel does not complete until all of its child-kernels have completed.

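    Finally, with something like
    kernel void launcher()
    {
        // One-dimensional NDRange with a single work-item for the child kernel.
        ndrange_t ndrange = ndrange_1D(1);
        // Enqueue the block as a child kernel on the default device-side queue.
        // With CLK_ENQUEUE_FLAGS_WAIT_KERNEL the child starts only after the parent finishes executing.
        enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange,
            ^{
                size_t id = get_global_id(0); // global ID within the child kernel
            }
        );
    }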

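    you should be able to spawn your own (upgraded?) wavefronts of whatever size you need, and they run concurrently with the parent kernel (and can communicate with the threads inside a sub-group), but they are not called wavefronts because they are not hard-coded by the hardware, IMHO.

    The 2.0 API specification says: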
    Extreme care should be exercised when writing code that uses
    subgroups if the goal is to write portable OpenCL applications.

    This brings to mind AMD's 16-wide SIMDs and NVIDIA's 32-wide SIMDs versus the 95-wide compute cores of some imaginary FPGA. Pseudo-wavefronts, perhaps?

    Regarding "multithreading - Does the official OpenCL 2.2 standard support wavefronts?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42261692/
