multithreading - Host memory visibility in OpenCL with clEnqueueMapBuffer and 'querying whether the command has finished'

Reposted by: 行者123 · Last updated: 2023-12-03 13:03:39

The OpenCL 1.1 specification says (section 5.2.3):

If blocking_map is CL_FALSE i.e. map operation is non-blocking, the pointer to the mapped region returned by clEnqueueMapBuffer cannot be used until the map command has completed. The event argument returns an event object which can be used to query the execution status of the map command. When the map command is completed, the application can access the contents of the mapped region using the pointer returned by clEnqueueMapBuffer.

However, section 5.9 (immediately after Table 5.15) contains the following statement:

Using clGetEventInfo to determine if a command identified by event has finished execution (i.e. CL_EVENT_COMMAND_EXECUTION_STATUS returns CL_COMPLETE) is not a synchronization point. There are no guarantees that the memory objects being modified by command associated with event will be visible to other enqueued commands.

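Q1: So, is there any other way to "query the execution status of the map command" such that memory consistency is guaranteed (in this case, towards the host) once the query returns CL_COMPLETE?
Q2: Am I missing something?
Q3: What is the typical OpenCL idiom for this situation?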

Best answer

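1- Enqueue a barrier and take the event from that command. This gives you visibility and fine-grained synchronization with the host.

Waiting on it by polling in a while loop uses more CPU, but at least it has good granularity.

2- Use events for fine-grained control, and barriers for waiting and visibility.

For example, clWaitForEvents gives you both the query result and lower CPU usage, but with coarser granularity than polling.

On the device side, only event dependencies are used, which form a graph between queues.

3- There is no typical idiom. Choose whichever fits your problem best.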

Source: this question (multithreading - Host memory visibility in OpenCL with clEnqueueMapBuffer and 'querying whether the command has finished') corresponds to a similar question found on Stack Overflow: https://stackoverflow.com/questions/42760740/
