
synchronization - OpenCL and GPU global synchronization


Has anyone tried the gpu_sync function presented in the paper "Inter-Block GPU Communication via Fast Barrier Synchronization"? All of the code described there looks very simple and easy to implement, but it keeps freezing my GPU. I'm sure I'm doing something stupid, but I can't see what. Can anyone help me?

The strategy I'm using is the one described in the "GPU Lock-Free Synchronization" section of the paper, and this is the OpenCL source code I've implemented:

static void globalSync(uint iGoalValue,
                       volatile __global int *globalSyncFlagsIN,
                       volatile __global int *globalSyncFlagsOUT)
{
    const size_t iLocalThreadID  = get_local_id(0);
    const size_t iWorkGroupID    = get_group_id(0);
    const size_t iWorkGroupCount = get_num_groups(0);

    // Only the first thread on each SM is used for synchronization
    if (iLocalThreadID == 0)
    { globalSyncFlagsIN[iWorkGroupID] = iGoalValue; }

    if (iWorkGroupID == 0)
    {
        if (iLocalThreadID < iWorkGroupCount)
        {
            while (globalSyncFlagsIN[iLocalThreadID] != iGoalValue) {
                // Nothing to do here
            }
        }

        barrier(CLK_GLOBAL_MEM_FENCE);

        if (iLocalThreadID < iWorkGroupCount)
        { globalSyncFlagsOUT[iLocalThreadID] = iGoalValue; }
    }

    if (iLocalThreadID == 0)
    {
        while (globalSyncFlagsOUT[iWorkGroupID] != iGoalValue) {
            // Nothing to do here
        }
    }

    barrier(CLK_GLOBAL_MEM_FENCE);
}
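
For context, here is a minimal sketch of the kind of host-side setup I'm using to launch a kernel that calls globalSync; the function name, argument indices, and sizes are illustrative placeholders, not my exact code:

#include <CL/cl.h>
#include <stdlib.h>

static void launchWithGlobalSync(cl_context context, cl_command_queue queue,
                                 cl_kernel kernel, size_t workGroupSize,
                                 size_t workGroupCount)
{
    /* One flag per work-group, initialized to 0 so the first goal
       value (1) has not been reached yet. */
    size_t flagBytes = workGroupCount * sizeof(cl_int);
    cl_int *zeros = (cl_int *)calloc(workGroupCount, sizeof(cl_int));

    cl_mem flagsIn = clCreateBuffer(context,
        CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, flagBytes, zeros, NULL);
    cl_mem flagsOut = clCreateBuffer(context,
        CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, flagBytes, zeros, NULL);
    free(zeros);

    /* Assumes the flag arrays are the first two kernel arguments. */
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &flagsIn);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), &flagsOut);

    size_t globalSize = workGroupCount * workGroupSize;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                           &globalSize, &workGroupSize, 0, NULL, NULL);
    clFinish(queue);

    clReleaseMemObject(flagsIn);
    clReleaseMemObject(flagsOut);
}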

Thanks in advance.

Best Answer

I haven't tried running the code, but a direct translation from CUDA to OpenCL of the code in the paper mentioned above is:

{
    int tid_in_blk = get_local_id(0) * get_local_size(1)
                   + get_local_id(1);
    int nBlockNum = get_num_groups(0) * get_num_groups(1);
    int bid = get_group_id(0) * get_num_groups(1) + get_group_id(1);

    if (tid_in_blk == 0) {
        Arrayin[bid] = goalVal;
    }

    if (bid == 1) {
        if (tid_in_blk < nBlockNum) {
            while (Arrayin[tid_in_blk] != goalVal) {
            }
        }
        barrier(CLK_LOCAL_MEM_FENCE);

        if (tid_in_blk < nBlockNum) {
            Arrayout[tid_in_blk] = goalVal;
        }
    }

    if (tid_in_blk == 0) {
        while (Arrayout[bid] != goalVal) {
        }
    }
}

Note the differences in how the thread and group IDs are computed, and the use of a local memory barrier rather than a global memory barrier.
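
For completeness, here is a minimal sketch of how the routine above might be used from a two-phase kernel; the gpu_sync signature, kernel name, and buffer names are placeholders of my own, and this assumes the body above has been wrapped into a function:

/* Assumes the code above is wrapped as:
   static void gpu_sync(int goalVal,
                        volatile __global int *Arrayin,
                        volatile __global int *Arrayout);
   Caveat from the paper: every work-group must be resident on the
   device at the same time (launch no more work-groups than the device
   can run concurrently), otherwise the spin loops never terminate and
   the kernel appears to freeze. */
__kernel void twoPhase(__global float *data,
                       volatile __global int *Arrayin,
                       volatile __global int *Arrayout)
{
    size_t gid = get_global_id(0) * get_global_size(1) + get_global_id(1);

    data[gid] += 1.0f;              /* phase 1 */
    gpu_sync(1, Arrayin, Arrayout); /* all work-groups rendezvous here */
    data[gid] *= 2.0f;              /* phase 2 runs after every group's phase 1 */
}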

Regarding synchronization - OpenCL and GPU global synchronization, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34476631/
