
opencl - How to pass an array of vectors in pyOpenCL

Reposted. Author: 行者123. Last updated: 2023-12-01 16:02:09

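I am moving a simulation to pyOpenCL and cannot get my data access to work. I am trying to pass in a 1D array of vectors (well, several actually, but the example I have included uses only one).

Currently, a few of the vectors are copied just fine, but after that the data is simply not what I supplied.

I don't think I have posted here before, so apologies if any of the formatting or presentation is wrong. Also, I have just stripped out all of the simulation code, so I realise this code does not actually do anything at the moment; I only want to get the buffer passing right.

Thanks in advance.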

The kernel (kertest.py):

step1 = """
#pragma OPENCL EXTENSION cl_amd_printf: enable
#define X xdim
#define Y ydim
__kernel void k1(__global float3 *spins,
                 __local float3 *tile)
{
    ushort lid = 2 * get_local_id(0);
    ushort group = 2 * get_group_id(0);
    ushort num = get_num_groups(0);
    int lim = X*Y*3;

    for (ushort i = 0; i < lim; i++)
    {
        if (lid == 0 && group == 0)
        {
            printf("%f :: %d\\n", spins[i].x, i);
        }
    }
}"""

The code itself (gputest.py):

import kertest as k2D
import numpy as np
import pyopencl as cl

class GPU_MC2DSim():
    def __init__(self, x, y):
        self.x = x
        self.y = y

        # Make xdim the larger of the two dimensions
        if x >= y:
            self.xdim = int(self.x)
            self.ydim = int(self.y)
        else:
            self.xdim = int(self.y)
            self.ydim = int(self.x)

        # Round both dimensions up to an even number
        if self.xdim % 2 != 0: self.xdim += 1

        if self.ydim % 2 != 0: self.ydim += 1

        # One (1.0, 2.0, 3.0) vector per lattice site, packed as 3 floats per row
        self.M = np.ones((self.xdim*self.ydim, 3)).astype(np.float32)
        self.M[:, 1] += 1.0
        self.M[:, 2] += 2.0

        print(self.M)

    def simulate(self):
        ctx = cl.create_some_context()
        q = cl.CommandQueue(ctx)
        mf = cl.mem_flags

        #Pass buffer:
        M_buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf = self.M)

        #Insert kernel parameters:
        params = {"xdim" : "%d" % (self.xdim),
                  "ydim" : "%d" % (self.ydim),
                  }
        for name in params:
            k2D.step1 = k2D.step1.replace(name, params[name])

        #Compile kernel:
        step1 = cl.Program(ctx, k2D.step1).build()

        locmem = cl.LocalMemory(self.xdim*4*4)

        # Integer division keeps the global and local sizes as ints
        step1.k1(q, ((self.xdim*self.ydim)//4,), (self.xdim//2,), M_buf, locmem).wait()
        return None

xdim = 4
ydim = 4
sim = GPU_MC2DSim(xdim, ydim)
sim.simulate()

Best answer

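Your code for copying the data to the device is fine. However, your kernel has at least two problems:

  • According to section 6.1.5 of the OpenCL 1.2 specification, float3 values are expected to be 16-byte aligned:

    For 3-component vector data types, the size of the data type is 4 * sizeof(component). This means that a 3-component vector data type will be aligned to a 4 * sizeof(component) boundary. The vload3 and vstore3 built-in functions can be used to read and write, respectively, 3-component vector data types from an array of packed scalar data type.

    The values you upload to the device are packed as three floats per vector (12 bytes each), so they are not aligned correctly for the kernel to read them directly as float3 values.
  • Your limit calculation int lim = X*Y*3; is slightly off. You are already trying to read from an array of float3, so the *3 is redundant.

The solution to both problems is simple: as the specification says, you should use vload3 to load the vectors from an array of floats: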
    #pragma OPENCL EXTENSION cl_amd_printf: enable
    #define X xdim
    #define Y ydim
    __kernel void k1(__global float *spins,
                     __local float3 *tile)
    {
        ushort lid = 2 * get_local_id(0);
        ushort group = 2 * get_group_id(0);
        ushort num = get_num_groups(0);
        int lim = X*Y;

        for (ushort i = 0; i < lim; i++)
        {
            if (lid == 0 && group == 0)
            {
                float3 vec = vload3(i, spins);
                printf("(%f, %f, %f) :: %d\\n", vec.x, vec.y, vec.z, i);
            }
        }
    }
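
To see why the original kernel printed unexpected values, it can help to emulate the two addressing schemes on the host. The following is only a NumPy sketch, not part of the original answer and not a substitute for running the kernel: `float3_pointer_view`, `vload3_view` and `pad_to_float4` are hypothetical helper names introduced here for illustration. The first mimics indexing a `__global float3 *` (16-byte stride), the second mimics `vload3(i, p)` on a `__global float *` (packed, 12-byte stride), and the third shows one possible host-side alternative, assuming you would rather keep the `float3` kernel argument and pad each row to four floats instead.

```python
import numpy as np

def float3_pointer_view(host_array):
    """Emulate indexing a `__global float3 *`: each element spans 4 floats (16 bytes)."""
    flat = np.ascontiguousarray(host_array, dtype=np.float32).ravel()
    whole = (flat.size // 4) * 4  # only complete 16-byte slots lie inside the buffer
    return flat[:whole].reshape(-1, 4)[:, :3]

def vload3_view(host_array):
    """Emulate vload3(i, p) on a `__global float *`: reads p[3*i], p[3*i+1], p[3*i+2]."""
    flat = np.ascontiguousarray(host_array, dtype=np.float32).ravel()
    return flat.reshape(-1, 3)

def pad_to_float4(host_array):
    """Hypothetical helper: pad (N, 3) rows to (N, 4) so they match the float3 layout."""
    padded = np.zeros((host_array.shape[0], 4), dtype=np.float32)
    padded[:, :3] = host_array
    return padded

# Same test data as in the question: every row is (1, 2, 3), 4 x 4 = 16 rows.
M = np.ones((16, 3), dtype=np.float32)
M[:, 1] += 1.0
M[:, 2] += 2.0

misread = float3_pointer_view(M)
print(misread[:3])    # rows drift: (1, 2, 3), (2, 3, 1), (3, 1, 2)
print(misread.shape)  # (12, 3): only 12 float3 slots fit in 48 packed floats

# vload3-style addressing recovers the rows exactly as supplied.
print(np.array_equal(vload3_view(M), M))

# A padded (N, 4) host array also lines up with float3 addressing.
print(np.array_equal(float3_pointer_view(pad_to_float4(M)), M))
```

In this emulation the packed buffer only holds 12 complete float3 slots, so a loop that runs to X*Y (let alone X*Y*3) over a `float3` pointer would also read past the end of the buffer. If padding is chosen instead of vload3, the host array uploaded with `cl.Buffer` would be the padded one, at the cost of a third more memory.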

Regarding opencl - how to pass an array of vectors in pyOpenCL, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11736811/
