
c++ - OpenACC update directives

Reposted. Author: 搜寻专家. Updated: 2023-10-31 02:13:15

When I put a kernels region around the top-level loop, why can't I use these two directives?

#pragma acc update device(hbias[0:n_hidden],W[0:n_hidden][0:n_visible])
#pragma acc update device(vbias[0:n_visible])

I need to update the variables hbias, vbias and W in the code below, but it doesn't work:

void RBM::contrastive_divergence(int train_X[6][6], double learning_rate, int k) {
    double r = rand() / (RAND_MAX + 1.0);

    int *input = new int[n_visible];
    double *ph_mean = new double[n_hidden];
    int *ph_sample = new int[n_hidden];
    double *nv_means = new double[n_visible];
    int *nv_samples = new int[n_visible];
    double *nh_means = new double[n_hidden];
    int *nh_samples = new int[n_hidden];

    #pragma acc kernels
    for (int i = 0; i < train_N; i++) {

        for (int j = 0; j < n_visible; j++) {
            input[j] = train_X[i][j];
        }

        sample_h_given_v(input, ph_mean, ph_sample, r);

        for (int step = 0; step < k; step++) {
            if (step == 0) {
                gibbs_hvh(ph_sample, nv_means, nv_samples, nh_means, nh_samples, r);
            }
            else {
                gibbs_hvh(nh_samples, nv_means, nv_samples, nh_means, nh_samples, r);
            }
        }

        for (int i = 0; i < n_hidden; i++) {
            for (int j = 0; j < n_visible; j++) {
                W[i][j] += learning_rate * (ph_mean[i] * input[j] - nh_means[i] * nv_samples[j]) / N;
            }
            hbias[i] += learning_rate * (ph_sample[i] - nh_means[i]) / N;
        }
        //this directive
        #pragma acc update device(hbias[0:n_hidden],W[0:n_hidden][0:n_visible])

        for (int i = 0; i < n_visible; i++) {
            vbias[i] += learning_rate * (input[i] - nv_samples[i]) / N;
        }
        //and this directive
        #pragma acc update device(vbias[0:n_visible])
    }

    delete[] input;
    delete[] ph_mean;
    delete[] ph_sample;
    delete[] nv_means;
    delete[] nv_samples;
    delete[] nh_means;
    delete[] nh_samples;
}

But when I have several separate kernels regions, one around each nested loop, I can update the variables:

void RBM::contrastive_divergence(int train_X[6][6], double learning_rate, int k) {
    double r = rand() / (RAND_MAX + 1.0);

    int *input = new int[n_visible];
    double *ph_mean = new double[n_hidden];
    int *ph_sample = new int[n_hidden];
    double *nv_means = new double[n_visible];
    int *nv_samples = new int[n_visible];
    double *nh_means = new double[n_hidden];
    int *nh_samples = new int[n_hidden];

    for (int i = 0; i < train_N; i++) {

        #pragma acc kernels
        for (int j = 0; j < n_visible; j++) {
            input[j] = train_X[i][j];
        }

        sample_h_given_v(input, ph_mean, ph_sample, r);

        #pragma acc kernels
        for (int step = 0; step < k; step++) {
            if (step == 0) {
                gibbs_hvh(ph_sample, nv_means, nv_samples, nh_means, nh_samples, r);
            }
            else {
                gibbs_hvh(nh_samples, nv_means, nv_samples, nh_means, nh_samples, r);
            }
        }

        #pragma acc kernels
        {
            for (int i = 0; i < n_hidden; i++) {
                for (int j = 0; j < n_visible; j++) {
                    W[i][j] += learning_rate * (ph_mean[i] * input[j] - nh_means[i] * nv_samples[j]) / N;
                }
                hbias[i] += learning_rate * (ph_sample[i] - nh_means[i]) / N;
            }
            //this directive
            #pragma acc update device(hbias[0:n_hidden],W[0:n_hidden][0:n_visible])
        }

        #pragma acc kernels
        {
            for (int i = 0; i < n_visible; i++) {
                vbias[i] += learning_rate * (input[i] - nv_samples[i]) / N;
            }

            //and this directive
            #pragma acc update device(vbias[0:n_visible])
        }
    }

    delete[] input;
    delete[] ph_mean;
    delete[] ph_sample;
    delete[] nv_means;
    delete[] nv_samples;
    delete[] nh_means;
    delete[] nh_samples;
}

Best answer

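The "update" directive can only be used in host code, because data movement must be initiated from the host. You cannot put it inside a compute region.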
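A minimal sketch of where the directives can legally go, using a hypothetical free function `host_then_device` instead of the question's member function: the host modifies `bias`, `update device` pushes that change to the GPU before the compute region, and `update self` pulls the result back afterwards. Both updates sit in host code, outside the `parallel loop` region. When compiled without OpenACC support the pragmas are ignored and the function runs serially with the same result.

```cpp
// Hypothetical helper: host edits bias, GPU doubles it, host reads it back.
void host_then_device(double* bias, int n, double delta) {
    // Create a device copy of bias (initialized from the host values).
    #pragma acc enter data copyin(bias[0:n])

    // Host-side change: the device copy is now stale.
    for (int i = 0; i < n; ++i) {
        bias[i] += delta;
    }

    // Host code: push the modified host values to the device.
    #pragma acc update device(bias[0:n])

    // Compute region: no update directives are allowed in here.
    #pragma acc parallel loop present(bias[0:n])
    for (int i = 0; i < n; ++i) {
        bias[i] *= 2.0;
    }

    // Host code again: copy the device results back to the host.
    #pragma acc update self(bias[0:n])

    // Release the device copy.
    #pragma acc exit data delete(bias[0:n])
}
```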

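There are a number of problems with this code. First, reusing the same index variable ("i" in this case) for nested loops is arguably poor practice. Although the scoping rules allow it, it makes it hard to tell which "i" the code is meant to use.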
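To make the hazard concrete, here is a hypothetical pair of functions (`add_samples_shadowed` and `add_samples_clear` are illustrative names, not from the question). The intent in both is to add every sample's value to every bias entry; in the first, the inner loop reuses `i`, so `x[i]` silently reads the inner index. GCC and Clang can warn about this pattern with `-Wshadow`.

```cpp
// Buggy: the inner loop's i shadows the outer i, so x[i] reads the INNER index.
// (Assumes x has at least n_bias elements, otherwise x[i] reads out of bounds.)
void add_samples_shadowed(const int x[], int n_samples, int bias[], int n_bias) {
    for (int i = 0; i < n_samples; ++i) {
        for (int i = 0; i < n_bias; ++i) {
            bias[i] += x[i];
        }
    }
}

// Clear: distinct names make it obvious which index belongs to which loop.
void add_samples_clear(const int x[], int n_samples, int bias[], int n_bias) {
    for (int s = 0; s < n_samples; ++s) {      // s: sample index
        for (int h = 0; h < n_bias; ++h) {     // h: bias index
            bias[h] += x[s];
        }
    }
}
```

With `x = {1, 2}` and two zeroed bias entries, the shadowed version produces `{2, 4}` while the clear version produces the intended `{3, 3}`.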

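The outer "i" loop is most likely not parallelizable, so you shouldn't put the "kernels" directive around it. If you privatized the "input" array and then used atomics when updating the vbias, hbias and W arrays, it might work, but your performance would be poor. (You would also need to determine whether each of the other arrays needs to be privatized or is shared, in which case it needs atomic operations.)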
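The privatize-plus-atomics approach might look like the sketch below. Assumptions: the sizes are fixed compile-time constants `N_SAMPLES` and `N_VISIBLE` (chosen to match `train_X[6][6]`), only the `vbias` update is shown, and the `nv_samples` term is dropped to keep it short; `vbias_outer_parallel` is a hypothetical name. Declaring `input` inside the loop body makes it private to each iteration, while `vbias` is shared by all samples, so every update to it needs `#pragma acc atomic update`. Those contended atomics are the reason to expect poor performance.

```cpp
constexpr int N_SAMPLES = 6;  // assumed: matches train_X[6][6] in the question
constexpr int N_VISIBLE = 6;

// Hypothetical sketch: parallelize the OUTER sample loop.
void vbias_outer_parallel(const int train_X[N_SAMPLES][N_VISIBLE],
                          double vbias[N_VISIBLE], double learning_rate) {
    #pragma acc parallel loop copyin(train_X[0:N_SAMPLES][0:N_VISIBLE]) copy(vbias[0:N_VISIBLE])
    for (int s = 0; s < N_SAMPLES; ++s) {
        // Declared inside the loop body, so each iteration gets its own copy.
        int input[N_VISIBLE];
        for (int v = 0; v < N_VISIBLE; ++v) {
            input[v] = train_X[s][v];
        }
        for (int v = 0; v < N_VISIBLE; ++v) {
            // Different samples update the same vbias[v]: needs an atomic.
            #pragma acc atomic update
            vbias[v] += learning_rate * input[v] / N_SAMPLES;
        }
    }
}
```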

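My suggestion is to start by putting "#pragma acc parallel loop" around the inner loops, one at a time, and to make sure each one works before moving on to the next. Also, I highly doubt that the "step" loop is parallelizable, so you will most likely need to parallelize the loops inside the "gibbs_hvh" subroutine instead.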
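A sketch of the one-loop-at-a-time approach for the weight and bias updates, assuming `W` is flattened into a row-major array of `n_hidden * n_visible` doubles and that the question's class members are passed in as plain arguments (`update_weights` and `update_vbias` are hypothetical names). Each `h` iteration writes only row `h` of `W` and `hbias[h]`, and each `v` iteration of the second function writes only `vbias[v]`, so neither loop has a race. The `step` loop is deliberately left out: each Gibbs step consumes the `nh_samples` produced by the previous one, which is a loop-carried dependence.

```cpp
// Hypothetical sketch: W/hbias update as its own parallel loop.
void update_weights(double* W, double* hbias, const double* ph_mean,
                    const int* ph_sample, const double* nh_means,
                    const int* input, const int* nv_samples,
                    int n_hidden, int n_visible, double lr, int N) {
    // Rows are independent: iteration h touches only W row h and hbias[h].
    #pragma acc parallel loop copy(W[0:n_hidden*n_visible], hbias[0:n_hidden]) \
        copyin(ph_mean[0:n_hidden], ph_sample[0:n_hidden], nh_means[0:n_hidden], \
               input[0:n_visible], nv_samples[0:n_visible])
    for (int h = 0; h < n_hidden; ++h) {
        #pragma acc loop vector
        for (int v = 0; v < n_visible; ++v) {
            W[h * n_visible + v] +=
                lr * (ph_mean[h] * input[v] - nh_means[h] * nv_samples[v]) / N;
        }
        hbias[h] += lr * (ph_sample[h] - nh_means[h]) / N;
    }
}

// Hypothetical sketch: vbias update as a separate parallel loop.
void update_vbias(double* vbias, const int* input, const int* nv_samples,
                  int n_visible, double lr, int N) {
    #pragma acc parallel loop copy(vbias[0:n_visible]) \
        copyin(input[0:n_visible], nv_samples[0:n_visible])
    for (int v = 0; v < n_visible; ++v) {
        vbias[v] += lr * (input[v] - nv_samples[v]) / N;
    }
}
```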

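Since you are using CUDA Unified Memory (-ta=tesla:managed), you may not need to add data regions. However, if you plan to stop using managed memory in the future, the next step would be to add data directives around the outer "i" loop (or higher up in the program, and then use update directives to synchronize the data after the outer "i" loop).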
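For the non-managed case, a sketch of the "higher up in the program" variant: `enter data` makes the arrays resident once, each compute region inside the outer loop uses `present`, and a single `update self` after the outer loop brings the results back to the host. The function `train_vbias` and its simplified update (adding each sample's values to `vbias`) are hypothetical stand-ins for the RBM code.

```cpp
// Hypothetical sketch: data stays on the device for the whole outer loop.
void train_vbias(double* vbias, const int* samples, int n_samples,
                 int n_visible, double lr) {
    // Copy everything to the device once, before the outer loop.
    #pragma acc enter data copyin(vbias[0:n_visible], samples[0:n_samples*n_visible])

    for (int s = 0; s < n_samples; ++s) {  // outer loop runs on the host
        // Arrays are already on the device, so no transfer per iteration.
        #pragma acc parallel loop present(vbias[0:n_visible], samples[0:n_samples*n_visible])
        for (int v = 0; v < n_visible; ++v) {
            vbias[v] += lr * samples[s * n_visible + v];
        }
    }

    // Host code after the outer loop: synchronize results to the host.
    #pragma acc update self(vbias[0:n_visible])
    #pragma acc exit data delete(vbias[0:n_visible], samples[0:n_samples*n_visible])
}
```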

A similar question, "c++ - OpenACC update directives", can be found on Stack Overflow: https://stackoverflow.com/questions/41665671/
