cuda - Simpson's method for integrating a real-valued function with CUDA

Reposted. Author: 行者123. Updated: 2023-12-01 00:21:52

I am trying to implement numerical integration with Simpson's method in CUDA.

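This is the formula for Simpson's rule:

$$\int_a^b f(x)\,dx \approx \frac{h}{3} \sum_{k=1,3,5,\dots}^{N-1} \left[ f(x_{k-1}) + 4 f(x_k) + f(x_{k+1}) \right]$$

where x_k = a + k*h and h = (b - a)/N.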

Here is my code:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Split the index range [0, n) evenly across all launched threads.
    __device__ void initThreadBounds(int *n_start, int *n_end, int n,
                                     int totalBlocks, int blockWidth)
    {
        int threadId = blockWidth * blockIdx.x + threadIdx.x;
        int nextThreadId = threadId + 1;

        int threads = blockWidth * totalBlocks;

        *n_start = (threadId * n) / threads;
        *n_end = (nextThreadId * n) / threads;
    }

    // Integrand: f(x) = x.
    __device__ float reg_func(float x)
    {
        return x;
    }

    typedef float (*p_func)(float);

    __device__ p_func integrale_f = reg_func;

    // Each thread sums the Simpson panels for its own slice of indices.
    __device__ void integralSimpsonMethod(int totalBlocks, int totalThreads,
                                          double a, double b, int n,
                                          float p_function(float), float *result)
    {
        *result = 0;

        float h = (b - a) / n;
        //*result = p_function(a)+p_function(a + h * n);
        //parallel
        int idx_start;
        int idx_end;
        initThreadBounds(&idx_start, &idx_end, n - 1, totalBlocks, totalThreads);
        //parallel_ends
        for (int i = idx_start; i < idx_end; i += 2) {
            // One panel spanning [x_{i-1}, x_{i+1}], i.e. width 2h.
            *result += (p_function(a + h * (i - 1)) +
                        4 * p_function(a + h * (i)) +
                        p_function(a + h * (i + 1))) * h / 3;
        }
    }

    __global__ void integralSimpson(int totalBlocks, int totalThreads, float *result)
    {
        float res = 0;

        integralSimpsonMethod(totalBlocks, totalThreads, 0, 10, 1000, integrale_f, &res);
        result[(blockIdx.x * totalThreads + threadIdx.x)] = res;

        //printf ("Simpson method\n");
    }

    __host__ void inttest()
    {
        const int blocksNum = 32;
        const int threadNum = 32;

        float *device_resultf;
        float host_resultf[threadNum * blocksNum] = {0};

        cudaMalloc((void **)&device_resultf, sizeof(float) * threadNum * blocksNum);
        integralSimpson<<<blocksNum, threadNum>>>(blocksNum, threadNum, device_resultf);
        // cudaThreadSynchronize() is deprecated; cudaDeviceSynchronize() replaces it.
        cudaDeviceSynchronize();

        cudaMemcpy(host_resultf, device_resultf, sizeof(float) * threadNum * blocksNum,
                   cudaMemcpyDeviceToHost);

        // Add up the per-thread partial results on the host.
        float sum = 0;
        for (int i = 0; i != blocksNum * threadNum; ++i) {
            sum += host_resultf[i];
            // printf ("result in %i cell = %f \n", i, host_resultf[i]);
        }
        printf("sum = %f \n", sum);
        cudaFree(device_resultf);
    }

    int main(int argc, char *argv[])
    {
        inttest();

        // Keep the console window open until the user enters a number.
        int i;
        scanf("%d", &i);
    }

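The problem: when n is less than 100000, the result is wrong. For the integral from 0 to 10, the result is ~99, but when n = 100000 or larger it works correctly and the result is ~50.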

What's wrong, guys?

Best answer

The basic problem here is that you don't understand your own algorithm.

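Your integralSimpsonMethod() function is designed so that each thread samples at least 3 quadrature points per sub-interval of the integration domain. Therefore, if you choose n so that it is less than four times the number of threads in the kernel launch, the sub-intervals will inevitably overlap and the resulting integral will be incorrect. You need to make sure the code checks and scales either the thread count or n so that the sub-intervals do not overlap when the integral is computed.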
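
To make the overlap concrete, here is a minimal host-side C++ sketch (not CUDA, so it runs without a GPU) that replays the index arithmetic of initThreadBounds() and the loop in integralSimpsonMethod() from the question and simply counts how many panels all threads evaluate together. The helper name countPanels is hypothetical and introduced only for illustration. Each loop iteration integrates one panel [x_{i-1}, x_{i+1}] of width 2h, so a non-overlapping cover of [a, b] needs exactly n/2 panels.

```cpp
#include <cassert>

// Hypothetical helper: counts the Simpson panels evaluated by all threads
// when the question's partitioning is used. Host-only illustration.
long countPanels(int n, int threads) {
    assert(n > 1 && threads > 0);
    long panels = 0;
    for (int tid = 0; tid < threads; ++tid) {
        // Same bounds as initThreadBounds(&idx_start, &idx_end, n - 1, ...).
        // long long is used here only to rule out overflow in the product.
        int start = static_cast<int>(static_cast<long long>(tid) * (n - 1) / threads);
        int end = static_cast<int>(static_cast<long long>(tid + 1) * (n - 1) / threads);
        // Same stride as the loop in integralSimpsonMethod():
        // one panel of width 2h per iteration.
        for (int i = start; i < end; i += 2) {
            ++panels;
        }
    }
    return panels;
}
```

With the question's launch of 32 x 32 = 1024 threads, countPanels(1000, 1024) gives 999 panels where 500 are needed, so nearly every part of the domain is integrated twice, which is consistent with the reported ~99 instead of ~50. countPanels(100000, 1024) gives 50176 against the 50000 needed, so the excess is small and the result looks right.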

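If you are doing this for anything other than self-edification, then I recommend you look up the composite version of Simpson's rule. It is much better suited to a parallel implementation and will perform considerably better if implemented correctly.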
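
As a rough illustration of why the composite rule parallelizes well, the following host-side C++ sketch assigns each sample point x_k a fixed weight (1, 4, 2, 4, ..., 2, 4, 1) and lets every worker sum a strided subset of the points, the way a CUDA thread would in a grid-stride loop. The names simpsonWeight, compositeSimpson and workers are assumptions made for this sketch; a real kernel would keep one partial sum per thread and combine them with a reduction.

```cpp
#include <cassert>
#include <vector>

// Composite Simpson weight of sample point k in 0..n (n must be even).
inline double simpsonWeight(int k, int n) {
    if (k == 0 || k == n) return 1.0;
    return (k % 2 == 1) ? 4.0 : 2.0;
}

// Hypothetical sketch: each "worker" plays the role of one GPU thread and
// handles points k = w, w + workers, w + 2*workers, ... No point is visited
// twice, whatever the ratio between n and the number of workers.
template <typename F>
double compositeSimpson(F f, double a, double b, int n, int workers) {
    assert(n > 0 && n % 2 == 0);
    assert(workers > 0);
    const double h = (b - a) / n;
    std::vector<double> partial(workers, 0.0);
    for (int w = 0; w < workers; ++w) {
        for (int k = w; k <= n; k += workers) {
            partial[w] += simpsonWeight(k, n) * f(a + k * h);
        }
    }
    // Final reduction of the per-worker partial sums.
    double sum = 0.0;
    for (double p : partial) sum += p;
    return sum * h / 3.0;
}
```

Because every point is owned by exactly one worker, n = 1000 with 1024 workers is handled correctly: the extra workers just contribute zero.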

Regarding "cuda - Simpson's method for integrating a real-valued function with CUDA", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/16134188/
