c - Why does my OpenMP code perform worse than the serial version?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:05:00

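I am doing a simple Pi calculation in which I parallelized the loop that generates random numbers and increments the count. The serial (non-OpenMP) code performs better than the OpenMP code. Here are some measurements I took. Both programs are provided below.

Serial code compiled with: gcc pi.c -O3

OpenMP code compiled with: gcc pi_omp.c -O3 -fopenmp

What could be the problem?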

# Iterations = 60000000

Serial Time = 0.893912

OpenMP 1 Threads Time = 0.876654
OpenMP 2 Threads Time = 23.8537
OpenMP 4 Threads Time = 7.72415

Serial code:

/* Program to compute Pi using Monte Carlo methods */
/* from: http://www.dartmouth.edu/~rc/classes/soft_dev/C_simple_ex.html */

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#include <time.h>
#include <sys/time.h>
#define SEED 35791246

int main(int argc, char *argv[])
{
int niter=0;
double x,y;
int i;
long count=0; /* # of points in the 1st quadrant of unit circle */
double z;
double pi;

printf("Enter the number of iterations used to estimate pi: ");
scanf("%d",&niter);

/* initialize random numbers */
srand(SEED);
count=0;
struct timeval start, end;
gettimeofday(&start, NULL);
for ( i=0; i<niter; i++) {
x = (double)rand()/RAND_MAX;
y = (double)rand()/RAND_MAX;
z = x*x+y*y;
if (z<=1) count++;
}
pi=(double)count/niter*4;

gettimeofday(&end, NULL);
double t2 = end.tv_sec + (end.tv_usec/1000000.0);
double t1 = start.tv_sec + (start.tv_usec/1000000.0);

printf("Time: %lg\n", t2 - t1);

printf("# of trials= %d , estimate of pi is %lg \n",niter,pi);
return 0;
}

OpenMP parallel code:

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <time.h>
#include <sys/time.h>
#define SEED 35791246
/*
from: http://www.dartmouth.edu/~rc/classes/soft_dev/C_simple_ex.html
*/
#define CHUNKSIZE 500
int main(int argc, char *argv[]) {

int chunk = CHUNKSIZE;
int niter=0;
double x,y;
int i;
long count=0; /* # of points in the 1st quadrant of unit circle */
double z;
double pi;

int nthreads, tid;

printf("Enter the number of iterations used to estimate pi: ");
scanf("%d",&niter);

/* initialize random numbers */
srand(SEED);
struct timeval start, end;

gettimeofday(&start, NULL);
#pragma omp parallel shared(chunk) private(tid,i,x,y,z) reduction(+:count)
{
/* Obtain and print thread id */
tid = omp_get_thread_num();
//printf("Hello World from thread = %d\n", tid);

/* Only master thread does this */
if (tid == 0)
{
nthreads = omp_get_num_threads();
printf("Number of threads = %d\n", nthreads);
}

#pragma omp for schedule(dynamic,chunk)
for ( i=0; i<niter; i++) {
x = (double)rand()/RAND_MAX;
y = (double)rand()/RAND_MAX;
z = x*x+y*y;
if (z<=1) count++;
}
}

gettimeofday(&end, NULL);
double t2 = end.tv_sec + (end.tv_usec/1000000.0);
double t1 = start.tv_sec + (start.tv_usec/1000000.0);

printf("Time: %lg\n", t2 - t1);

pi=(double)count/niter*4;
printf("# of trials= %d, threads used: %d, estimate of pi is %lg \n",niter,nthreads, pi);
return 0;
}

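Best answer

rand() is not reentrant. When called from several threads it will either not work properly, crash, or only be callable from one thread at a time. Libraries like glibc often serialize legacy non-reentrant functions or use TLS for them, rather than let them crash randomly in multithreaded code. If calls are serialized, the threads spend their time contending for the generator's shared state instead of running in parallel, which would explain why 2 and 4 threads are so much slower than 1.

Try the reentrant form, rand_r():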

tid = omp_get_thread_num();
unsigned int seed = tid;
...
x = (double)rand_r(&seed)/RAND_MAX;

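I think you will find it is much faster.

Note how I set the seed to tid. You may wonder why the seed is not initialized to SEED. Given the same seed, rand_r() produces the same sequence of numbers. If every thread used the same series of pseudo-random numbers, it would defeat the purpose of doing more iterations! Each thread has to use different numbers.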
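For reference, here is a minimal sketch of how the suggestion above might fit into the parallel loop. The function name estimate_pi and the base_seed parameter are hypothetical, introduced only for illustration; the post itself seeds each thread with tid alone. It also assumes a POSIX system where rand_r() is available, and swaps the original schedule(dynamic,chunk) for schedule(static), which is a choice made here rather than something the answer prescribes.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes rand_r() under strict -std= modes */
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper (not in the original post): Monte Carlo estimate of pi
 * in which every thread owns a private rand_r() state. */
double estimate_pi(long niter, unsigned int base_seed)
{
    long count = 0;  /* points that fall inside the unit quarter circle */

    #pragma omp parallel reduction(+:count)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        /* Distinct seed per thread, so threads do not replay one sequence.
         * Offsetting by tid is an assumption made for this sketch. */
        unsigned int seed = base_seed + (unsigned int)tid;

        #pragma omp for schedule(static)
        for (long i = 0; i < niter; i++) {
            double x = (double)rand_r(&seed) / RAND_MAX;
            double y = (double)rand_r(&seed) / RAND_MAX;
            if (x * x + y * y <= 1.0)
                count++;
        }
    }
    return 4.0 * (double)count / (double)niter;
}
```

Built with gcc -O3 -fopenmp, a call such as estimate_pi(60000000, 35791246u) from main() would stand in for the timed loop in the question. Because each thread touches only its own seed variable, no locking on shared generator state is involved. Note that rand_r() is a fairly weak generator and neighbouring seeds may produce correlated streams, so this is a sketch of the threading fix rather than a recommendation for serious Monte Carlo work.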

Regarding "c - Why does my OpenMP code perform worse than the serial version?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43442425/
