1
下面的代码执行与1个螺纹比具有更好的慢2(使用4个线程给出加快,虽然):OpenMP中使用rand_r“为”是具有2个线程
#include <stdlib.h>
#include <stdio.h>
#include <omp.h>
int main(int argc, char **argv) {
int n = atoi(argv[1]);
int num_threads = atoi(argv[2]);
omp_set_num_threads(num_threads);
unsigned int *seeds = malloc(num_threads * sizeof(unsigned int));
for (int i = 0; i < num_threads; ++i) {
seeds[i] = 42 + i;
}
unsigned long long sum = 0;
double begin_time = omp_get_wtime();
#pragma omp parallel
{
unsigned int *seedp = &seeds[omp_get_thread_num()];
#pragma omp for reduction(+ : sum)
for (int i = 0; i < n; ++i) {
sum += rand_r(seedp);
}
}
double end_time = omp_get_wtime();
printf("%fs\n", end_time - begin_time);
free(seeds);
return EXIT_SUCCESS;
}
在我的笔记本电脑(2芯,支持HT)我得到以下结果:
$ gcc -fopenmp test.c && ./a.out 100000000 1
0.821497s
$ gcc -fopenmp test.c && ./a.out 100000000 2
1.096394s
$ gcc -fopenmp test.c && ./a.out 100000000 3
0.933494s
$ gcc -fopenmp test.c && ./a.out 100000000 4
0.748038s
的问题仍然存在,而不减少,drand48_r
带来了没有什么区别,动态调度使事情变得更糟。但是,如果我用没有连接随机的东西替换循环体,即一切正常,按预期工作。
啊!我省略了rand_r实际上修改了给定的参数。好答案 :) – Harald