OpenMP basic parallelization

I have written some parallel C code using OpenMP for a parallel programming course.

Here is a snippet:

#include <stdio.h>
#include <stdlib.h>  /* atoi */
#include <time.h>
#include <math.h>
#include <omp.h>     /* omp_set_num_threads */

#define FALSE 0 
#define TRUE 1 

int is_prime(int);
void time_it(int (*f)(int), int, char *);
int count_primes_0(int);
int count_primes_1(int);
int count_primes_2(int);

int main(int argc, char *argv[]){ 
    int n; 

    if (argc != 2){ 
     printf("Incorrect Invocation, use: \nq1 N"); 
     return 0; 
    } else { 
     n = atoi(argv[1]); 
    } 

    if (n < 0){ 
     printf("N cannot be negative"); 
     return 0; 
    } 

    printf("N = %d\n", n); 

    //omp_set_num_threads(1); 
    time_it(count_primes_0, n, "Method 0"); 
    time_it(count_primes_1, n, "Method 1"); 
    time_it(count_primes_2, n, "Method 2"); 

    return 0; 
} 

int is_prime(int n){ 
    for(int i = 2; i <= (int)(sqrt((double) n)); i++){ 
     if ((n % i) == 0){ 
      return FALSE; 
     } 
    } 

    return n > 1; 
} 

void time_it(int (*f)(int), int n, char *string){ 
    clock_t start_clock; 
    clock_t end_clock; 
    double calc_time; 
    int nprimes; 

    start_clock = clock();
    nprimes = (*f)(n); 
    end_clock = clock(); 
    calc_time = ((double)end_clock - (double)start_clock)/CLOCKS_PER_SEC; 
    printf("\tNumber of primes: %d \t Time taken: %fs\n\n", nprimes, calc_time); 
} 

// METHOD 0 
// Base Case no parallelization 
int count_primes_0(int n){ 
    int nprimes = 0; 

    for(int i = 1; i <= n; i++){ 
     if (is_prime(i)) { 
      nprimes++; 
     } 
    } 

    return nprimes; 
} 

//METHOD 1 
// Use only For and Critical Constructs 
int count_primes_1(int n){ 
    int nprimes = 0; 

    #pragma omp parallel for 
    for(int i = 1; i <= n; i++){ 
     if (is_prime(i)) { 
      #pragma omp critical 
      nprimes++; 
     } 
    } 

    return nprimes; 
} 

//METHOD 2 
// Use Reduction 
int count_primes_2(int n){ 
    int nprimes = 0; 

    #pragma omp parallel for reduction(+:nprimes) 
    for(int i = 1; i <= n; i++){ 
     if (is_prime(i)) { 
      nprimes++; 
     } 
    } 

    return nprimes; 
}

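The problem I am facing is that when I use omp_set_num_threads(), the fewer threads I use, the faster my functions run. In other words, the fewer the threads, the closer the run time gets to the unparallelized base case.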

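Time results (these were run on an 8-core machine):

8 threads: Method 0: 0.07s; Method 1: 1.63s; Method 2: 1.4s

4 threads: Method 0: 0.07s; Method 1: 0.16s; Method 2: 0.16s

2 threads: Method 0: 0.07s; Method 1: 0.10s; Method 2: 0.09s

1 thread: Method 0: 0.07s; Method 1: 0.08s; Method 2: 0.07s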

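I have tried disabling optimization and using a different gcc version, with no difference.

Any help is appreciated.

EDIT: Using clock() on Linux returns the 'incorrect' time. Wall-clock time is what I need, so using either omp_get_wtime() or the Linux function timeit will produce the correct results.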

2011-02-12 Ciaran Liedeman

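Can you post timing results for different NUM_THREADS? – CharlesB 2011-02-12 17:24:08

Are you running on a multi-core machine? This code will be CPU-bound (as opposed to memory-bound or IO-bound), so multithreading will only improve it if it can throw more cores at the problem. – 2011-02-12 17:29:46

You are not running your experiment for very long, so it is possible that your OMP times are actually dominated by spawning and killing threads. Try running the whole thing 1000 times, and time the whole thing. – 2011-02-12 17:40:06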
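The suggestion in the last comment could be sketched roughly as follows. The helper names `wall_seconds` and `mean_seconds` are hypothetical, the C11 `timespec_get()` fallback for builds without OpenMP is an assumption of this sketch, and `d <= n / d` stands in for the `sqrt()` bound so that no math library is needed.

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds: omp_get_wtime() when built with OpenMP,
   otherwise C11 timespec_get() as a stand-in (assumption). */
double wall_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
#endif
}

/* Trial division, as in the question. */
int is_prime(int n) {
    for (int d = 2; d <= n / d; d++) {
        if (n % d == 0) return 0;
    }
    return n > 1;
}

/* Method 1 from the question: parallel for plus a critical section. */
int count_primes_1(int n) {
    int nprimes = 0;
    #pragma omp parallel for
    for (int i = 1; i <= n; i++) {
        if (is_prime(i)) {
            #pragma omp critical
            nprimes++;
        }
    }
    return nprimes;
}

/* Hypothetical helper: call f(n) reps times, time the whole batch,
   and return the mean wall-clock seconds per call. */
double mean_seconds(int (*f)(int), int n, int reps, int *last_count) {
    assert(f != NULL && reps > 0);
    int count = 0;
    double t0 = wall_seconds();
    for (int r = 0; r < reps; r++) {
        count = f(n);
    }
    double t1 = wall_seconds();
    if (last_count != NULL) *last_count = count;
    return (t1 - t0) / reps;
}
```

Calling `mean_seconds(count_primes_1, n, 1000, NULL)` times the whole batch, so any fixed per-call cost such as thread startup is averaged over 1000 calls instead of dominating a single short measurement.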

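I am surprised that you have seen any success with the program above. If you look at the RedHat Linux man page for clock(), you will see that it "returns an approximation of processor time used by the program". Putting in OpenMP directives causes more overhead, so you should see more overall processor time used when you run with OpenMP. What you need to look at is elapsed time (or wall-clock time). When you run in parallel (and you have a program that can benefit from parallelism), you will see the elapsed time go down. The OpenMP specification defines a routine (omp_get_wtime()) to provide this information.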
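To make the difference between the two clocks concrete, here is a minimal sketch of a timing helper that records both. `time_both` and `wall_seconds` are hypothetical names (the question's own helper is `time_it`), the C11 `timespec_get()` fallback for builds without OpenMP is an assumption, and `d <= n / d` replaces the `sqrt()` bound so no math library is needed.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds: omp_get_wtime() when built with OpenMP,
   otherwise C11 timespec_get() as a stand-in (assumption). */
double wall_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
#endif
}

/* Trial division, as in the question. */
int is_prime(int n) {
    for (int d = 2; d <= n / d; d++) {
        if (n % d == 0) return 0;
    }
    return n > 1;
}

/* Method 2 from the question: parallel for with a reduction. */
int count_primes_2(int n) {
    int nprimes = 0;
    #pragma omp parallel for reduction(+:nprimes)
    for (int i = 1; i <= n; i++) {
        if (is_prime(i)) nprimes++;
    }
    return nprimes;
}

/* Hypothetical variant of time_it(): measures processor time with
   clock() and elapsed time with wall_seconds() around one call. */
int time_both(int (*f)(int), int n, const char *label,
              double *cpu_s, double *wall_s) {
    assert(f != NULL && cpu_s != NULL && wall_s != NULL);
    clock_t c0 = clock();
    double w0 = wall_seconds();
    int nprimes = f(n);
    double w1 = wall_seconds();
    clock_t c1 = clock();
    *cpu_s = (double)(c1 - c0) / CLOCKS_PER_SEC;  /* summed over all threads */
    *wall_s = w1 - w0;                            /* elapsed time */
    printf("%s: primes = %d  clock(): %.2f  wtime(): %.2f\n",
           label, nprimes, *cpu_s, *wall_s);
    return nprimes;
}
```

Built with OpenMP enabled (for example `gcc -fopenmp`), the clock() figure adds up the processor time of every thread, while the wtime() figure is the elapsed time that is expected to shrink as threads are added.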

Changing your program to report both clock() and omp_get_wtime():

$ a.out 1000000 (1,000,000)

2 processors:

clock(): 0.23 wtime(): 0.23; clock(): 0.96 wtime(): 0.16; clock(): 0.59 wtime(): 0.09

4 processors:

clock(): 0.24 wtime(): 0.24; clock(): 0.97 wtime(): 0.16; clock(): 0.57 wtime(): 0.09

8 processors:

clock(): 0.24 wtime(): 0.24; clock(): 2.60 wtime(): 0.26; clock(): 0.64 wtime(): 0.09

$ a.out 10000000 (10,000,000)

2 processors:

clock(): 6.07 wtime(): 6.07; clock(): 10.4 wtime(): 1.78; clock(): 11.3 wtime(): 1.65

4 processors:

clock(): 6.07 wtime(): 6.07; clock(): 11.5 wtime(): 1.71; clock(): 10.7 wtime(): 1.72

8 processors:

clock(): 6.07 wtime(): 6.07; clock(): 9.92 wtime(): 1.83; clock(): 11.9 wtime(): 1.86

2011-02-14 15:26:10 ejd

OpenMP does not parallelize loops that contain function calls unless the arguments are private. A solution would be to inline is_prime() in your loop.
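For comparison, a sketch of what inlining the primality test into the loop body might look like, next to the function-call version from the question. `count_primes_inlined` is a hypothetical name, and `d <= i / d` replaces the `sqrt()` bound so no math library is needed. Variables declared inside the loop body are private to each thread, so no extra clause is required.

```c
#include <assert.h>

/* Trial division, as in the question. */
int is_prime(int n) {
    for (int d = 2; d <= n / d; d++) {
        if (n % d == 0) return 0;
    }
    return n > 1;
}

/* Method 2 from the question: calls is_prime() inside the loop. */
int count_primes_2(int n) {
    int nprimes = 0;
    #pragma omp parallel for reduction(+:nprimes)
    for (int i = 1; i <= n; i++) {
        if (is_prime(i)) nprimes++;
    }
    return nprimes;
}

/* Hypothetical variant: the trial division is written directly
   in the loop body instead of calling is_prime(). */
int count_primes_inlined(int n) {
    assert(n >= 0);  /* the question's main() rejects negative N */
    int nprimes = 0;
    #pragma omp parallel for reduction(+:nprimes)
    for (int i = 1; i <= n; i++) {
        int prime = (i > 1);  /* declared in the loop body, so private */
        for (int d = 2; prime && d <= i / d; d++) {
            if (i % d == 0) prime = 0;
        }
        nprimes += prime;
    }
    return nprimes;
}
```

Running both variants on the same N and comparing their counts and wall-clock times is a simple way to check whether inlining makes any difference on a given compiler.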

2011-02-12 17:30:14 CharlesB
