2016-11-08

I am trying to parallelize the loop below with an OpenMP reduction over an Eigen::VectorXd:

#define EIGEN_DONT_PARALLELIZE 
#include <iostream> 
#include <cmath> 
#include <string> 
#include <eigen3/Eigen/Dense> 
#include <eigen3/Eigen/Eigenvalues> 
#include <omp.h> 

using namespace Eigen; 
using namespace std; 

VectorXd integrand(double E) 
{ 
    VectorXd answer(500000); 
    double f = 5.*E + 32.*E*E*E*E; 
    for (int j = 0; j != 50; j++) 
        answer[j] = j*f; 
    return answer; 
} 

int main() 
{ 
    omp_set_num_threads(4); 
    double start = 0.; 
    double end = 1.; 
    int n = 100; 
    double h = (end - start)/(2.*n); 

    VectorXd result(500000); 
    result.fill(0.); 
    double E = start; 
    result = integrand(E); 
    #pragma omp parallel 
    { 
    #pragma omp for nowait 
    for (int j = 1; j <= n; j++){ 
     E = start + (2*j - 1.)*h; 
     result = result + 4.*integrand(E); 
     if (j != n){ 
      E = start + 2*j*h; 
      result = result + 2.*integrand(E); 
     } 
    } 
    } 
    for (int i = 0; i < 50; ++i) 
        cout << i+1 << " , " << result[i] << endl; 

    return 0; 
} 

This is definitely no faster than the serial version, and with all 4 threads the results vary wildly from run to run. When the thread count is set to 1, the output is correct. I would appreciate it if someone could help me with this...

I compile with the clang compiler using the flags:

clang++-3.8 energy_integration.cpp -fopenmp=libiomp5 

If this is a bust, then I'll have to learn to use Boost::thread, or std::thread...


Add `firstprivate(params) reduction(+:result_int)` to your `parallel` directive, remove the `critical`, and try again... – Gilles


@Gilles Thanks for the reply. I have edited my code so that the first `#pragma` statement reads `#pragma omp parallel firstprivate(params) reduction(+:result_int)`, the second `#pragma` statement is unchanged, and all subsequent `#pragma` statements are removed. The program then produces a runtime error: `.... const Eigen::Matrix >: Assertion aLhs.rows()==aRhs.rows() && aLhs.cols()==aRhs.cols()' failed. Aborted` – I can guarantee that kspace and result_int both have the same number of elements and dimensions – AlexD


Could you turn your example into a complete [mcve]? Also, does the serial version work as expected? –

Answer


Your code does not define a custom reduction for OpenMP to reduce Eigen objects. I'm not sure whether clang supports user-defined reductions (see the OpenMP 4 spec, page 180). If it does, you can declare the reduction and add `reduction(+:result)` to the `#pragma omp for` line. If not, you can do it yourself by changing your code as follows:

VectorXd result(500000); // This is the final result, not used by the threads 
result.fill(0.); 
double E = start; 
result = integrand(E); 
#pragma omp parallel 
{ 
    // This is a private copy per thread. This resolves race conditions between threads 
    VectorXd resultPrivate(500000); 
    resultPrivate.fill(0.); 
#pragma omp for nowait // reduction(+:result) // Assuming user-defined reductions aren't allowed 
    for (int j = 1; j <= n; j++) { 
     E = start + (2 * j - 1.)*h; 
     resultPrivate = resultPrivate + 4.*integrand(E); 
     if (j != n) { 
      E = start + 2 * j*h; 
      resultPrivate = resultPrivate + 2.*integrand(E); 
     } 
    } 
#pragma omp critical 
    { 
     // Here we sum the results of each thread one at a time 
     result += resultPrivate; 
    } 
} 

The error you got (in your comment) seems to be due to a size mismatch. While it isn't important in the code itself, don't forget that when OpenMP starts each thread, it has to initialize each thread's private VectorXd. If none is supplied, the default is VectorXd() (of size zero). When this object is then used, the size mismatch occurs. A "correct" use of `omp declare reduction` would include the initializer part:

#pragma omp declare reduction (+: VectorXd: omp_out=omp_out+omp_in)\ 
    initializer(omp_priv=VectorXd::Zero(omp_orig.size())) 

omp_priv is the name of the private variable. It is initialized by VectorXd::Zero(...). The size is specified using omp_orig. The standard (page 182, lines 25-27) defines this as:

The special identifier omp_orig can also appear in the initializer-clause and it will refer to the storage of the original variable to be reduced.

In our case (see the full example below), this is result. So result.size() is 500000, and the private variables are initialized to the correct size.

#include <iostream> 
#include <string> 
#include <Eigen/Core> 
#include <omp.h> 

using namespace Eigen; 
using namespace std; 

VectorXd integrand(double E) 
{ 
    VectorXd answer(500000); 
    double f = 5.*E + 32.*E*E*E*E; 
    for (int j = 0; j != 50; j++) answer[j] = j*f; 
    return answer; 
} 

#pragma omp declare reduction (+: Eigen::VectorXd: omp_out=omp_out+omp_in)\ 
    initializer(omp_priv=VectorXd::Zero(omp_orig.size())) 

int main() 
{ 
    omp_set_num_threads(4); 
    double start = 0.; 
    double end = 1.; 
    int n = 100; 
    double h = (end - start)/(2.*n); 

    VectorXd result(500000); 
    result.fill(0.); 
    double E = start; 
    result = integrand(E); 

#pragma omp parallel for reduction(+:result) 
    for (int j = 1; j <= n; j++) { 
     E = start + (2 * j - 1.)*h; 
     result += (4.*integrand(E)).eval(); 
     if (j != n) { 
      E = start + 2 * j*h; 
      result += (2.*integrand(E)).eval(); 
     } 
    } 
    for (int i = 0; i < 50; ++i) 
     cout << i + 1 << " , " << result[i] << endl; 

    return 0; 
} 

Very nice, thank you. This got me a 2.17x speedup with 2 threads. The user-defined reduction didn't work for me, but the runtime error makes me wonder whether it's related to Eigen rather than Clang. Edit: just tried this with g++ and it doesn't even compile: 'result' has 'invalid' type – AlexD


What runtime error occurred? With what code? It could also be that Eigen types don't play well with omp; I haven't tried. –


I think you're right about Eigen and omp. I tested this on the posted code with the user-defined reduction. Even with only one thread set, the runtime error is the same; an excerpt is shown since the output is too long: `[BinaryOp = Eigen::internal::scalar_sum_op , Lhs = const Eigen::Matrix , Rhs = const Eigen::CwiseUnaryOp , const Eigen::Matrix >]: Assertion 'aLhs.rows()==aRhs.rows() && aLhs.cols()==aRhs.cols()' failed.` Aborted. I hasten to add, the alternative approach you sketched works a treat – AlexD