2016-11-08

I am trying to parallelize the loop below with an OpenMP reduction over an Eigen::VectorXd:

#define EIGEN_DONT_PARALLELIZE 
#include <iostream> 
#include <cmath> 
#include <string> 
#include <eigen3/Eigen/Dense> 
#include <eigen3/Eigen/Eigenvalues> 
#include <omp.h> 

using namespace Eigen; 
using namespace std; 

VectorXd integrand(double E) 
{ 
    VectorXd answer(500000); 
    double f = 5.*E + 32.*E*E*E*E; 
    for (int j = 0; j != 50; j++) 
        answer[j] = j*f; 
    return answer; 
} 

int main() 
{ 
    omp_set_num_threads(4); 
    double start = 0.; 
    double end = 1.; 
    int n = 100; 
    double h = (end - start)/(2.*n); 

    VectorXd result(500000); 
    result.fill(0.); 
    double E = start; 
    result = integrand(E); 
    #pragma omp parallel 
    { 
    #pragma omp for nowait 
    for (int j = 1; j <= n; j++){ 
     E = start + (2*j - 1.)*h; 
     result = result + 4.*integrand(E); 
     if (j != n){ 
      E = start + 2*j*h; 
      result = result + 2.*integrand(E); 
     } 
    } 
    } 
    for (int i = 0; i < 50; ++i) 
        cout << i+1 << " , " << result[i] << endl; 

    return 0; 
} 

This is definitely no faster than the serial version, and with all 4 threads the results vary wildly from run to run. When the thread count is set to 1, the output is correct. I would appreciate it if someone could help me with this...

I compile with the clang compiler using the flags:

clang++-3.8 energy_integration.cpp -fopenmp=libiomp5 

If this is a bust, then I'll have to learn to use Boost::thread, or std::thread...


Add `firstprivate(params) reduction(+:result_int)` to your `parallel` directive, remove the `critical`, and try again... – Gilles


@Gilles Thanks for the reply. I have edited my code so that the first `#pragma` statement reads `#pragma omp parallel firstprivate(params) reduction(+:result_int)`, the second `#pragma` statement is unchanged, and all subsequent `#pragma` statements are removed. The program then produces a runtime error: `.... const Eigen::Matrix >: Assertion aLhs.rows()==aRhs.rows() && aLhs.cols()==aRhs.cols()' failed. Aborted` – I can guarantee that kspace and result_int both have the same number of elements and dimensions – AlexD


Could you turn your example into a complete [mcve]? Also, does the serial version work as expected? –

Answer


Your code does not define a custom reduction for OpenMP to reduce Eigen objects. I'm not sure whether clang supports user-defined reductions (see the OpenMP 4 spec, page 180). If it does, you can declare the reduction and add `reduction(+:result)` to the `#pragma omp for` line. If not, you can do it yourself by changing your code as follows:

VectorXd result(500000); // This is the final result, not used by the threads 
result.fill(0.); 
double E = start; 
result = integrand(E); 
#pragma omp parallel 
{ 
    // This is a private copy per thread. This resolves race conditions between threads 
    VectorXd resultPrivate(500000); 
    resultPrivate.fill(0.); 
#pragma omp for nowait // reduction(+:result) // Assuming user-defined reductions aren't allowed 
    for (int j = 1; j <= n; j++) { 
     E = start + (2 * j - 1.)*h; 
     resultPrivate = resultPrivate + 4.*integrand(E); 
     if (j != n) { 
      E = start + 2 * j*h; 
      resultPrivate = resultPrivate + 2.*integrand(E); 
     } 
    } 
#pragma omp critical 
    { 
     // Here we sum the results of each thread one at a time 
     result += resultPrivate; 
    } 
} 

The error you got (in your comment) seems to be due to a size mismatch. While it isn't important in the code itself, don't forget that when OpenMP starts each thread, it has to initialize each thread's private VectorXd. If none is supplied, the default is VectorXd() (of size zero). When this object is then used, the size mismatch occurs. A "correct" use of `omp declare reduction` would include the initializer part:

#pragma omp declare reduction (+: VectorXd: omp_out=omp_out+omp_in)\ 
    initializer(omp_priv=VectorXd::Zero(omp_orig.size())) 

omp_priv is the name of the private variable. It is initialized by VectorXd::Zero(...). The size is specified using omp_orig. The standard (page 182, lines 25-27) defines this as:

The special identifier omp_orig can also appear in the initializer-clause and it will refer to the storage of the original variable to be reduced.

In our case (see the full example below), this is result. So result.size() is 500000, and the private variables are initialized to the correct size.

#include <iostream> 
#include <string> 
#include <Eigen/Core> 
#include <omp.h> 

using namespace Eigen; 
using namespace std; 

VectorXd integrand(double E) 
{ 
    VectorXd answer(500000); 
    double f = 5.*E + 32.*E*E*E*E; 
    for (int j = 0; j != 50; j++) answer[j] = j*f; 
    return answer; 
} 

#pragma omp declare reduction (+: Eigen::VectorXd: omp_out=omp_out+omp_in)\ 
    initializer(omp_priv=VectorXd::Zero(omp_orig.size())) 

int main() 
{ 
    omp_set_num_threads(4); 
    double start = 0.; 
    double end = 1.; 
    int n = 100; 
    double h = (end - start)/(2.*n); 

    VectorXd result(500000); 
    result.fill(0.); 
    double E = start; 
    result = integrand(E); 

#pragma omp parallel for reduction(+:result) 
    for (int j = 1; j <= n; j++) { 
     E = start + (2 * j - 1.)*h; 
     result += (4.*integrand(E)).eval(); 
     if (j != n) { 
      E = start + 2 * j*h; 
      result += (2.*integrand(E)).eval(); 
     } 
    } 
    for (int i = 0; i < 50; ++i) 
     cout << i + 1 << " , " << result[i] << endl; 

    return 0; 
} 

Very nice, thank you. This got me a 2.17x speedup with 2 threads. The user-defined reduction didn't work for me, but the runtime error makes me wonder whether it's related to Eigen rather than Clang. Edit: just tried this with g++ and it doesn't even compile: 'result' has 'invalid' type – AlexD


What runtime error occurred? With what code? It could also be that Eigen types don't play well with omp; I haven't tried. –


I think you're right about Eigen and omp. I tested this on the posted code with the user-defined reduction. Even with only one thread set, the runtime error is the same; an excerpt is shown since the output is too long: `[BinaryOp = Eigen::internal::scalar_sum_op , Lhs = const Eigen::Matrix , Rhs = const Eigen::CwiseUnaryOp , const Eigen::Matrix >]: Assertion 'aLhs.rows()==aRhs.rows() && aLhs.cols()==aRhs.cols()' failed.` Aborted. I hasten to add, the alternative approach you sketched works a treat – AlexD