1

Selectively enable an OpenMP for loop inside a parallel region

Is it possible to selectively enable an OpenMP directive with a template parameter or a runtime variable? I want to choose between this (all threads work on the same for loop):
#pragma omp parallel 
{ 
    #pragma omp for 
    for (int i = 0; i < 10; ++i) { /*...*/ } 
} 
versus this (each thread works on its own for loop):
#pragma omp parallel 
{ 
    for (int i = 0; i < 10; ++i) { /*...*/ } 
} 

Update (testing the if clause)

test.cpp:

#include <iostream>
#include <omp.h>

int main() {
    bool var = true;
    #pragma omp parallel
    {
        #pragma omp for if (var)
        for (int i = 0; i < 4; ++i) {
            std::cout << omp_get_thread_num() << "\n";
        }
    }
}

Error message (g++ 6, compiled with g++ test.cpp -fopenmp):

test.cpp: In function ‘int main()’:
test.cpp:8:25: error: ‘if’ is not valid for ‘#pragma omp for’
        #pragma omp for if (var)
                        ^~
+1

'#pragma omp parallel if(variable)' –
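
A minimal sketch of the suggestion above: the if clause is accepted on the parallel construct, where it decides at runtime whether the region runs with a team of threads or serially. This is not quite what the question asks for, since it switches off parallelism for the whole region rather than only the worksharing.

#include <iostream>
#include <omp.h>

int main() {
    bool var = false;  // runtime switch

    // With if (var) evaluating to false, the region executes with a team of
    // exactly one thread; with true, it gets the usual team size.
    #pragma omp parallel if (var)
    {
        #pragma omp for
        for (int i = 0; i < 4; ++i) {
            std::cout << omp_get_thread_num() << "\n";
        }
    }
}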

+0

Both versions are parallel; what I mostly want is to choose whether the '#pragma omp for' line is enabled. I'll try to find out whether the if clause works with the for construct. Thanks. –

+0

It does. https://msdn.microsoft.com/en-us/library/5187hzke.aspx Hopefully that is true for all compilers. –

Answers

0

This sort of works. Not sure whether it is possible to get rid of the conditional that picks the thread id.

#include <iostream>
#include <omp.h>
#include <sstream>
#include <vector>

int main() {
    constexpr bool var = true;
    int n_threads = omp_get_num_procs();
    std::cout << "n_threads: " << n_threads << "\n";
    std::vector<std::stringstream> s(omp_get_num_procs());

    // When var is true, the outer region is the active parallel region and the
    // inner one stays serial (nested parallelism is off by default), so every
    // thread runs the whole loop. When var is false, the outer region is
    // inactive (a team of one) and the inner region becomes the active one, so
    // the loop iterations are shared among the threads.
    #pragma omp parallel if (var)
    {
        const int thread_id0 = omp_get_thread_num();
        #pragma omp parallel
        {
            // Pick the thread id of whichever region actually has the threads.
            int thread_id1;
            if (var) {
                thread_id1 = thread_id0;
            } else {
                thread_id1 = omp_get_thread_num();
            }

            #pragma omp for
            for (int i = 0; i < 8; ++i) {
                s[thread_id1] << i << ", ";
            }
        }
    }

    for (int i = 0; i < s.size(); ++i) {
        std::cout << "thread " << i << ": "
                  << s[i].str() << "\n";
    }
}

Output (when var == true):

n_threads: 8 
thread 0: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 1: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 2: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 3: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 4: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 5: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 6: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 7: 0, 1, 2, 3, 4, 5, 6, 7, 

Output (when var == false):

n_threads: 8 
thread 0: 0, 
thread 1: 1, 
thread 2: 2, 
thread 3: 3, 
thread 4: 4, 
thread 5: 5, 
thread 6: 6, 
thread 7: 7, 
+0

This works with clang and g++. Not sure about the Intel compiler. –

+0

It won't work as expected if nested parallelism is enabled. –
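
A minimal sketch of why enabling nested parallelism would break the scheme above: with nesting on, every outer thread gets its own inner team, so omp_get_thread_num() inside the inner region repeats across teams and would no longer pick a unique stringstream in s. (omp_set_nested is used here for brevity; it is deprecated in OpenMP 5.0 in favor of omp_set_max_active_levels.)

#include <cstdio>
#include <omp.h>

int main() {
    omp_set_nested(1);  // enable nested parallelism

    #pragma omp parallel num_threads(2)
    {
        #pragma omp parallel num_threads(2)
        {
            // With nesting enabled, inner thread numbers repeat across the
            // inner teams, so they cannot serve as a global index.
            std::printf("outer thread %d, inner thread %d\n",
                        omp_get_ancestor_thread_num(1), omp_get_thread_num());
        }
    }
}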

0
#include <omp.h>
#include <sstream>
#include <vector>
#include <iostream>

int main() {
    constexpr bool var = false;
    int n_threads = omp_get_num_procs();
    std::cout << "n_threads: " << n_threads << "\n";
    std::vector<std::stringstream> s(omp_get_num_procs());

    #pragma omp parallel
    {
        const int thread_id = omp_get_thread_num();
        if (var) {
            // Workshared: the iterations are split across the team.
            #pragma omp for
            for (int i = 0; i < 8; ++i) {
                s[thread_id] << i << ", ";
            }
        } else {
            // Every thread runs the whole loop.
            for (int i = 0; i < 8; ++i) {
                s[thread_id] << i << ", ";
            } // code duplication
        }
    }
    for (int i = 0; i < s.size(); ++i) {
        std::cout << "thread " << i << ": "
                  << s[i].str() << "\n";
    }
}
+1

You do realize that the code in the 'else' block actually creates a nested parallel region, which can lead to surprising results? The only reason it may appear to work the way the OP intends is that nested parallelism is disabled by default, so that region is executed serially within each thread. –

+0

Thanks. I fixed this by removing the '#pragma omp parallel for' from the 'else' block. –

+0

Sorry, I didn't realize you are the OP. You should really combine your answers into one. –

1

I think the idiomatic C++ solution is to hide the different OpenMP variants behind overloads of an algorithm.

#include <iostream>
#include <sstream>
#include <vector>
#include <type_traits>
#include <omp.h>

template <bool ALL_PARALLEL>
struct impl;

// ALL_PARALLEL == true: every thread executes the entire loop.
template <>
struct impl<true>
{
    template <typename ITER, typename CALLABLE>
    void operator()(ITER begin, ITER end, const CALLABLE& func) {
        #pragma omp parallel
        {
            for (ITER i = begin; i != end; ++i) {
                func(i);
            }
        }
    }
};

// ALL_PARALLEL == false: the iterations are workshared across the team.
template <>
struct impl<false>
{
    template <typename ITER, typename CALLABLE>
    void operator()(ITER begin, ITER end, const CALLABLE& func) {
        #pragma omp parallel for
        for (ITER i = begin; i < end; ++i) {
            func(i);
        }
    }
};

// This is just so we don't have to write impl<ALL_PARALLEL>()(...)
template <bool ALL_PARALLEL, typename ITER, typename CALLABLE>
void parallel_foreach(ITER begin, ITER end, const CALLABLE& func)
{
    impl<ALL_PARALLEL>()(begin, end, func);
}

int main()
{
    constexpr bool var = false;
    int n_threads = omp_get_num_procs();
    std::cout << "n_threads: " << n_threads << "\n";
    std::vector<std::stringstream> s(omp_get_num_procs());

    parallel_foreach<var>(0, 8, [&s](auto i) {
        s[omp_get_thread_num()] << i << ", ";
    });

    for (int i = 0; i < s.size(); ++i) {
        std::cout << "thread " << i << ": "
                  << s[i].str() << "\n";
    }
}

If you work with specific types, you can overload on the type instead of using a bool template parameter, and iterate over elements rather than numeric indices. Note that you can use C++ random access iterators in OpenMP worksharing loops! Depending on your types, you may well be able to implement an iterator that hides all the details of the internal data access from the caller.
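
A minimal sketch of a worksharing loop over random access iterators (this assumes OpenMP 3.0 or later, which added iterators to the canonical loop form):

#include <iostream>
#include <numeric>
#include <vector>
#include <omp.h>

int main() {
    std::vector<double> v(1000);
    std::iota(v.begin(), v.end(), 0.0);

    double sum = 0.0;
    // Random access iterators satisfy the canonical loop form, so the
    // worksharing construct can distribute the iterations.
    #pragma omp parallel for reduction(+:sum)
    for (std::vector<double>::iterator it = v.begin(); it < v.end(); ++it) {
        sum += *it;
    }

    std::cout << "sum: " << sum << "\n";
}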

+0

I thought the overhead for iterators was rather large: http://stackoverflow.com/questions/2513988/iteration-through-std-containers-in-openmp Not sure whether that is still the case. After reading that, I avoided writing iterators for a class if it is used in an OpenMP for loop. –

+2

You misread the linked answer. The example given there is 'std::set', which does not have random access iterators. That is why he does not use the loop worksharing construct ('#pragma omp (parallel) for') but a hand-rolled loop instead. There is no inherent overhead when using a plain '#pragma omp for' with random access iterators. Your optimization mileage may vary, so measure and compare. – Zulan
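
For reference, one possible shape of such a hand-rolled loop over a container without random access iterators; this is only a sketch of a round-robin split by thread id, not necessarily the scheme from the linked answer:

#include <iostream>
#include <set>
#include <omp.h>

int main() {
    std::set<int> data = {1, 2, 3, 4, 5, 6, 7, 8};

    #pragma omp parallel
    {
        const int nthreads = omp_get_num_threads();
        const int tid = omp_get_thread_num();

        // Every thread walks the whole container but only processes every
        // nthreads-th element, starting at its own offset.
        int idx = 0;
        for (std::set<int>::const_iterator it = data.begin();
             it != data.end(); ++it, ++idx) {
            if (idx % nthreads != tid) continue;
            #pragma omp critical
            std::cout << "thread " << tid << " handles " << *it << "\n";
        }
    }
}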

+0

Thanks. Guess I'll add random access iterators in the next project... –