我使用英特尔螺纹积木

需要并行归并的最佳代码我需要并行归并的最佳代码在C++我使用英特尔螺纹积木

来源

2011-01-27 Nikita

TBB已经包括一种方法（平行快速排序），其是-however使用英特尔螺纹积木 - 实现相当糟糕（运行时至少是线性的，与处理器的数量无关）。

我的建议是你从现有的实现中进行端口并行合并排序。例如，使用OpenMP的gnu并行模式排序（包含在任何最近的gcc与源文件中）。只需用一些tbb并行代码替换所有#pragma omp即可。

来源

2011-03-29 19:33:47 eci

首先，让我说，根据我的经验，tbb :: parallel_sort（）相当有效，比我要发布的代码快一点（至少对于数千个元素我已经测试过）。

话虽如此，我认为下面的代码正是你正在寻找。变量应该是自我解释和文档中的代码应该解释其余的 -

这将需要进行并行化：

#include<tbb/parallel_invoke.h>

如果您选择使用并发:: parallel_invoke（），它可以工作得更快，那么包括此：

#include<ppl.h>

我建议这些设置 -

#define MIN_ELEMENTS_FOR_RECURSION   (50) 
#define MIN_ELEMENTS_FOR_PARALLEL_PROCESSING (100)

以下是调用的主要功能。参数是迭代器来启动和随机存取类的结束（例如，向量，双端队列等）和比较功能 -

template <typename T_it, typename T_it_dereferenced> 
void parallelMergeSort(T_it first, T_it last, bool (*firstLessThanSecond)(const T_it_dereferenced& a, const T_it_dereferenced& b)) 
{ 
    // create copy of container for extra space 
    std::vector<T_it_dereferenced> copy(first, last); 

    parallelMergeSortRecursive(first, last, copy.begin(), copy.end(), firstLessThanSecond); 
}

这是递归从parallelMergeSort（）为了所谓的各占一半排序 -

template <typename T_it, typename T_it_dereferenced> 
void parallelMergeSortRecursive(T_it source_first, T_it source_last, T_it copy_first, T_it copy_last, 
bool (*firstLessThanSecond)(const T_it_dereferenced& a, const T_it_dereferenced& b), int recursion_depth = 0) 
{ 
    // divide the array in two, and sort the two halves 

    long num_elements = source_last - source_first; 

    if (num_elements > MIN_ELEMENTS_FOR_RECURSION) 
    { 
     T_it source_middle = source_first + num_elements/2; 
     T_it copy_middle = copy_first + num_elements/2; 

     if (num_elements > MIN_ELEMENTS_FOR_PARALLEL_PROCESSING) 
     { 
      // Concurrency::parallel_invoke() may work faster 
      tbb::parallel_invoke(
       [=] { parallelMergeSortRecursive(source_first,  source_middle, copy_first, copy_middle, firstLessThanSecond, recursion_depth + 1); }, 
       [=] { parallelMergeSortRecursive(source_middle, source_last, copy_middle, copy_last,  firstLessThanSecond, recursion_depth + 1); } 
      ); 
     } 
     else // sort serially rather than in parallel 
     { 
      parallelMergeSortRecursive(source_first, source_middle, copy_first, copy_middle, firstLessThanSecond, recursion_depth + 1); 
      parallelMergeSortRecursive(source_middle, source_last, copy_middle, copy_last,  firstLessThanSecond, recursion_depth + 1); 
     } 

     // merge the two sorted halves 

     // we switch source <--> target with each level of recursion. 
     // at even recursion depths (including zero which is the root level) we assume the source is sorted and merge into the target 

     if (recursion_depth % 2 == 0) 
     { 
      merge(source_first, copy_first, copy_middle, copy_last, firstLessThanSecond); 
     } 
     else 
     { 
      merge(copy_first, source_first, source_middle, source_last, firstLessThanSecond); 
     } 
    } 
    else // very few elements remain to be sorted, stop the recursion and sort in place 
    { 
     if (recursion_depth % 2 == 0) 
     { 
      std::stable_sort(source_first, source_last, firstLessThanSecond); 
     } 
     else 
     { 
      std::stable_sort(copy_first, copy_last, firstLessThanSecond); 
     } 
    } 
}

这是从递归函数调用，以合并两半 -

template <typename T_it, typename T_it_dereferenced> 
void merge(T_it target_first, T_it source_first, T_it source_middle, T_it source_last, 
bool (*firstLessThanSecond)(const T_it_dereferenced& a, const T_it_dereferenced& b)) 
{ 
    // source is assumed to contain two sorted sequences (from first to middle and from middle to last) 

    T_it source_it1 = source_first; 
    T_it source_it2 = source_middle; 
    T_it target_it = target_first; 

    for (/* intentional */ ; source_it1 < source_middle && source_it2 < source_last ; ++target_it) 
    { 
     //if (source_container[i] < source_container[j]) 
     if ( firstLessThanSecond(*source_it1, *source_it2) ) 
     { 
      *target_it = *source_it1; 
      ++source_it1; 
     } 
     else 
     { 
      *target_it = *source_it2; 
      ++source_it2; 
     } 
    } 

    // insert remaining elements in non-completely-traversed-half into original container 
    // only one of these two whiles will execute since one of the conditions caused the previous while to stop 

    for (/* intentional */ ; source_it1 < source_middle ; ++target_it) 
    { 
     *target_it = *source_it1; 
     ++source_it1; 
    } 

    for (/* intentional */ ; source_it2 < source_last ; ++target_it) 
    { 
     *target_it = *source_it2; 
     ++source_it2; 
    } 
}

来源

2013-06-20 09:43:17 soundwave

我使用英特尔螺纹积木

回答

相关问题