OpenMP edge detection filter parallelization: takes longer

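I want to apply a Sobel filter to a large image.

I am using OpenMP to parallelize the work and reduce the computation time.

After adding the parallel optimization, I noticed that it takes longer than expected. Here is the code: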

#include<iostream> 
#include<cmath> 
#include<ctime> 
#include<opencv2/imgproc/imgproc.hpp> 
#include<opencv2/highgui/highgui.hpp> 

using namespace std; 
using namespace cv; 


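// Computes the x component of the gradient vector 
// at a given point in an image. 
// Returns the gradient in the x direction. 
// The image is taken by const reference so the Mat header is not copied on every call. 
int xGradient(const Mat& image, int x, int y) 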
{ 
    return image.at<uchar>(y-1, x-1) + 
       2*image.at<uchar>(y, x-1) + 
       image.at<uchar>(y+1, x-1) - 
        image.at<uchar>(y-1, x+1) - 
        2*image.at<uchar>(y, x+1) - 
        image.at<uchar>(y+1, x+1); 
} 

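// Computes the y component of the gradient vector 
// at a given point in an image. 
// Returns the gradient in the y direction. 
// The image is taken by const reference so the Mat header is not copied on every call. 
int yGradient(const Mat& image, int x, int y) 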
{ 
    return image.at<uchar>(y-1, x-1) + 
       2*image.at<uchar>(y-1, x) + 
       image.at<uchar>(y-1, x+1) - 
        image.at<uchar>(y+1, x-1) - 
        2*image.at<uchar>(y+1, x) - 
        image.at<uchar>(y+1, x+1); 
} 




int main() 
{ 
const clock_t begin_time = clock(); 
     Mat src, dst; 
     int gx, gy, sum; 

     // Load an image 
     src = imread("/home/cgross/Downloads/pano.jpg", 0); 
     dst = src.clone(); 
     if(!src.data) 
     { return -1; } 

#pragma omp parallel for private(gx, gy, sum) shared(dst) 
     for(int y = 0; y < src.rows; y++) 
      for(int x = 0; x < src.cols; x++) 
       dst.at<uchar>(y,x) = 0.0; 

#pragma omp parallel for private(gx, gy, sum) shared(dst) 

     for(int y = 1; y < src.rows - 1; y++){ 

      for(int x = 1; x < src.cols - 1; x++){ 
       gx = xGradient(src, x, y); 
       gy = yGradient(src, x, y); 
       sum = abs(gx) + abs(gy); 
       sum = sum > 255 ? 255:sum; 
       sum = sum < 0 ? 0 : sum; 
       dst.at<uchar>(y,x) = sum; 
      } 
     } 

     namedWindow("final", WINDOW_NORMAL); 
     imshow("final", dst); 

     namedWindow("initial", WINDOW_NORMAL); 
     imshow("initial", src); 

std::cout << float(clock() - begin_time)/CLOCKS_PER_SEC<<endl; 
     waitKey(); 


    return 0; 
}

If I comment out the pragma (disabling OpenMP), the computation is faster (10 seconds). I don't see where the problem is.

Source

2014-09-26 trexgris

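Instead of using `clock()`, why not try `omp_get_wtime()`? – 2014-09-26 09:47:44

Unless you are running on Windows, using `clock()` to measure a program's performance will almost always give worse timings for parallel programs, and there are [countless other questions](http://stackoverflow.com/search?q=[openmp]+clocks_per_sec) about this. – 2014-09-26 10:49:32

In general, small loops are not optimized well when using OpenMP. – 2014-09-26 15:09:55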
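To make the two comments above concrete: on POSIX systems `clock()` reports processor time used by the whole process, so with several threads the CPU time of every thread is added together, while `omp_get_wtime()` reports elapsed wall-clock time. The sketch below measures both around a piece of work. The names `wallSeconds`, `Timing` and `timeIt` are made up for illustration, and the `std::chrono` fallback is only an assumption so that the snippet still builds when OpenMP is disabled.

```cpp
#include <chrono>
#include <ctime>
#ifdef _OPENMP
#include <omp.h>
#endif

// Wall-clock seconds: omp_get_wtime() when built with OpenMP,
// otherwise std::chrono::steady_clock as a stand-in.
double wallSeconds()
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    using namespace std::chrono;
    return duration<double>(steady_clock::now().time_since_epoch()).count();
#endif
}

// Elapsed wall-clock time and clock()-based time for one piece of work.
struct Timing { double wall; double cpu; };

// Hypothetical helper: runs `work` once and measures it with both clocks.
template <typename F>
Timing timeIt(F&& work)
{
    const double w0 = wallSeconds();
    const std::clock_t c0 = std::clock();
    work();
    const double wall = wallSeconds() - w0;
    const double cpu = double(std::clock() - c0) / CLOCKS_PER_SEC;
    return { wall, cpu };
}
```

Wrapping the Sobel loops from the question in `timeIt` and printing both fields should show whether the parallel version is really slower or whether `clock()` is just summing the work of all threads. With GCC, OpenMP is enabled with `-fopenmp`.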

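Rather than writing your own Sobel, I would consider one of two options.

1) Use the function built into OpenCV: http://docs.opencv.org/modules/imgproc/doc/filtering.html?highlight=sobel#sobel

2) Create a Sobel kernel and use OpenCV's filter2D() function. Other libraries, platforms, etc. have similar functions for passing a kernel over an image, and many of them are already optimized. For example, I believe iOS has something called vImage.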
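For readers unfamiliar with what "passing a kernel over an image" means, here is a minimal sketch on a plain 8-bit buffer. It is not OpenCV's implementation and does not use `filter2D()`; `GrayImage`, `apply3x3` and `sobelMagnitude` are hypothetical names chosen for illustration. The two kernels use the same signs as `xGradient` and `yGradient` in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical minimal image type: 8-bit grayscale, row-major.
struct GrayImage {
    int rows, cols;
    std::vector<unsigned char> px;
    GrayImage(int r, int c) : rows(r), cols(c), px(std::size_t(r) * c, 0) {}
    unsigned char  at(int y, int x) const { return px[std::size_t(y) * cols + x]; }
    unsigned char& at(int y, int x)       { return px[std::size_t(y) * cols + x]; }
};

// Multiplies a 3x3 kernel with the neighbourhood centred on (y, x) and sums.
int apply3x3(const GrayImage& img, const int k[3][3], int y, int x)
{
    int acc = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            acc += k[dy + 1][dx + 1] * img.at(y + dy, x + dx);
    return acc;
}

// Sobel kernels matching xGradient / yGradient from the question.
const int KX[3][3] = { { 1, 0, -1 }, { 2, 0, -2 }, { 1, 0, -1 } };
const int KY[3][3] = { { 1, 2, 1 }, { 0, 0, 0 }, { -1, -2, -1 } };

// |gx| + |gy| clamped to 255; the one-pixel border is left at 0.
GrayImage sobelMagnitude(const GrayImage& src)
{
    assert(src.rows >= 3 && src.cols >= 3);
    GrayImage dst(src.rows, src.cols);
    for (int y = 1; y < src.rows - 1; ++y)
        for (int x = 1; x < src.cols - 1; ++x) {
            const int sum = std::abs(apply3x3(src, KX, y, x))
                          + std::abs(apply3x3(src, KY, y, x));
            dst.at(y, x) = static_cast<unsigned char>(sum > 255 ? 255 : sum);
        }
    return dst;
}
```

A library routine does the same arithmetic but is typically vectorized and cache-friendly, which is why it is worth timing before hand-tuning a custom loop.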

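Then you can compare those timings against your custom code.

You say you have a "large" image, but that doesn't tell us much. How many pixels are we talking about?

You could also split the image into parts, filter each part separately (using threads, etc.), and then merge the parts back into a new image. I have had good success with that approach.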
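One possible shape for the split-and-filter idea, sketched on a plain row-major buffer rather than a `cv::Mat`: the image is cut into horizontal strips and each strip is filtered independently. `Pixels`, `sobelBand` and `sobelInBands` are hypothetical names, and using an OpenMP pragma over the strips (instead of hand-written threads) is my assumption, not something the answer specifies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

using Pixels = std::vector<unsigned char>;  // 8-bit grayscale, row-major

// Sobel |gx| + |gy| for output rows [y0, y1); reads one row above and below.
void sobelBand(const Pixels& src, Pixels& dst, int rows, int cols, int y0, int y1)
{
    y0 = std::max(y0, 1);          // skip the top border row
    y1 = std::min(y1, rows - 1);   // skip the bottom border row
    for (int y = y0; y < y1; ++y) {
        const unsigned char* up  = &src[std::size_t(y - 1) * cols];
        const unsigned char* mid = &src[std::size_t(y) * cols];
        const unsigned char* low = &src[std::size_t(y + 1) * cols];
        unsigned char* out = &dst[std::size_t(y) * cols];
        for (int x = 1; x < cols - 1; ++x) {
            const int gx = up[x - 1] + 2 * mid[x - 1] + low[x - 1]
                         - up[x + 1] - 2 * mid[x + 1] - low[x + 1];
            const int gy = up[x - 1] + 2 * up[x] + up[x + 1]
                         - low[x - 1] - 2 * low[x] - low[x + 1];
            out[x] = static_cast<unsigned char>(
                std::min(std::abs(gx) + std::abs(gy), 255));
        }
    }
}

// Splits the image into `bands` horizontal strips and filters each one.
// The pragma is ignored (the loop runs serially) when OpenMP is disabled.
Pixels sobelInBands(const Pixels& src, int rows, int cols, int bands)
{
    assert(rows >= 3 && cols >= 3 && bands >= 1);
    assert(src.size() == std::size_t(rows) * cols);
    Pixels dst(src.size(), 0);
    const int step = (rows + bands - 1) / bands;  // rows per strip, rounded up
    #pragma omp parallel for
    for (int b = 0; b < bands; ++b)
        sobelBand(src, dst, rows, cols, b * step, std::min((b + 1) * step, rows));
    return dst;
}
```

Because every strip writes to its own rows of one shared output buffer, no separate merge step is needed in this variant. Each strip only reads the source rows just outside its range, so the strips do not interfere with each other.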

I would also take a look at this:

http://developer.amd.com/tools-and-sdks/opencl-zone/opencl-resources/programming-in-opencl/image-convolution-using-opencl/

Source

2014-09-26 16:49:08
