OpenCV GPU blur is slow

GPU: GeForce GTX 750

CPU: Intel i5-4440 3.10 GHz

Here is the simple C++ code that I run.

    #include <iostream>
    #include "opencv2/highgui/highgui.hpp"
    #include "opencv2/imgproc/imgproc.hpp"
    #include "opencv2/gpu/gpu.hpp"

    int main(int argc, char** argv) {
        cv::Mat img0 = cv::imread("IMG_0984.jpg", CV_LOAD_IMAGE_GRAYSCALE); // Size 3264 x 2448
        cv::Mat img0Blurred;

        // Uploads the image to the device; the output GpuMat is only declared here
        cv::gpu::GpuMat gpuImg0(img0);
        cv::gpu::GpuMat gpuImage0Blurred;

        int64 tickCount;

        for (int i = 0; i < 5; i++)
        {
            // Time the CPU blur, in seconds
            tickCount = cv::getTickCount();
            cv::blur(img0, img0Blurred, cv::Size(7, 7));
            std::cout << "CPU Blur " << (cv::getTickCount() - tickCount)/cv::getTickFrequency() << std::endl;

            // Time the GPU blur, in seconds
            tickCount = cv::getTickCount();
            cv::gpu::blur(gpuImg0, gpuImage0Blurred, cv::Size(7, 7));
            std::cout << "GPU Blur " << (cv::getTickCount() - tickCount)/cv::getTickFrequency() << std::endl;
        }

        cv::gpu::DeviceInfo deviceInfo;
        std::cout << "Device Info: " << deviceInfo.name() << std::endl;

        std::cin.get();

        return 0;
    }

As a result, I usually get something like this:

CPU Blur: 0.01 
GPU Blur: 1.7 
CPU Blur: 0.009 
GPU Blur: 0.012 
CPU Blur: 0.009 
GPU Blur: 0.013 
CPU Blur: 0.01 
GPU Blur: 0.012 
CPU Blur: 0.009 
GPU Blur: 0.013 

Device Info: GeForce GTX 750

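So the first operation on the GPU takes some time.

But what about the remaining GPU calls?

Why doesn't the GPU provide any speedup? After all, this is a large image (3264 x 2448), and the task parallelizes well, doesn't it?

Is my CPU that good, or is my GPU that bad? Or is this some kind of communication problem between the components?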

Source

2015-07-20 ancajic

[Related](http://stackoverflow.com/questions/15035907/why-cvgpugaussianblur-is-slower-than-cvgaussianblur)

Are you using OpenCV with IPP? – Micka

No, I'm not. – ancajic

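Your first GPU measurement is very different from the others, and I have experienced the same thing. The first call to an OpenCV kernel (erode/dilate/etc.) takes longer than the following ones. In one application, we made a first call to cv::gpu::XX while initializing the GPU memory, so that this noise stayed out of the measurements.

I have also seen that cv::gpu uses cudaDeviceSynchronize after every call that has no cv::gpu::Stream parameter. This can take a long time and add noise to your measurements. OpenCV may also allocate memory for temporary buffers to store the kernel used to blur the image.

I don't see gpuImage0Blurred being allocated in your example. Can you make sure your destination image is properly allocated outside the loop? Otherwise you will also be measuring the allocation time of this matrix.

Using nvvp can give you clues about what happens while the application runs, so that you can remove unnecessary operations.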
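To illustrate the warm-up point above, here is a minimal sketch of a timing helper that runs an operation a few times untimed before measuring it. It uses only the C++ standard library; the name medianSecondsAfterWarmup and its parameters are made up for this example and are not part of OpenCV.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenCV API): calls `op` `warmupRuns` times
// without timing it, then times `timedRuns` further calls and returns the
// median duration in seconds. Returns 0.0 if `timedRuns` is not positive.
template <typename Op>
double medianSecondsAfterWarmup(Op op, int warmupRuns, int timedRuns) {
    using Clock = std::chrono::steady_clock;

    // Untimed calls absorb one-off costs such as lazy initialization
    // or first-use buffer allocation.
    for (int i = 0; i < warmupRuns; ++i) {
        op();
    }

    if (timedRuns <= 0) {
        return 0.0;
    }

    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(timedRuns));
    for (int i = 0; i < timedRuns; ++i) {
        const auto start = Clock::now();
        op();
        const std::chrono::duration<double> elapsed = Clock::now() - start;
        samples.push_back(elapsed.count());
    }

    // The median is less sensitive to a single slow outlier than the mean.
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

In the program from the question, `op` could be a lambda wrapping the cv::gpu::blur(gpuImg0, gpuImage0Blurred, cv::Size(7, 7)) call, so that the 1.7 s first call no longer shows up in the reported figure.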

Edit:

#include <iostream>
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/gpu/gpu.hpp"

int main(int argc, char** argv) {
    cv::Mat img0 = cv::imread("IMG_0984.jpg", CV_LOAD_IMAGE_GRAYSCALE); // Size 3264 x 2448
    cv::Mat img0Blurred;

    // upload the image through the stream and wait until the copy is done
    cv::gpu::GpuMat gpuImg0;
    cv::gpu::Stream stream;
    stream.enqueueUpload(img0, gpuImg0);
    stream.waitForCompletion();

    // allocates the matrix outside the loop
    cv::gpu::GpuMat gpuImage0Blurred(gpuImg0.size(), gpuImg0.type());

    int64 tickCount;

    for (int i = 0; i < 5; i++)
    {
        tickCount = cv::getTickCount();
        cv::blur(img0, img0Blurred, cv::Size(7, 7));
        std::cout << "CPU Blur " << (cv::getTickCount() - tickCount)/cv::getTickFrequency() << std::endl;

        tickCount = cv::getTickCount();
        cv::gpu::blur(gpuImg0, gpuImage0Blurred, cv::Size(7, 7), cv::Point(-1, -1), stream);
        // ensure operations are finished before measuring time spent doing operations
        stream.waitForCompletion();
        std::cout << "GPU Blur " << (cv::getTickCount() - tickCount)/cv::getTickFrequency() << std::endl;
    }

    std::cin.get();

    return 0;
}

Yes, it turns out waitForCompletion makes all the difference. I get the same values as at the beginning:

CPU Blur: 0.01 
GPU Blur: 1.7 
CPU Blur: 0.009 
GPU Blur: 0.012 
CPU Blur: 0.009 
GPU Blur: 0.013 
CPU Blur: 0.01 
GPU Blur: 0.012 
CPU Blur: 0.009 
GPU Blur: 0.013

Source

2015-07-21 07:26:58 X3liF

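That's cool, but now I am facing a different problem. I only used blur as a simple benchmark. What I actually want is to parallelize feature detection. So here is my next question: http://stackoverflow.com/questions/31536735/fast-gpu-feature-detection-slow – ancajic

You are still not allocating your output matrix outside the loop, you only declare the variable. cv::gpu::GpuMat gpuImage0Blurred(gpuImg0.size(), gpuImg0.type()); will do the allocation on the device; otherwise your first blur call will allocate this buffer. – X3liF

I have updated the edit to add a synchronization on the stream before measuring the time, because otherwise you would only measure the time spent enqueueing the commands on the stream, not the time spent computing. – X3liF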
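The difference between timing the enqueue and timing the computation can be sketched without a GPU. The example below is only an analogy built on the C++ standard library: std::async stands in for enqueueing work on a stream, future.wait() stands in for waitForCompletion, and the sleep is a placeholder for the real computation. The names AsyncTiming and timeAsyncWork are made up for this illustration.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical result type for this illustration: both values are in seconds,
// measured from just before the asynchronous launch.
struct AsyncTiming {
    double launchOnly;  // clock read right after the launch returns
    double withWait;    // clock read after waiting for the work to finish
};

// Launches a task that sleeps for `workDuration` (a stand-in for a kernel)
// and reads the clock both before and after synchronizing with it.
inline AsyncTiming timeAsyncWork(std::chrono::milliseconds workDuration) {
    using Clock = std::chrono::steady_clock;

    const auto start = Clock::now();
    std::future<void> pending = std::async(std::launch::async, [workDuration] {
        std::this_thread::sleep_for(workDuration);
    });
    // Without a wait, this reading only covers the cost of launching the task.
    const auto afterLaunch = Clock::now();

    // Analogous to waiting on the stream before stopping the timer.
    pending.wait();
    const auto afterWait = Clock::now();

    const std::chrono::duration<double> launchOnly = afterLaunch - start;
    const std::chrono::duration<double> withWait = afterWait - start;
    return AsyncTiming{launchOnly.count(), withWait.count()};
}
```

Calling timeAsyncWork(std::chrono::milliseconds(50)) should give a withWait value of at least roughly 0.05 s, while launchOnly stays far below it. That gap is what the synchronization before the timer read is meant to capture.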
