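LockBits seems too slow for my needs - alternatives?

I'm working with 10-megapixel images taken by a camera.

The goal is to record the grayscale value of every pixel in a matrix (a two-dimensional array).

I first used GetPixel, but it took 25 seconds. Now I'm using LockBits, and it takes 10 seconds, or 3 seconds if I don't save the results to a text file.

My supervisor says the results don't need to be saved, but 3 seconds is still too slow. So am I doing something wrong in my program, or is there something faster than LockBits for my application?

Here is my code: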

public void ExtractMatrix() 
{ 
    Bitmap bmpPicture = new Bitmap(nameNumber + ".bmp"); 

    int[,] GRAY = new int[3840, 2748]; //Matrix with "grayscales" in INTeger values 

    unsafe 
    { 
     //create an empty bitmap the same size as original 
     Bitmap bmp = new Bitmap(bmpPicture.Width, bmpPicture.Height); 

     //lock the original bitmap in memory 
     BitmapData originalData = bmpPicture.LockBits(
      new Rectangle(0, 0, bmpPicture.Width, bmpPicture.Height), 
      ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb); 

     //lock the new bitmap in memory 
     BitmapData newData = bmp.LockBits(
      new Rectangle(0, 0, bmpPicture.Width, bmpPicture.Height), 
      ImageLockMode.WriteOnly, PixelFormat.Format24bppRgb); 

     //set the number of bytes per pixel 
     // here is set to 3 because I use an Image with 24bpp 
     int pixelSize = 3; 

     for (int y = 0; y < bmpPicture.Height; y++) 
     { 
      //get the data from the original image 
      byte* oRow = (byte*)originalData.Scan0 + (y * originalData.Stride); 

      //get the data from the new image 
      byte* nRow = (byte*)newData.Scan0 + (y * newData.Stride); 

      for (int x = 0; x < bmpPicture.Width; x++) 
      { 
       //create the grayscale version 
       byte grayScale = 
        (byte)((oRow[x * pixelSize] * .114) + //B 
        (oRow[x * pixelSize + 1] * .587) + //G 
        (oRow[x * pixelSize + 2] * .299)); //R 

       //set the new image's pixel to the grayscale version 
       // nRow[x * pixelSize] = grayScale; //B 
       // nRow[x * pixelSize + 1] = grayScale; //G 
       // nRow[x * pixelSize + 2] = grayScale; //R 

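       GRAY[x, y] = (int)grayScale; 
      } 
     }

     //release the locks so GDI+ can reuse the memory 
     bmpPicture.UnlockBits(originalData); 
     bmp.UnlockBits(newData); 
    } 
}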

2013-05-07 Elo Monval

You might be able to speed this up by using the [TPL](http://msdn.microsoft.com/en-us/library/dd537608.aspx) to run the for loop in parallel. – 2013-05-07 11:14:33
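The following is a minimal sketch of the TPL suggestion. It assumes the locked bits have already been copied into a managed `byte[]` as 24bpp BGR rows of `stride` bytes, with a positive stride (for example with `Marshal.Copy(originalData.Scan0, buffer, 0, buffer.Length)`). `ParallelGray` and `ToGrayParallel` are hypothetical names. Each row is independent, so `Parallel.For` can hand rows to different cores without locking. The result is a jagged array indexed `[y][x]`.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelGray
{
    // Hypothetical helper: converts a 24bpp BGR buffer (rows of 'stride' bytes)
    // into a jagged grayscale array indexed [y][x], one row per parallel iteration.
    public static int[][] ToGrayParallel(byte[] bgr, int width, int height, int stride)
    {
        if (bgr.Length < (long)stride * height)
            throw new ArgumentException("Buffer is smaller than stride * height.");

        var gray = new int[height][];
        Parallel.For(0, height, y =>
        {
            int[] row = new int[width];   // each iteration owns its own row: no shared writes, no locks
            int p = y * stride;           // offset of the first byte of row y
            for (int x = 0; x < width; x++, p += 3)
            {
                // Same weights as the question: B * .114 + G * .587 + R * .299, truncated
                row[x] = (int)(bgr[p] * .114f + bgr[p + 1] * .587f + bgr[p + 2] * .299f);
            }
            gray[y] = row;
        });
        return gray;
    }
}
```

Whether this helps depends on the core count and on whether the loop, rather than file I/O, is the bottleneck. Measure before and after.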

Is the pixel format you pass to LockBits the same as the image's original format? – CodesInChaos 2013-05-07 11:19:34

Find out which part is slow. You are doing 10 million iterations, so any optimization inside the inner loop can give a large performance gain. – CodeCaster 2013-05-07 11:21:23
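One way to find out which part is slow is to time each phase separately with `System.Diagnostics.Stopwatch`. This is a sketch: `Profiling.Time` is a hypothetical helper, and the phases you pass in (loading the file, the conversion loop, writing the text file) are placeholders for your own code. For meaningful numbers, measure a Release build run without a debugger attached.

```csharp
using System;
using System.Diagnostics;

public static class Profiling
{
    // Hypothetical helper: runs 'action' once, prints and returns the elapsed wall-clock time.
    public static TimeSpan Time(string label, Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        Console.WriteLine($"{label}: {sw.ElapsedMilliseconds} ms");
        return sw.Elapsed;
    }
}
```

For example, wrap `new Bitmap(...)`, the LockBits loop and the file writing in three separate `Profiling.Time` calls. Comparing the three numbers shows whether the time goes into I/O or into the pixel loop.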

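I'm not sure why the second part of the inner for loop is commented out, but if you don't need it, you are doing some unnecessary casting. Removing it may improve your performance.

Also, as leppie suggested, you can use single-precision floats: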

 for (int x = 0; x < bmpPicture.Width; x++) 
     { 
      //create the grayscale version 
      GRAY[x, y] = 
       (int)((oRow[x * pixelSize] * .114f) + //B 
       (oRow[x * pixelSize + 1] * .587f) + //G 
       (oRow[x * pixelSize + 2] * .299f)); //R 

     }

2013-05-07 11:33:51 Rik

Thanks, that's a good start :) – 2013-05-07 11:35:43

So you're saying casting to an 'int' is faster than casting to a 'byte'? – leppie 2013-05-07 11:36:36

But it isn't faster :/ – 2013-05-07 11:37:07

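Here are some more optimizations that may help:

Use jagged arrays ([][]); in .NET, accessing them is faster than accessing multidimensional arrays;
Cache properties that are used inside the loop. Although this answer points out that the JIT will optimize this, we don't know what happens internally;
Multiplication is (generally) slower than addition;
As others have said, float is faster than double, although that mainly applies to older processors (~10+ years). The only advantage here is that you're using them as constants, so they consume less memory (especially since there are so many iterations);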

Bitmap bmpPicture = new Bitmap(nameNumber + ".bmp"); 

// jagged instead of multidimensional 
int[][] GRAY = new int[3840][]; //Matrix with "grayscales" in INTeger values 
for (int i = 0, icnt = GRAY.Length; i < icnt; i++) 
    GRAY[i] = new int[2748]; 

unsafe 
{ 
    //create an empty bitmap the same size as original 
    Bitmap bmp = new Bitmap(bmpPicture.Width, bmpPicture.Height); 

    //lock the original bitmap in memory 
    BitmapData originalData = bmpPicture.LockBits(
     new Rectangle(0, 0, bmpPicture.Width, bmpPicture.Height), 
     ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb); 

    //lock the new bitmap in memory 
    BitmapData newData = bmp.LockBits(
     new Rectangle(0, 0, bmpPicture.Width, bmpPicture.Height), 
     ImageLockMode.WriteOnly, PixelFormat.Format24bppRgb); 

    //set the number of bytes per pixel 
    // here is set to 3 because I use an Image with 24bpp 
    const int pixelSize = 3; // const because it doesn't change 
    // store Scan0 once as a byte pointer (Scan0 is an IntPtr, so it cannot go into an int) 
    byte* originalScan0 = (byte*)originalData.Scan0; 
    byte* newScan0 = (byte*)newData.Scan0; 
    // bytes per row, including any padding at the end of the row 
    int originalStride = originalData.Stride; 
    int newStride = newData.Stride; 
    // store certain properties, because reading a local is normally faster than a property (and we don't really know if the property recalculates anything internally) 
    int bmpwidth = bmpPicture.Width; 
    int bmpheight = bmpPicture.Height; 

    // row pointers, advanced by one stride per row (addition instead of y * stride) 
    byte* oRow = originalScan0; 
    byte* nRow = newScan0; 

    for (int y = 0; y < bmpheight; y++) 
    { 
     int pixelPosition = 0; 
     for (int x = 0; x < bmpwidth; x++) 
     { 
      //create the grayscale version 
      byte grayScale = 
       (byte)((oRow[pixelPosition] * .114f) + //B 
       (oRow[pixelPosition + 1] * .587f) + //G 
       (oRow[pixelPosition + 2] * .299f)); //R 

      //set the new image's pixel to the grayscale version 
      // nRow[pixelPosition] = grayScale; //B 
      // nRow[pixelPosition + 1] = grayScale; //G 
      // nRow[pixelPosition + 2] = grayScale; //R 

      GRAY[x][y] = (int)grayScale; 

      pixelPosition += pixelSize; 
     } 

     // move both row pointers to the start of the next row 
     oRow += originalStride; 
     nRow += newStride; 
    } 

    bmpPicture.UnlockBits(originalData); 
    bmp.UnlockBits(newData); 
}

2013-05-07 12:27:27 Jesse

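Good advice, but I think you've missed the main problem with the code: it (accidentally) transposes the bitmap, which is a very cache-unfriendly operation when done naively. – Daniel 2013-05-07 13:21:17

@Daniel Yup, I noticed that, but decided to focus only on optimizing the existing code. Good point, though. =) – Jesse 2013-05-07 14:18:28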

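Your code may not be optimal, but a quick skim suggests that even this version should run in a fraction of a second. That points to some other problem.

Are you:

Compiling in Release mode? Debug mode turns off various optimizations.
Running with a debugger attached? If you run from Visual Studio with F5 (using the default C# keyboard shortcuts), the debugger is attached. This can slow the program down significantly, especially if you have any breakpoints or IntelliTrace enabled.
Running on some limited device? It sounds like you're running on a PC, but if you're not, device-specific limitations may be relevant.
I/O bound? Although you talk about a camera, your code shows that you're working with the file system. Any file-system interaction can become a bottleneck, especially once networked disks, virus scanners, physical platters and fragmentation come into play. A 10 MP image is 30 MB (if it is uncompressed RGB with no alpha channel), and depending on the file-system details, reading or writing it can easily take 3 seconds.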
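The first two items can be checked from inside the program. The sketch below uses the hypothetical name `RunEnvironment`. It reports `Debugger.IsAttached` and whether the `DEBUG` symbol was defined at compile time. Treat `DEBUG` only as a proxy for "optimizations off": both are normally set by the Debug/Release configuration, but they are separate project settings.

```csharp
using System.Diagnostics;

public static class RunEnvironment
{
    // True when this assembly was compiled with the DEBUG symbol defined,
    // which the default Debug configuration does and Release does not.
    public static bool IsDebugBuild()
    {
#if DEBUG
        return true;
#else
        return false;
#endif
    }

    // Summarizes the two settings worth checking before profiling anything else.
    public static string Describe() =>
        $"DEBUG build: {IsDebugBuild()}, debugger attached: {Debugger.IsAttached}";
}
```

Printing `RunEnvironment.Describe()` next to the timing output makes it obvious when a slow measurement came from a Debug build or a debugger session.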

2013-05-07 12:46:03

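You can try to avoid the multiplication x * pixelSize by keeping a pointer to the current pixel and advancing it by addition. Change your code as follows: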

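byte* p = oRow; // points at the B byte of the current pixel 
for (int x = 0; x < bmpPicture.Width; x++) 
      {  
       GRAY[x, y] = 
        (int)((p[0] * .114) + //B 
        (p[1] * .587) + //G 
        (p[2] * .299)); //R 
       p += pixelSize; // advance to the next pixel by addition instead of x * pixelSize 
      }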

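This may speed up your code, but I'm not sure it will be noticeably faster.

Note: this only works when iterating over an array of value types (here, a byte pointer); it won't work if oRow is changed to some other type.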

2013-05-07 12:51:29

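Your code converts from a row-major representation to a column-major one. In the bitmap, pixel (x, y) is followed in memory by (x + 1, y), but in your GRAY array, pixel (x, y) is followed by (x, y + 1).

This makes the writes inefficient, because every write touches a different cache line; if the image is large enough, you end up thrashing the CPU cache. This is especially bad if your image dimensions are a power of two (see Why is transposing a matrix of 512x512 much slower than transposing a matrix of 513x513?).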

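Store your array in row-major order and avoid inefficient memory access wherever possible: declare it with the dimensions swapped (new int[2748, 3840], i.e. [height, width]) and replace GRAY[x,y] with GRAY[y,x].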
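A minimal sketch of the row-major version follows. It assumes a managed `byte[]` copy of the 24bpp BGR rows (`stride` bytes per row) instead of the question's `unsafe` pointers, and `RowMajorGray` is a hypothetical name. The array is declared `[height, width]` and written as `gray[y, x]`, so the inner loop writes to consecutive memory in the same order it reads the bitmap.

```csharp
public static class RowMajorGray
{
    // Fills the grayscale matrix in row-major order: declared [height, width]
    // and indexed [y, x], so consecutive x values are adjacent in memory.
    public static int[,] ToGray(byte[] bgr, int width, int height, int stride)
    {
        var gray = new int[height, width];
        for (int y = 0; y < height; y++)
        {
            int p = y * stride; // first byte of row y
            for (int x = 0; x < width; x++, p += 3)
            {
                // Same weights and truncation as the question's code
                gray[y, x] = (int)(bgr[p] * .114f + bgr[p + 1] * .587f + bgr[p + 2] * .299f);
            }
        }
        return gray;
    }
}
```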

If you really need it in column-major order, look at more cache-friendly algorithms for matrix transposition (e.g. A Cache Efficient Matrix Transpose Program?).
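A common cache-friendly approach is a blocked (tiled) transpose: walk the matrix in small square tiles, so the source rows and destination rows touched inside one tile stay in cache. This is a sketch, not the algorithm from the linked question. `BlockedTranspose` is a hypothetical name, and the default block size of 32 is an assumption you would tune by measurement.

```csharp
using System;

public static class BlockedTranspose
{
    // Transposes a row-major rows x cols matrix (stored as a 1D array) into a
    // row-major cols x rows matrix, one block x block tile at a time.
    public static int[] Transpose(int[] src, int rows, int cols, int block = 32)
    {
        if (src.Length != rows * cols) throw new ArgumentException("src.Length != rows * cols");
        var dst = new int[rows * cols];
        for (int r0 = 0; r0 < rows; r0 += block)
        {
            int rEnd = Math.Min(r0 + block, rows);
            for (int c0 = 0; c0 < cols; c0 += block)
            {
                int cEnd = Math.Min(c0 + block, cols);
                // Inside one tile at most 'block' source rows and 'block' destination
                // rows are touched, so the working set stays small enough to stay cached.
                for (int r = r0; r < rEnd; r++)
                    for (int c = c0; c < cEnd; c++)
                        dst[c * rows + r] = src[r * cols + c];
            }
        }
        return dst;
    }
}
```

For example, transposing the 2 x 3 matrix `{1, 2, 3, 4, 5, 6}` gives the 3 x 2 matrix `{1, 4, 2, 5, 3, 6}`.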

2013-05-07 13:19:52 Daniel

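I don't understand why you say GRAY stores (x, y) followed by (x, y + 1). In the first loop, y = 0 and x = 0, then y = 0 and x = 1, and so on... – 2013-05-08 07:53:01

@EloMonval: I'm talking about the order in which the elements are stored in memory. Your loop accesses the array in a different order from the one in which it is stored in memory, which causes a significant slowdown due to poor cache usage. – Daniel 2013-05-08 10:29:45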
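The distinction can be made concrete with offset arithmetic. C# rectangular arrays are stored row-major (the last index varies fastest), so element `[i, j]` of an `int[rows, cols]` sits at linear position `i * cols + j`. `ArrayLayout.Offset` below is a hypothetical helper used only to show the numbers.

```csharp
public static class ArrayLayout
{
    // Linear element position of [i, j] in a C# rectangular array int[rows, cols].
    public static long Offset(int i, int j, int cols) => (long)i * cols + j;
}
```

With `GRAY = new int[3840, 2748]` and writes to `GRAY[x, y]`, going from x = 0 to x = 1 moves `Offset(1, 0, 2748) - Offset(0, 0, 2748)` = 2748 elements, which is 10,992 bytes. Each step of the inner loop therefore lands on a new cache line. With `new int[2748, 3840]` and `GRAY[y, x]`, the same step moves one element (4 bytes).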

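Here is an alternative transformation that uses only integer arithmetic. The result is slightly different (because the factors are rounded), but not in any way you would notice with the naked eye (untested):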

byte grayScale = (byte)((
     (oRow[pixelPosition] * 29) +       // B: 0.114 * 256 ~ 29
     (oRow[pixelPosition + 1] * 150) +  // G: 0.587 * 256 ~ 150
     (oRow[pixelPosition + 2] * 77)) >> 8); // R: 0.299 * 256 ~ 77

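The scaling factors are approximately the old ones multiplied by 256; at the end, the sum is divided by 256 (the >> 8). Because 29 + 150 + 77 = 256, the result never exceeds 255 and always fits in a byte.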
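Below is a standalone sketch of the fixed-point idea next to the floating-point version it replaces, so the two can be compared directly. `IntGray` is a hypothetical name. The integer weights are 0.114, 0.587 and 0.299 multiplied by 256 and rounded. Per channel they are off by less than 0.002, so on an 8-bit input the two results differ by at most 1.

```csharp
public static class IntGray
{
    // Fixed-point 0.114*B + 0.587*G + 0.299*R: weights scaled by 256 and rounded
    // (29, 150, 77, summing to exactly 256), then divided by 256 with a right shift.
    public static byte Gray(byte b, byte g, byte r) =>
        (byte)((b * 29 + g * 150 + r * 77) >> 8);

    // Floating-point reference with the same truncation as the question's code.
    public static byte GrayFloat(byte b, byte g, byte r) =>
        (byte)(b * .114f + g * .587f + r * .299f);
}
```

The integer version avoids int-to-float conversions in the inner loop. Whether that is measurably faster on a given CPU is something to confirm with a timer.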

2013-05-07 13:22:29 harold

A huge optimization would come from using a 1D array instead of a 2D array.

None of the other changes will give you much of a speed-up...
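A minimal sketch of the 1D-array suggestion follows. It again assumes a managed `byte[]` copy of the 24bpp BGR rows, and `FlatGray` is a hypothetical name. The values go into one flat `int[width * height]` in row-major order, so pixel (x, y) lives at index `y * width + x`. A running index replaces that multiplication.

```csharp
public static class FlatGray
{
    // Stores grayscale values in a single flat array, row-major:
    // pixel (x, y) is at index y * width + x.
    public static int[] ToGray(byte[] bgr, int width, int height, int stride)
    {
        var gray = new int[width * height];
        int i = 0; // running output index, always equal to y * width + x
        for (int y = 0; y < height; y++)
        {
            int p = y * stride; // first byte of row y
            for (int x = 0; x < width; x++, p += 3, i++)
                gray[i] = (int)(bgr[p] * .114f + bgr[p + 1] * .587f + bgr[p + 2] * .299f);
        }
        return gray;
    }
}
```

The flat layout also matches the bitmap's own row order, so it fixes the cache problem raised in the row-major answer at the same time.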

2013-05-08 06:26:52 WhileTrueSleep
