2012-08-05
1

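GLSL SpinLock only Mostly Works

I have implemented a depth peeling algorithm using a GLSL spinlock (inspired by this). In the visualization below, notice that the depth peeling algorithm works correctly overall (first layer top left, second layer top right, third layer bottom left, fourth layer bottom right). The four depth layers are stored in a single RGBA texture.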

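Unfortunately, the spinlock sometimes fails to prevent errors: you can see small white speckles, particularly in the fourth layer. There is also one on the spaceship in the second layer. These speckles vary from frame to frame.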

[Image: visualization of the four depth layers]

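In my GLSL spinlock, when a fragment is about to be drawn, the fragment program atomically reads and writes a locking value in a separate locking texture, waiting until a 0 appears, which indicates that the lock is open. In practice, I found that the program must stay parallel: if two threads are on the same pixel, the warp cannot continue (one thread must wait while the other proceeds, yet all threads in a GPU warp must execute in lockstep).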
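For illustration, the naive form of such a lock can be sketched as below. This is not code from this post: the name `lock_tex` is hypothetical, and the sketch only shows the pattern that tends to stall. If two fragments in the same warp contend for one pixel, the fragment that lost the exchange keeps spinning, and the fragment that holds the lock may never get to run its release.

```glsl
#version 420 core

// Hypothetical lock texture: one uint per pixel, 0 = free, 1 = taken.
layout(r32ui) coherent uniform uimage2D lock_tex;

void main() {
    ivec2 coord = ivec2(gl_FragCoord.xy);

    // Naive acquire: spin until the value we swapped out was 0.
    // Within one warp this can stall forever, because the spinning
    // thread and the lock holder are scheduled together.
    while (imageAtomicExchange(lock_tex, coord, 1u) != 0u) {
        // busy-wait
    }

    // ... critical section would go here ...

    memoryBarrier();                           // publish writes before release
    imageAtomicExchange(lock_tex, coord, 0u);  // release the lock

    discard;
}
```

Wrapping the whole acquire/work/release sequence in a retry loop, as the shader in this question does, is one way to avoid spinning inside the acquire itself.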

My fragment program looks like this (comments and spacing added):

#version 420 core

// Locking texture
layout(r32ui) coherent uniform uimage2D img2D_0;
// Data texture, also the render target
layout(rgba32f) coherent uniform image2D img2D_1;

// Inserts "new_data" into "data", a sorted list
vec4 insert(vec4 data, float new_data) {
    if      (new_data < data.x) return vec4(new_data, data.xyz);
    else if (new_data < data.y) return vec4(data.x, new_data, data.yz);
    else if (new_data < data.z) return vec4(data.xy, new_data, data.z);
    else if (new_data < data.w) return vec4(data.xyz, new_data);
    else                        return data;
}

void main() {
    ivec2 coord = ivec2(gl_FragCoord.xy);

    // The idea here is to keep looping over a pixel until a value is written.
    // By looping over the entire logic, threads in the same warp aren't stalled
    // by other waiting threads. The first imageAtomicExchange call sets the
    // locking value to 1. If the locking value was already 1, then someone
    // else has the lock, and can_write is false. If the locking value was 0,
    // then the lock is free, and can_write is true. The depth is then read,
    // the new value inserted, but only written if can_write is true (the
    // locking texture was free). The second imageAtomicExchange call resets
    // the lock back to 0.

    bool have_written = false;
    while (!have_written) {
        bool can_write = (imageAtomicExchange(img2D_0, coord, 1u) != 1u);

        memoryBarrier();

        vec4 depths = imageLoad(img2D_1, coord);
        depths = insert(depths, gl_FragCoord.z);

        if (can_write) {
            imageStore(img2D_1, coord, depths);
            have_written = true;
        }

        memoryBarrier();

        imageAtomicExchange(img2D_0, coord, 0u);

        memoryBarrier();
    }
    discard; // Already wrote to render target with imageStore
}

My question is: why do these speckles occur? I want the spinlock to work 100% of the time! Could it be related to the placement of my memoryBarrier() calls?

Answers

2

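"imageAtomicExchange(img2D_0,coord,0);" needs to be inside the if statement, because as written it resets the lock variable even for threads that do not hold the lock! Changing that fixed it.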
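A minimal sketch of that change, applied to the shader from the question, might look like the following. This is an assumption about how the edit would read, not the poster's final (procedurally generated) shader, and the barrier placement here is only one plausible choice. The author's later comment on this answer reports that this change did not fully solve the problem.

```glsl
#version 420 core

layout(r32ui) coherent uniform uimage2D img2D_0;   // lock texture, 0 = free
layout(rgba32f) coherent uniform image2D img2D_1;  // four sorted depth layers

// Same sorted insert as in the question.
vec4 insert(vec4 data, float new_data) {
    if      (new_data < data.x) return vec4(new_data, data.xyz);
    else if (new_data < data.y) return vec4(data.x, new_data, data.yz);
    else if (new_data < data.z) return vec4(data.xy, new_data, data.z);
    else if (new_data < data.w) return vec4(data.xyz, new_data);
    else                        return data;
}

void main() {
    ivec2 coord = ivec2(gl_FragCoord.xy);

    bool have_written = false;
    while (!have_written) {
        // We own the lock only if the value we swapped out was 0.
        if (imageAtomicExchange(img2D_0, coord, 1u) == 0u) {
            vec4 depths = imageLoad(img2D_1, coord);
            imageStore(img2D_1, coord, insert(depths, gl_FragCoord.z));

            memoryBarrier();  // make the store visible before releasing

            // Release happens only on the path that acquired the lock,
            // so a thread that lost the exchange never resets it.
            imageAtomicExchange(img2D_0, coord, 0u);
            have_written = true;
        }
    }
    discard;  // data was already written with imageStore
}
```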


What does the final fragment shader look like? Does it still have the memoryBarrier() operations? – ragnar 2013-02-12 18:55:05


Yes, but in more succinct places. IIRC (the shader is procedurally generated), they appear only after the two imageAtomicExchange calls. – imallett 2013-02-12 21:35:06


I was actually wrong in thinking that this solved the problem. I have put together a more complete listing here: http://stackoverflow.com/questions/21538555/broken-glsl-spinlock-glsl-locks-compendium – imallett 2014-02-03 21:53:29

3

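For reference, here is locking code that has been tested to work on Nvidia drivers 314.22 and 320.18 on a GTX670. Note that reordering the code, or rewriting it as logically equivalent code, triggers existing compiler optimization bugs (see the comments in the code below). Note that the code below uses bindless image references.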

// sem is initialized to zero 
coherent uniform layout(size1x32) uimage2D sem; 

void main(void) 
{ 
    ivec2 coord = ivec2(gl_FragCoord.xy); 

    bool done = false; 
    uint locked = 0; 
    while(!done)
    {
        // locked = imageAtomicCompSwap(sem, coord, 0u, 1u); will NOT work
        locked = imageAtomicExchange(sem, coord, 1u);
        if (locked == 0)
        {
            performYourCriticalSection();

            memoryBarrier();

            imageAtomicExchange(sem, coord, 0u);

            // replacing this with a break will NOT work
            done = true;
        }
    }

    discard; 
}