2016-11-24 61 views
5

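Metal kernels not behaving properly on the new MacBook Pro (late 2016) GPUs

I am working on a macOS project that uses Swift and Metal to do image processing on the GPU. Last week I received my new 15-inch MacBook Pro (late 2016) and noticed something strange with my code: kernels that were supposed to write to a texture did not seem to do so...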

After a lot of digging, I found that the problem is related to which GPU Metal uses for the computation (the AMD Radeon Pro 455 or the Intel(R) HD Graphics 530).

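Initializing the MTLDevice with MTLCopyAllDevices() returns an array of devices representing both the Radeon and the Intel GPU (whereas MTLCreateSystemDefaultDevice() returns only the default device, which is the Radeon). In every case the code works as expected on the Intel GPU, but not on the Radeon GPU.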
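For reference, here is a minimal sketch of how the two GPUs can be listed and told apart. It is written against the current Swift Metal API (where `name` is non-optional), not the Swift 3 API used in the rest of this post. The `chosenDevice` name and the "prefer the non-low-power GPU" policy are illustrative assumptions, not something the project requires.

```swift
import Metal

// List every GPU Metal can see and mark which one is the system default.
// On dual-GPU Intel Macs, isLowPower is true for the integrated GPU.
let allDevices = MTLCopyAllDevices()
let defaultDevice = MTLCreateSystemDefaultDevice()

for (index, device) in allDevices.enumerated() {
    let power = device.isLowPower ? "low-power" : "not low-power"
    let marker = (device === defaultDevice) ? " (system default)" : ""
    print("[\(index)] \(device.name): \(power)\(marker)")
}

// Hypothetical selection policy: prefer a GPU that is not low-power,
// otherwise fall back to the system default device.
let chosenDevice = allDevices.first(where: { !$0.isLowPower }) ?? defaultDevice
```

Selecting by property instead of by array index avoids depending on the order in which MTLCopyAllDevices() happens to return the devices.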

Let me show you an example.

To start, here is a simple kernel that takes an input texture and copies its colors to an output texture:

    #include <metal_stdlib>
    using namespace metal;

    // Copies each texel of inTexture to the same position in outTexture.
    kernel void passthrough(texture2d<uint, access::read> inTexture [[texture(0)]],
                            texture2d<uint, access::write> outTexture [[texture(1)]],
                            uint2 gid [[thread_position_in_grid]])
    {
        uint4 out = inTexture.read(gid);
        outTexture.write(out, gid);
    }

To use this kernel, I use this code:

    let devices = MTLCopyAllDevices()
    for device in devices {
        print(device.name!) // [0] -> "AMD Radeon Pro 455", [1] -> "Intel(R) HD Graphics 530"
    }

    let device = devices[0] 
    let library = device.newDefaultLibrary() 
    let commandQueue = device.makeCommandQueue() 

    let passthroughKernelFunction = library!.makeFunction(name: "passthrough") 

    let cps = try! device.makeComputePipelineState(function: passthroughKernelFunction!) 

    let commandBuffer = commandQueue.makeCommandBuffer() 
    let commandEncoder = commandBuffer.makeComputeCommandEncoder() 

    commandEncoder.setComputePipelineState(cps) 

    // Texture setup 
    let width = 16 
    let height = 16 
    let byteCount = height*width*4 
    let bytesPerRow = width*4 
    let region = MTLRegionMake2D(0, 0, width, height) 
    let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba8Uint, width: width, height: height, mipmapped: false) 

    // inTexture 
    var inData = [UInt8](repeating: 255, count: Int(byteCount)) 
    let inTexture = device.makeTexture(descriptor: textureDescriptor) 
    inTexture.replace(region: region, mipmapLevel: 0, withBytes: &inData, bytesPerRow: bytesPerRow) 

    // outTexture 
    var outData = [UInt8](repeating: 128, count: Int(byteCount)) 
    let outTexture = device.makeTexture(descriptor: textureDescriptor) 
    outTexture.replace(region: region, mipmapLevel: 0, withBytes: &outData, bytesPerRow: bytesPerRow) 

    commandEncoder.setTexture(inTexture, at: 0) 
    commandEncoder.setTexture(outTexture, at: 1) 
    commandEncoder.dispatchThreadgroups(MTLSize(width: 1,height: 1,depth: 1), threadsPerThreadgroup: MTLSize(width: width, height: height, depth: 1)) 

    commandEncoder.endEncoding() 
    commandBuffer.commit() 
    commandBuffer.waitUntilCompleted() 

    // Get the data back from the GPU 
    outTexture.getBytes(&outData, bytesPerRow: bytesPerRow, from: region , mipmapLevel: 0) 

    // Validation 
    // outData should be exactly the same as inData 
    for (i,outElement) in outData.enumerated() { 
     if outElement != inData[i] { 
      print("Dest: \(outElement) != Src: \(inData[i]) at \(i)")
     } 
    } 

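When running this code with let device = devices[0] (the Radeon GPU), outTexture is never written to (my guess), and as a result outData stays unchanged. On the other hand, when running this code with let device = devices[1] (the Intel GPU), everything works as expected and outData is updated with the values from inData.

Answer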

8

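I think that whenever the GPU writes to a MTLStorageModeManaged resource such as a texture, and you then want to read that resource from the CPU (for example with getBytes()), you need to synchronize it using a blit encoder. Try putting the following above the commandBuffer.commit() line: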

let blitEncoder = commandBuffer.makeBlitCommandEncoder()
blitEncoder.synchronize(resource: outTexture)
blitEncoder.endEncoding()

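You can get away without this on the integrated GPU because that GPU uses system memory for the resource, so there is nothing to synchronize.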
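To show where the synchronization sits in the overall flow, here is a self-contained sketch of the question's passthrough test. It uses the current Swift Metal API (for example `setTexture(_:index:)` and optional-returning factory methods) instead of the Swift 3 spellings above. The function name `runPassthrough(on:)`, the runtime-compiled kernel string, and the explicit `usage` flags are assumptions made for illustration. The blit is encoded only when the texture reports `.managed` storage, so the same code should also be usable on GPUs that share system memory.

```swift
import Metal

// Kernel compiled from source at runtime so the sketch needs no .metal file.
let source = """
#include <metal_stdlib>
using namespace metal;
kernel void passthrough(texture2d<uint, access::read>  inTexture  [[texture(0)]],
                        texture2d<uint, access::write> outTexture [[texture(1)]],
                        uint2 gid [[thread_position_in_grid]])
{
    outTexture.write(inTexture.read(gid), gid);
}
"""

// Hypothetical helper: returns true if the CPU reads back exactly what went in.
func runPassthrough(on device: MTLDevice) throws -> Bool {
    let library = try device.makeLibrary(source: source, options: nil)
    guard let function = library.makeFunction(name: "passthrough"),
          let queue = device.makeCommandQueue(),
          let commandBuffer = queue.makeCommandBuffer() else { return false }
    let pipeline = try device.makeComputePipelineState(function: function)

    let width = 16, height = 16, bytesPerRow = width * 4
    let region = MTLRegionMake2D(0, 0, width, height)
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .rgba8Uint, width: width, height: height, mipmapped: false)
    descriptor.usage = [.shaderRead, .shaderWrite]

    guard let inTexture = device.makeTexture(descriptor: descriptor),
          let outTexture = device.makeTexture(descriptor: descriptor) else { return false }

    let inData = [UInt8](repeating: 255, count: bytesPerRow * height)
    var outData = [UInt8](repeating: 128, count: bytesPerRow * height)
    inTexture.replace(region: region, mipmapLevel: 0, withBytes: inData, bytesPerRow: bytesPerRow)
    outTexture.replace(region: region, mipmapLevel: 0, withBytes: outData, bytesPerRow: bytesPerRow)

    guard let compute = commandBuffer.makeComputeCommandEncoder() else { return false }
    compute.setComputePipelineState(pipeline)
    compute.setTexture(inTexture, index: 0)
    compute.setTexture(outTexture, index: 1)
    compute.dispatchThreadgroups(
        MTLSize(width: 1, height: 1, depth: 1),
        threadsPerThreadgroup: MTLSize(width: width, height: height, depth: 1))
    compute.endEncoding()

    // A managed texture has separate CPU and GPU copies; the blit copies the
    // GPU's result back so getBytes() sees it. Skipped for other storage modes.
    if outTexture.storageMode == .managed,
       let blit = commandBuffer.makeBlitCommandEncoder() {
        blit.synchronize(resource: outTexture)
        blit.endEncoding()
    }

    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    outTexture.getBytes(&outData, bytesPerRow: bytesPerRow, from: region, mipmapLevel: 0)
    return outData == inData
}

// Usage: run the same check on every GPU in the machine.
for device in MTLCopyAllDevices() {
    let ok = (try? runPassthrough(on: device)) ?? false
    let verdict = ok ? "output matches input" : "mismatch or setup failure"
    print("\(device.name): \(verdict)")
}
```

The blit encoder has to be created after the compute encoder's endEncoding() and before commit(), because a command buffer allows only one active encoder at a time and the synchronization must be ordered after the kernel's writes.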


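Wow, that was the missing piece, thank you so much! I have been struggling to learn Swift and Metal for the past few months, and I can't say it has been easy.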
