Kernel launch failure caused by a kernel argument

I wrote a simple CUDA kernel that fails to launch for some reason I don't understand. Below are my global variables.

unsigned int volume[256*256*256];//contains volume data of source 
unsigned int target[256*256*256];//contains volume data of target 
unsigned int* d_volume=NULL;//source data on device 
unsigned int* d_target=NULL;//target data on device

The next function is the kernel launcher.

void launch_kernel(){ 

cudaMalloc(&d_volume,256*256*256*sizeof(unsigned int)); 
cudaMemcpy(d_volume, volume, 256*256*256*sizeof(unsigned int),cudaMemcpyHostToDevice); 
cudaMalloc(&d_target,256*256*256*sizeof(unsigned int)); 
cudaMemcpy(d_target, target, 256*256*256*sizeof(unsigned int),cudaMemcpyHostToDevice); 
dim3 threads(256,1,1); 
dim3 blocks(256,256,1); 

simple_kernel<<<blocks,threads>>>(d_volume,d_target); 
cudaError_t cudaResult; 
cudaResult = cudaGetLastError(); 
if (cudaResult != cudaSuccess) 
{ 
    cout<<"kernel failed"<<endl; 
} 
cudaMemcpy(volume, d_volume, 256*256*256*sizeof(int),cudaMemcpyDeviceToHost); 
cudaFree(d_volume); 
cudaMemcpy(target, d_target, 256*256*256*sizeof(int),cudaMemcpyDeviceToHost); 
cudaFree(d_target); 
}

The problem seems to be caused by d_target, because if I launch the kernel like this:

simple_kernel<<<blocks,threads>>>(d_volume,d_volume);

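it works fine (the values that have to be transferred to the device are transferred) and no message appears. Any idea why this happens? The kernel is declared as follows.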

__global__ void simple_kernel(unsigned int* src,unsigned int* tgt){ 
//i dont think it matters what it is for. 
     int x = threadIdx.x; 
     int y = blockIdx.x; 
     int z = blockIdx.y; 
     if(x!=0 || x!=255 || y!=0 || y!=255 || z!=0 || z!=255 ){//in bound of memory allocated 
      if(src[x*256*256+y*256+z]==tgt[x*256*256+y*256+z]) 
       if(tgt[(x+1)*256*256+y*256+z]==1 || tgt[(x-1)*256*256+y*256+z]==1 || tgt[(x-1)*256*256+(y+1)*256+z] ||tgt[(x-1)*256*256+(y-1)*256+z]) 
        src[x*256*256+y*256+z]=1; 
       else 
        src[x*256*256+y*256+z]=0; 
     } 

    }


2013-04-06 Philip Xenos

What error code does `cudaGetLastError()` return? – stuhlo 2013-04-06 00:24:34

`cout << cudaGetErrorString(cudaGetLastError()) << endl;` returns "no error" – 2013-04-06 00:52:06

What does "launch failed" mean? – stuhlo 2013-04-06 00:56:15

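CUDA can also return an error when a kernel reads global memory out of bounds. You perform such an out-of-bounds read in, for example:
if(tgt[(x+1)*256*256+y*256+z]==1 || ...)
for x = y = z = 255, which passes your bounds check.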
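A possible fix for the guard, as a minimal sketch. The condition in the question joins its tests with `||`, so it holds for every thread (no x can equal both 0 and 255) and filters nothing. The version below assumes the intent was to skip border voxels. `N`, `is_interior`, `flat_index` and `simple_kernel_guarded` are names introduced here for illustration and are not from the original post.

```cuda
#include <cuda_runtime.h>

constexpr int N = 256;  // edge length of the cubic volume, as in the question

// True only for voxels whose neighbours along x, y and z all lie inside the volume.
__host__ __device__ inline bool is_interior(int x, int y, int z) {
    return x > 0 && x < N - 1 &&
           y > 0 && y < N - 1 &&
           z > 0 && z < N - 1;
}

// Flat index using the question's layout: x is the slowest-varying axis.
__host__ __device__ inline int flat_index(int x, int y, int z) {
    return (x * N + y) * N + z;
}

__global__ void simple_kernel_guarded(unsigned int* src, const unsigned int* tgt) {
    int x = threadIdx.x;
    int y = blockIdx.x;
    int z = blockIdx.y;
    if (!is_interior(x, y, z)) return;  // border voxels are skipped entirely

    int i = flat_index(x, y, z);
    if (src[i] == tgt[i]) {
        // Same neighbour tests as the original kernel; the last two are
        // plain non-zero tests there, so they stay non-zero tests here.
        bool hit = tgt[flat_index(x + 1, y, z)] == 1 ||
                   tgt[flat_index(x - 1, y, z)] == 1 ||
                   tgt[flat_index(x - 1, y + 1, z)] != 0 ||
                   tgt[flat_index(x - 1, y - 1, z)] != 0;
        src[i] = hit ? 1u : 0u;
    }
}
```

It would be launched with the same configuration as in the question, e.g. `simple_kernel_guarded<<<blocks,threads>>>(d_volume,d_target);`. With this guard the largest x that reaches the neighbour reads is 254, so `x + 1` stays inside the allocation.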

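In the case where you launch your kernel as
simple_kernel<<<blocks,threads>>>(d_volume,d_volume);
the out-of-bounds reads actually land in memory that has been allocated for d_target, because the arrays d_volume and d_target are stored contiguously in global memory, so no error occurs.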

Confirm my suggestion with further error checking or by running your program under cuda-memcheck.
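Such further error checking could look like the sketch below. `check_kernel` is a hypothetical helper, not a CUDA API. It relies on two documented behaviours: `cudaGetLastError()` right after a launch reports only launch-time problems, while faults raised during execution surface at a later synchronizing call such as `cudaDeviceSynchronize()`. `cudaGetLastError()` also resets the stored error, which may be why a second call in the comments above printed "no error".

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the first error seen, either from the launch
// itself or from the kernel's execution, and prints a readable message.
inline cudaError_t check_kernel(const char* what) {
    cudaError_t err = cudaGetLastError();   // launch-time errors (bad configuration, etc.)
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();      // errors raised while the kernel was running
    }
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
    }
    return err;
}

// Usage, directly after the launch in launch_kernel():
//     simple_kernel<<<blocks,threads>>>(d_volume,d_target);
//     if (check_kernel("simple_kernel") != cudaSuccess) { /* handle failure */ }
```

Checking the return values of the `cudaMalloc` and `cudaMemcpy` calls in the same way would narrow the failure down further. On recent CUDA toolkits, cuda-memcheck has been superseded by compute-sanitizer, which is run the same way, with the program as its argument.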


2013-04-06 00:55:27 stuhlo

That was the problem. Thank you very much. – 2013-04-06 01:07:25

@PhilipXenos You're welcome. – stuhlo 2013-04-06 01:10:33
