Kernel crashes when trying to do a simple value assignment

I am learning CUDA and am still at a beginner level. I am trying a simple task, but my code crashes when I run it and I don't know why. Any help would be appreciated.

Edit: The crash happens on cudaMemcpy, and in the Image structure pixelVal is of type int**. Is that the cause?

Original C++ code:

void Image::reflectImage(bool flag, Image& oldImage) 
/*Reflects the Image based on users input*/ 
{ 
    int rows = oldImage.N; 
    int cols = oldImage.M; 
    Image tempImage(oldImage); 

    for(int i = 0; i < rows; i++) 
    { 
     for(int j = 0; j < cols; j++) 
     tempImage.pixelVal[rows - (i + 1)][j] = oldImage.pixelVal[i][j]; 
    } 
    oldImage = tempImage; 
}

My CUDA kernel and code:

#define NTPB 512 
__global__ void fliph(int* a, int* b, int r, int c) 
{ 
    int i = blockIdx.x * blockDim.x + threadIdx.x; 
    int j = blockIdx.y * blockDim.y + threadIdx.y; 

    if (i >= r || j >= c) 
     return; 
    a[(r - i * c) + j] = b[i * c + j]; 
} 
void Image::reflectImage(bool flag, Image& oldImage) 
/*Reflects the Image based on users input*/ 
{ 
    int rows = oldImage.N; 
    int cols = oldImage.M; 
    Image tempImage(oldImage); 
    if(flag == true) //horizontal reflection 
    { 
    //Allocate device memory 
    int* dpixels; 
    int* oldPixels; 
    int n = rows * cols; 
    cudaMalloc((void**)&dpixels, n * sizeof(int)); 
    cudaMalloc((void**)&oldPixels, n * sizeof(int)); 
    cudaMemcpy(dpixels, tempImage.pixelVal, n * sizeof(int), cudaMemcpyHostToDevice); 
    cudaMemcpy(oldPixels, oldImage.pixelVal, n * sizeof(int), cudaMemcpyHostToDevice); 
    int nblks = (n + NTPB - 1)/NTPB; 
    fliph<<<nblks, NTPB>>>(dpixels, oldPixels, rows, cols); 
    cudaMemcpy(tempImage.pixelVal, dpixels, n * sizeof(int), cudaMemcpyDeviceToHost); 
    cudaFree(dpixels); 
    cudaFree(oldPixels); 
    } 
    oldImage = tempImage; 
}

Source

2013-04-04 Bhrugesh Patel

Your block and grid are one-dimensional. Why are you using 2D indexing in the kernel? The variable 'j' in the kernel will always be 0. – sgarizvi 2013-04-04 17:14:18

From a quick review, the code looks fine (apart from @sgar91's note). I suggest adding error checking to your program to narrow the problem down. See this post: http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api – stuhlo 2013-04-04 17:25:36

I count 7 CUDA API calls and no error checking at all! First step: check for errors and try to narrow down where the problem occurs. – talonmies 2013-04-04 18:03:36
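The error checking suggested in the comments above could be sketched roughly as follows. The macro name CUDA_CHECK and the empty kernel noop are hypothetical names chosen for illustration; only cudaGetErrorString, cudaGetLastError and cudaDeviceSynchronize are actual runtime API calls. This is one common pattern, not necessarily the one the linked post recommends.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: wraps a runtime API call and aborts with the
// file and line of the first call that does not return cudaSuccess.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error at %s:%d: %s\n",         \
                         __FILE__, __LINE__,                          \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Placeholder kernel, only here so the launch checks have something to check.
__global__ void noop() {}

int main()
{
    int* d = nullptr;
    CUDA_CHECK(cudaMalloc((void**)&d, 16 * sizeof(int)));

    noop<<<1, 1>>>();
    CUDA_CHECK(cudaGetLastError());       // reports invalid launch configurations
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    CUDA_CHECK(cudaFree(d));
    return 0;
}
```

Wrapping each of the seven calls in the question this way would at least show which call fails first, which is the narrowing-down step the comments ask for.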

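You have to create a 2D grid in order to process the image with the 2D indices i and j. With the current 1D launch, j is always 0, so the kernel only handles element (i, 0) of each row, that is, a single column of the image.

To set up 2D indexing, create a 2D block and a 2D grid like this: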

const int BLOCK_DIM = 16; 

dim3 Block(BLOCK_DIM,BLOCK_DIM); 

dim3 Grid; 
Grid.x = (cols + Block.x - 1)/Block.x; 
Grid.y = (rows + Block.y - 1)/Block.y; 

fliph<<<Grid, Block>>>(dpixels, oldPixels, rows, cols);
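Putting the pieces together, here is a minimal self-contained sketch under a few assumptions. The kernel name flipRows, the 5x7 test image and the local pixelVal array are made up for illustration and stand in for the Image class, which is not shown in the question. The sketch differs from the posted code in three ways. First, since pixelVal is an int**, it copies the rows into one contiguous buffer before calling cudaMemcpy; passing the int** directly copies row pointers rather than pixels, and the copy back would overwrite those pointers, which is a likely cause of the crash. Second, it maps x to columns and y to rows so that it agrees with Grid.x being computed from cols and Grid.y from rows. Third, it writes to (rows - 1 - row) * cols + col, because the posted expression (r - i * c) + j becomes negative as soon as i * c exceeds r. Error checks are left out to keep it short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// x indexes columns and y indexes rows, matching Grid.x ~ cols, Grid.y ~ rows.
__global__ void flipRows(int* dst, const int* src, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= rows || col >= cols)
        return;
    // Row `row` of the source lands in row (rows - 1 - row) of the result.
    dst[(rows - 1 - row) * cols + col] = src[row * cols + col];
}

int main()
{
    const int rows = 5, cols = 7, n = rows * cols;  // illustrative sizes

    // Stand-in for Image::pixelVal: an int** with separately allocated rows.
    int** pixelVal = new int*[rows];
    for (int i = 0; i < rows; ++i) {
        pixelVal[i] = new int[cols];
        for (int j = 0; j < cols; ++j)
            pixelVal[i][j] = i * 100 + j;
    }

    // Flatten the rows into one contiguous host buffer before copying.
    std::vector<int> flat(n), result(n);
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            flat[i * cols + j] = pixelVal[i][j];

    int* dSrc = nullptr;
    int* dDst = nullptr;
    cudaMalloc((void**)&dSrc, n * sizeof(int));
    cudaMalloc((void**)&dDst, n * sizeof(int));
    cudaMemcpy(dSrc, flat.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    const int BLOCK_DIM = 16;
    dim3 block(BLOCK_DIM, BLOCK_DIM);
    dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    flipRows<<<grid, block>>>(dDst, dSrc, rows, cols);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(result.data(), dDst, n * sizeof(int), cudaMemcpyDeviceToHost);

    // Compare against the nested loop from the original C++ version.
    bool ok = true;
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            if (result[(rows - (i + 1)) * cols + j] != pixelVal[i][j])
                ok = false;
    std::printf("%s\n", ok ? "GPU result matches CPU loop" : "mismatch");

    cudaFree(dSrc);
    cudaFree(dDst);
    for (int i = 0; i < rows; ++i)
        delete[] pixelVal[i];
    delete[] pixelVal;
    return 0;
}
```

In the real Image class the flipped values would then have to be copied back row by row into tempImage.pixelVal[i], again because the rows are not guaranteed to be contiguous in host memory.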

Source

2013-04-04 18:03:42 sgarizvi
