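Kernel crashes when trying to do a simple value assignment
I am learning CUDA and am still a beginner. I am trying a simple task, but my code crashes when I run it and I don't know why. Any help would be appreciated.
Edit: the crash occurs in `cudaMemcpy`, and in the `Image` struct `pixelVal` is of type `int**`. Is that the cause?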
Original C++ code:
void Image::reflectImage(bool flag, Image& oldImage)
/*Reflects the Image based on users input*/
{
    int rows = oldImage.N;
    int cols = oldImage.M;
    Image tempImage(oldImage);
    for(int i = 0; i < rows; i++)
    {
        for(int j = 0; j < cols; j++)
            tempImage.pixelVal[rows - (i + 1)][j] = oldImage.pixelVal[i][j];
    }
    oldImage = tempImage;
}
My CUDA kernel and code:
#define NTPB 512
__global__ void fliph(int* a, int* b, int r, int c)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i >= r || j >= c)
        return;
    a[(r - i * c) + j] = b[i * c + j];
}
void Image::reflectImage(bool flag, Image& oldImage)
/*Reflects the Image based on users input*/
{
    int rows = oldImage.N;
    int cols = oldImage.M;
    Image tempImage(oldImage);
    if(flag == true) //horizontal reflection
    {
        //Allocate device memory
        int* dpixels;
        int* oldPixels;
        int n = rows * cols;
        cudaMalloc((void**)&dpixels, n * sizeof(int));
        cudaMalloc((void**)&oldPixels, n * sizeof(int));
        cudaMemcpy(dpixels, tempImage.pixelVal, n * sizeof(int), cudaMemcpyHostToDevice);
        cudaMemcpy(oldPixels, oldImage.pixelVal, n * sizeof(int), cudaMemcpyHostToDevice);
        int nblks = (n + NTPB - 1)/NTPB;
        fliph<<<nblks, NTPB>>>(dpixels, oldPixels, rows, cols);
        cudaMemcpy(tempImage.pixelVal, dpixels, n * sizeof(int), cudaMemcpyDeviceToHost);
        cudaFree(dpixels);
        cudaFree(oldPixels);
    }
    oldImage = tempImage;
}
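Your blocks and grid are one-dimensional. Why are you using two-dimensional indexing in the kernel? The variable `j` in the kernel will always be 0. – sgarizvi 2013-04-04 17:14:18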
From a quick review, the code looks fine (apart from @sgar91's note). I suggest adding error checking to your program to narrow down the problem. See [this post](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api). – stuhlo 2013-04-04 17:25:36
I count 7 CUDA API calls and no error checking at all! Step one: check for errors and try to narrow down where the problem occurs. – talonmies 2013-04-04 18:03:36
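On the question raised in the edit: if `pixelVal` is an `int**` built as an array of separately allocated rows (an assumption, since the `Image` class is not shown), then `cudaMemcpy(dpixels, tempImage.pixelVal, n * sizeof(int), ...)` reads `n * sizeof(int)` bytes starting at the array of row pointers rather than at the pixel data, which would be consistent with a crash in `cudaMemcpy`. One common workaround is to pack the rows into a contiguous buffer first. A host-side sketch, with hypothetical helper names `flatten` and `unflatten`:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy a rows x cols int** image into one contiguous,
// row-major buffer whose data() pointer can be passed to cudaMemcpy.
std::vector<int> flatten(int* const* pixelVal, int rows, int cols)
{
    assert(rows >= 0 && cols >= 0);
    std::vector<int> flat(static_cast<std::size_t>(rows) * cols);
    for (int i = 0; i < rows; ++i) {
        int* out = flat.data() + static_cast<std::size_t>(i) * cols;
        std::copy(pixelVal[i], pixelVal[i] + cols, out);
    }
    return flat;
}

// Hypothetical helper: copy a contiguous row-major buffer back into the rows.
void unflatten(const std::vector<int>& flat, int* const* pixelVal, int rows, int cols)
{
    assert(flat.size() == static_cast<std::size_t>(rows) * cols);
    for (int i = 0; i < rows; ++i) {
        const int* in = flat.data() + static_cast<std::size_t>(i) * cols;
        std::copy(in, in + cols, pixelVal[i]);
    }
}
```

Under that assumption, the host-to-device copy would use the flat buffer as its source, for example `cudaMemcpy(oldPixels, flat.data(), n * sizeof(int), cudaMemcpyHostToDevice)`, and the result would be copied back into a flat buffer and then passed to `unflatten`.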