CUDA: invalid device pointer error when reallocating memory

In the code below, I simply call the function foo twice from main. The function just allocates device memory, increments the pointer, and then returns to main.

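The first time foo is called, the memory is allocated correctly. But, as you can see in the output, when I call foo again the CUDA memory allocation fails with the error invalid device pointer.

I tried calling cudaThreadSynchronize() between the two calls to foo, but it did not help. Why does the memory allocation fail?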

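Actually, the error is caused by

matrixd += 3;

because if I do not do this increment, the error disappears.
But why does it happen even though I call cudaFree()?

Please help me understand this.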

My output is here:

Calling foo for the first time 
Allocation of matrixd passed: 
I came back to main safely :-) 
I am going back to foo again :-) 
Allocation of matrixd failed, the reason is: invalid device pointer

My main() is here:

#include<stdio.h> 
#include <cstdlib> // malloc(), free() 
#include <iostream> // cout, stream 
#include <math.h> 
#include <ctime> // time(), clock() 
#include <bitset> 
bool foo(); 

/*************************************** 
Main method. 

****************************************/ 
int main() 
{ 

    // Perform one warm-up pass and validate 
    std::cout << "Calling foo for the first time"<<std::endl; 
    foo(); 
    std::cout << "I came back to main safely :-) "<<std::endl; 
    std::cout << "I am going back to foo again :-) "<<std::endl; 
    foo();  
    getchar(); 
    return 0; 
}

The definition of foo() is in this file:

#include <cuda.h> 
#include <cuda_runtime_api.h> 
#include <device_launch_parameters.h> 
#include <iostream> 

bool foo() 
{ 
    // Error return value 
    cudaError_t status; 
    // Number of bytes in the matrix. 
    int bytes = 9 *sizeof(float); 
     // Pointers to the device arrays 
    float *matrixd=NULL; 

    // Allocate memory on the device to store matrix 
    cudaMalloc((void**) &matrixd, bytes); 
    status = cudaGetLastError();    //To check the error 
    if (status != cudaSuccess) {      
     std::cout << "Allocation of matrixd failed, the reason is: " << cudaGetErrorString(status) << 
     std::endl; 
     cudaFree(matrixd);      //Free call for memory 
     return false; 
    } 

    std::cout << "Allocation of matrixd passed: "<<std::endl; 


    ////// Increment address 
    for (int i=0; i<3; i++){ 
     matrixd += 3; 
    } 

     // Free device memory 
    cudaFree(matrixd);  

    return true; 
}

Update

I have added better error checking. Also, I now increment the device pointer only once. This time I get the following output:

Calling foo for the first time 
Allocation of matrixd passed: 
Increamented the pointer and going to free cuda memory: 
GPUassert: invalid device pointer C:/Users/user/Desktop/Gauss/Gauss/GaussianEliminationGPU.cu 44

Line 44 is the cudaFree() call. Why does it still fail?

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); } 
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true) 
{ 
    if (code != cudaSuccess) 
    { 
     fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line); 
     if (abort) exit(code); 
    } 
} 

// GPU function for the direct Gauss-Jordan method.

bool foo() 
{ 

    // Error return value 
    cudaError_t status; 
    // Number of bytes in the matrix. 
    int bytes = 9 *sizeof(float); 
     // Pointers to the device arrays 
    float *matrixd=NULL; 

    // Allocate memory on the device to store each matrix 
    gpuErrchk(cudaMalloc((void**) &matrixd, bytes)); 
    //cudaMemset(outputMatrixd, 0, bytes); 

    std::cout << "Allocation of matrixd passed: "<<std::endl; 


    ////// Increment address

     matrixd += 1; 

     std::cout << "Increamented the pointer and going to free cuda memory: "<<std::endl; 

     // Free device memory 
    gpuErrchk(cudaFree(matrixd));  

    return true; 
}

Source

2016-10-03 user3891236

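What happens if you check the return status of the cudaFree call? – talonmies

@talonmies You are right. I just checked by calling cudaGetLastError() right after cudaFree, and yes, it shows that the call fails. But again, why? – user3891236

Exactly. So your problem is basically caused by incomplete error checking. You can see how to do it properly [here](http://stackoverflow.com/q/14038589/681865). The memory allocation is not failing. – talonmies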

The real problem is in this code:

for (int i=0; i<3; i++){ 
    matrixd += 3; 
} 

// Free device memory 
cudaFree(matrixd);

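You never allocated matrixd+9, so passing it to cudaFree is illegal and produces an invalid device pointer error. That error is then carried forward to the next point at which you check for errors, which is after the subsequent call to cudaMalloc. If you read the documentation for any of these API calls, you will notice a warning that they can return errors from previous GPU operations. That is what is happening here.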
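The deferred error report described above can be reproduced in a few lines. The following is a minimal sketch, not the asker's original program: `DeferredErrorDemo` and `run_deferred_error_demo` are hypothetical names introduced here, and a working CUDA device is assumed. It relies on the documented behaviour that `cudaGetLastError()` returns the last error produced by any earlier runtime call in the same host thread and then resets it to `cudaSuccess`.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper type (not from the original post):
// records the status of each step so the caller can inspect them.
struct DeferredErrorDemo {
    cudaError_t first_malloc;   // return value of the first cudaMalloc
    cudaError_t bad_free;       // return value of cudaFree on the shifted pointer
    cudaError_t second_malloc;  // return value of the second cudaMalloc
    cudaError_t last_error;     // what cudaGetLastError() reports afterwards
};

DeferredErrorDemo run_deferred_error_demo()
{
    DeferredErrorDemo r{};
    const size_t bytes = 9 * sizeof(float);

    cudaGetLastError();  // discard any stale error left by earlier calls

    float *base = NULL;
    r.first_malloc = cudaMalloc((void**)&base, bytes);
    if (r.first_malloc != cudaSuccess) {
        // No device memory available: report the same status everywhere.
        r.bad_free = r.second_malloc = r.last_error = r.first_malloc;
        return r;
    }

    // base + 9 is an address that cudaMalloc never returned.
    float *shifted = base + 9;
    r.bad_free = cudaFree(shifted);  // expected to fail; the question ignored this status

    // A second allocation, judged by its own return value.
    float *other = NULL;
    r.second_malloc = cudaMalloc((void**)&other, bytes);

    // The last recorded error is still the one from the earlier cudaFree.
    r.last_error = cudaGetLastError();

    // Clean up using only pointers that cudaMalloc actually returned.
    if (r.second_malloc == cudaSuccess) cudaFree(other);
    cudaFree(base);
    return r;
}
```

Passing each field to `cudaGetErrorString` shows which call really failed. The exact error code and string for the bad `cudaFree` may differ between CUDA versions, so the sketch compares statuses rather than expecting a particular message.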

Error checking in the CUDA runtime API has to be done carefully. There is a robust, ready-made recipe for how to do it here. I suggest you use it.

Source

2016-10-03 05:56:28 talonmies

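Your way of checking errors is very neat. Please see my update. I think my mistake is that I am trying to increment a device pointer inside a host function. I guess that is not allowed, and cudaFree is not happy about it. In fact, in a host function matrix++ would point to some garbage in host memory, not in device memory. – user3891236

@user3891236: I told you exactly what the problem is. You cannot free an address that you did not allocate. "Incrementing" the pointer is perfectly fine (although completely pointless in this case). But asking the API to free the incremented pointer is illegal, because the API never allocated memory at that pointer value. – talonmies

Thank you very much for clearing up my doubts. I learned a lot today, including the importance of checking for CUDA errors. – user3891236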
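One way to act on this, sketched here under stated assumptions: keep the pointer returned by `cudaMalloc` untouched, do any pointer arithmetic on a copy, and pass only the original value to `cudaFree`. The names `foo_fixed` and `row` are hypothetical and not part of the original code, and the `cudaMemset` call is there only to show that the offset pointer is usable on the device side.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdio>

// Hypothetical corrected variant of foo():
// returns true only if every runtime call succeeded.
bool foo_fixed()
{
    const size_t bytes = 9 * sizeof(float);  // 3x3 matrix of floats
    float *matrixd = NULL;                   // base pointer, never modified

    // Check the return value of the call itself, not a later cudaGetLastError().
    cudaError_t status = cudaMalloc((void**)&matrixd, bytes);
    if (status != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(status));
        return false;
    }

    bool ok = true;

    // Pointer arithmetic on a copy: row addresses elements 3..5 of the buffer.
    float *row = matrixd + 3;
    status = cudaMemset(row, 0, 3 * sizeof(float));
    if (status != cudaSuccess) {
        fprintf(stderr, "cudaMemset failed: %s\n", cudaGetErrorString(status));
        ok = false;
    }

    // Free exactly the value that cudaMalloc returned, and check the result.
    status = cudaFree(matrixd);
    if (status != cudaSuccess) {
        fprintf(stderr, "cudaFree failed: %s\n", cudaGetErrorString(status));
        ok = false;
    }
    return ok;
}
```

Calling `foo_fixed()` twice from main, as the question does with `foo()`, exercises the same allocate and free sequence without ever handing `cudaFree` a shifted address.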
