2013-03-26 83 views

I'm trying to do something like the following in CUDA (in fact, I need to write some integration functions) using function pointers.

I tried the code below, but it doesn't work — it only produces this error:

Error: Function pointers and function template parameters are not supported in sm_1x.

float f1(float x) { 
    return x; 
} 

__global__ void tabulate(float lower, float upper, float p_function(float), float*result){ 
    for (lower; lower < upper; lower++) { 
       *result = *result + p_function(lower); 
     } 
} 

int main(){ 
     float res; 
    float* dev_res; 

     cudaMalloc((void**)&dev_res, sizeof(float)) ; 

    tabulate<<<1,1>>>(0.0, 5.0, f1, dev_res); 
    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost) ; 

    printf("%f\n", res); 
    /************************************************************************/ 
    scanf("%s"); 

    return 0; 

} 

What card are you using? You seem to be compiling your code for compute capability 1.x, and I believe function pointers are a compute capability 2.x feature. You can change your nvcc invocation to use -gencode arch=compute_20,code=sm_20 (if your card supports it) – alrikai 2013-03-26 18:30:29


@alrikai GeForce 560Ti – DanilGholtsman 2013-03-26 19:38:40


Then you should change your compilation target from 1.x to 2.x, which will get rid of your compile error. However, you may still have some runtime problems... – alrikai 2013-03-26 20:00:00

Answer


To get rid of the compile error, you need to compile your code with -gencode arch=compute_20,code=sm_20 as a compiler argument. But you will most likely still have some runtime problems:
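For example, the build command might look like the following (the source file name is illustrative, not taken from the question):

```shell
# Compile for compute capability 2.x so that device-side function
# pointers are supported (sm_1x does not support them).
nvcc -gencode arch=compute_20,code=sm_20 integrate.cu -o integrate
```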

Taken from the CUDA Programming Guide: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#functions

Function pointers to __global__ functions are supported in host code, but not in device code. Function pointers to __device__ functions are only supported in device code compiled for devices of compute capability 2.x and higher.

It is not allowed to take the address of a __device__ function in host code.

So you can have something like this (adapted from the "FunctionPointers" CUDA sample):

//your function pointer type - returns unsigned char, takes parameters of type unsigned char and float 
typedef unsigned char(*pointFunction_t)(unsigned char, float); 

//some device function to be pointed to 
__device__ unsigned char 
Threshold(unsigned char in, float thresh) 
{ 
    ... 
} 

//pComputeThreshold is a device-side function pointer to your __device__ function 
__device__ pointFunction_t pComputeThreshold = Threshold; 
//the host-side function pointer to your __device__ function 
pointFunction_t h_pointFunction; 

//in host code: copy the function pointers to their host equivalent 
cudaMemcpyFromSymbol(&h_pointFunction, pComputeThreshold, sizeof(pointFunction_t)); 

You can then pass h_pointFunction as a parameter to your kernel, which can use it to call your __device__ function:

//your kernel taking your __device__ function pointer as a parameter 
__global__ void kernel(pointFunction_t pPointOperation) 
{ 
    unsigned char tmp; 
    ... 
    tmp = (*pPointOperation)(tmp, 150.0); 
    ... 
} 

//invoke the kernel in host code, passing in your host-side __device__ function pointer 
kernel<<<...>>>(h_pointFunction); 

Hopefully this makes sense. In short, it looks like you will have to change your f1 function to be a __device__ function and follow a similar process (the typedefs aren't necessary, but they do make the code nicer) to pass it to your kernel as a valid host-side function pointer. I'd also recommend having a look at the FunctionPointers CUDA sample.
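Putting the pieces together, here is a minimal sketch (untested here, and assuming a compute capability 2.x card) of how the tabulate example from the question could be rewritten following this pattern:

```cuda
#include <cstdio>

// Function pointer type: takes a float, returns a float.
typedef float (*func_t)(float);

// f1 is now a __device__ function, so device code can call it.
__device__ float f1(float x) { return x; }

// Device-side pointer to f1; host code reads it via cudaMemcpyFromSymbol,
// since taking the address of a __device__ function in host code is not allowed.
__device__ func_t p_f1 = f1;

__global__ void tabulate(float lower, float upper, func_t f, float *result)
{
    for (float x = lower; x < upper; x++)
        *result += f(x);
}

int main()
{
    float res = 0.0f;
    float *dev_res;
    cudaMalloc((void **)&dev_res, sizeof(float));
    cudaMemcpy(dev_res, &res, sizeof(float), cudaMemcpyHostToDevice);

    // Copy the device-side function pointer to the host so it can be
    // passed as a kernel argument.
    func_t h_f1;
    cudaMemcpyFromSymbol(&h_f1, p_f1, sizeof(func_t));

    tabulate<<<1, 1>>>(0.0f, 5.0f, h_f1, dev_res);

    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost);
    printf("%f\n", res);  // f1(0)+f1(1)+f1(2)+f1(3)+f1(4) = 10
    cudaFree(dev_res);
    return 0;
}
```

Note that the result is initialized on the device before the kernel runs, which the original code omitted.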


Oh, thank you very much! – DanilGholtsman 2013-03-26 20:56:29


In addition to the answer above (+1), you might be interested in a very simple example of using a function pointer in device code (without templates) in this thread on the NVIDIA forums: https://devtalk.nvidia.com/default/topic/457094/how-can-i-use-__device__-function-pointer-in-cuda-/ – njuffa 2013-03-27 00:32:19


@njuffa Nice! Your example is cleaner (and complete) – alrikai 2013-03-27 00:57:13


Even if you can compile this code (see @Robert Crovella's answer), it will not work correctly. You cannot pass a function pointer from host code, because the host compiler cannot determine the address of the device function.