2013-03-26 83 views

I'm trying to do something like the following in CUDA (in fact, I need to write some integration functions) using function pointers.

I tried the code below, but it doesn't work — it only produces this error:

Error: Function pointers and function template parameters are not supported in sm_1x.

float f1(float x) { 
    return x; 
} 

__global__ void tabulate(float lower, float upper, float p_function(float), float*result){ 
    for (lower; lower < upper; lower++) { 
       *result = *result + p_function(lower); 
     } 
} 

int main(){ 
     float res; 
    float* dev_res; 

     cudaMalloc((void**)&dev_res, sizeof(float)) ; 

    tabulate<<<1,1>>>(0.0, 5.0, f1, dev_res); 
    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost) ; 

    printf("%f\n", res); 
    /************************************************************************/ 
    scanf("%s"); 

    return 0; 

} 

What card are you using? You seem to be compiling your code for compute capability 1.x, and I believe function pointers are a compute capability 2.x feature. You can change your nvcc invocation to use -gencode arch=compute_20,code=sm_20 (if your card supports it) – alrikai 2013-03-26 18:30:29


@alrikai GeForce 560Ti – DanilGholtsman 2013-03-26 19:38:40


Then you should change your compilation target from 1.x to 2.x, which will get rid of your compile error. However, you may still have some runtime problems... – alrikai 2013-03-26 20:00:00

Answer


To get rid of the compile error, you need to compile your code with -gencode arch=compute_20,code=sm_20 as a compiler argument. But you will most likely still have some runtime problems:
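For example, the build command might look like the following (the source file name is illustrative, not taken from the question):

```shell
# Compile for compute capability 2.x so that device-side function
# pointers are supported (sm_1x does not support them).
nvcc -gencode arch=compute_20,code=sm_20 integrate.cu -o integrate
```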

Taken from the CUDA Programming Guide: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#functions

Function pointers to __global__ functions are supported in host code, but not in device code. Function pointers to __device__ functions are only supported in device code compiled for devices of compute capability 2.x and higher.

It is not allowed to take the address of a __device__ function in host code.

So you can have something like this (adapted from the "FunctionPointers" CUDA sample):

//your function pointer type - returns unsigned char, takes parameters of type unsigned char and float 
typedef unsigned char(*pointFunction_t)(unsigned char, float); 

//some device function to be pointed to 
__device__ unsigned char 
Threshold(unsigned char in, float thresh) 
{ 
    ... 
} 

//pComputeThreshold is a device-side function pointer to your __device__ function 
__device__ pointFunction_t pComputeThreshold = Threshold; 
//the host-side function pointer to your __device__ function 
pointFunction_t h_pointFunction; 

//in host code: copy the function pointers to their host equivalent 
cudaMemcpyFromSymbol(&h_pointFunction, pComputeThreshold, sizeof(pointFunction_t)); 

You can then pass h_pointFunction as a parameter to your kernel, which can use it to call your __device__ function:

//your kernel taking your __device__ function pointer as a parameter 
__global__ void kernel(pointFunction_t pPointOperation) 
{ 
    unsigned char tmp; 
    ... 
    tmp = (*pPointOperation)(tmp, 150.0); 
    ... 
} 

//invoke the kernel in host code, passing in your host-side __device__ function pointer 
kernel<<<...>>>(h_pointFunction); 

Hopefully this makes sense. In short, it looks like you will have to change your f1 function to be a __device__ function and follow a similar process (the typedefs aren't necessary, but they do make the code nicer) to pass it to your kernel as a valid host-side function pointer. I'd also recommend having a look at the FunctionPointers CUDA sample.
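Putting the pieces together, here is a minimal sketch (untested here, and assuming a compute capability 2.x card) of how the tabulate example from the question could be rewritten following this pattern:

```cuda
#include <cstdio>

// Function pointer type: takes a float, returns a float.
typedef float (*func_t)(float);

// f1 is now a __device__ function, so device code can call it.
__device__ float f1(float x) { return x; }

// Device-side pointer to f1; host code reads it via cudaMemcpyFromSymbol,
// since taking the address of a __device__ function in host code is not allowed.
__device__ func_t p_f1 = f1;

__global__ void tabulate(float lower, float upper, func_t f, float *result)
{
    for (float x = lower; x < upper; x++)
        *result += f(x);
}

int main()
{
    float res = 0.0f;
    float *dev_res;
    cudaMalloc((void **)&dev_res, sizeof(float));
    cudaMemcpy(dev_res, &res, sizeof(float), cudaMemcpyHostToDevice);

    // Copy the device-side function pointer to the host so it can be
    // passed as a kernel argument.
    func_t h_f1;
    cudaMemcpyFromSymbol(&h_f1, p_f1, sizeof(func_t));

    tabulate<<<1, 1>>>(0.0f, 5.0f, h_f1, dev_res);

    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost);
    printf("%f\n", res);  // f1(0)+f1(1)+f1(2)+f1(3)+f1(4) = 10
    cudaFree(dev_res);
    return 0;
}
```

Note that the result is initialized on the device before the kernel runs, which the original code omitted.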


Oh, thank you very much! – DanilGholtsman 2013-03-26 20:56:29


In addition to the answer above (+1), you might be interested in a very simple example of using a function pointer in device code (without templates) in this thread on the NVIDIA forums: https://devtalk.nvidia.com/default/topic/457094/how-can-i-use-__device__-function-pointer-in-cuda-/ – njuffa 2013-03-27 00:32:19


@njuffa Nice! Your example is cleaner (and complete) – alrikai 2013-03-27 00:57:13


Even if you can compile this code (see @Robert Crovella's answer), it will not work correctly. You cannot pass a function pointer from host code, because the host compiler cannot determine the address of the device function.