Calling a hand-written CUDA kernel with Thrust

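Since I need to sort large arrays of numbers with CUDA, I am using Thrust. So far, so good... but what do I do when I want to call a "hand-written" kernel and my data lives in a thrust::host_vector?

My approach was (the copy back to the host is still missing):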

int CUDA_CountAndAdd_Kernel(thrust::host_vector<float> *samples, thrust::host_vector<int> *counts, int n) { 

thrust::device_ptr<float> dSamples = thrust::device_malloc<float>(n); 
thrust::copy(samples->begin(), samples->end(), dSamples); 

thrust::device_ptr<int> dCounts = thrust::device_malloc<int>(n); 
thrust::copy(counts->begin(), counts->end(), dCounts); 

float *dSamples_raw = thrust::raw_pointer_cast(dSamples); 
int *dCounts_raw = thrust::raw_pointer_cast(dCounts); 

CUDA_CountAndAdd_Kernel<<<1, n>>>(dSamples_raw, dCounts_raw); 

thrust::device_free(dCounts); 
thrust::device_free(dSamples); 
}

The kernel looks like this:

__global__ void CUDA_CountAndAdd_Kernel_Device(float *samples, int *counts)

But compilation fails with:

error: argument of type "float **" is incompatible with parameter of type "thrust::host_vector> *"

Huh? I thought I was passing raw pointers to float and int? Or am I missing something?


2010-03-07 Sebastian Dressler

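You are launching the kernel with the name of the calling host function rather than the name of the kernel, hence the parameter mismatch.

Change:

CUDA_CountAndAdd_Kernel<<<1, n>>>(dSamples_raw, dCounts_raw);

to

CUDA_CountAndAdd_Kernel_Device<<<1, n>>>(dSamples_raw, dCounts_raw);

and see what happens.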
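Once the launch names the kernel, the copy back that the question notes as missing still has to be added. The following is only a minimal sketch of one way the whole round trip could look, not the asker's actual code. The kernel body is hypothetical (the post shows only its signature), the extra `n` parameter and the bounds check are assumptions added so the grid can be larger than one block, and `thrust::device_vector` is swapped in for `device_malloc`/`device_free` so the device memory is released automatically.

```cuda
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

// Hypothetical kernel body: the original post does not show it.
// Here each thread increments counts[i] when samples[i] is positive.
// The parameter n is an assumption, added for the bounds check.
__global__ void CUDA_CountAndAdd_Kernel_Device(const float *samples, int *counts, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && samples[i] > 0.0f) {
        counts[i] += 1;
    }
}

// Host wrapper. Returns 0 on success, -1 on bad input or a CUDA error.
int CUDA_CountAndAdd_Kernel(thrust::host_vector<float> *samples,
                            thrust::host_vector<int> *counts, int n)
{
    if (n <= 0 || samples->size() != static_cast<size_t>(n) ||
        counts->size() != static_cast<size_t>(n)) {
        return -1;
    }

    // Host -> device copies; device_vector frees its memory on scope exit.
    thrust::device_vector<float> dSamples = *samples;
    thrust::device_vector<int> dCounts = *counts;

    // Launch with enough blocks to cover n elements instead of <<<1, n>>>,
    // which fails once n exceeds the per-block thread limit.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    CUDA_CountAndAdd_Kernel_Device<<<blocks, threads>>>(
        thrust::raw_pointer_cast(dSamples.data()),
        thrust::raw_pointer_cast(dCounts.data()), n);

    // Check for launch errors, then for errors during execution.
    if (cudaGetLastError() != cudaSuccess) return -1;
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;

    // Device -> host: the copy back that was missing in the question.
    thrust::copy(dCounts.begin(), dCounts.end(), counts->begin());
    return 0;
}
```

Called as `CUDA_CountAndAdd_Kernel(&samples, &counts, n)`, the results end up in `counts` on the host. Keeping distinct names for the host wrapper and the `__global__` function, as the `_Device` suffix does here, is what avoids the mix-up behind the original compile error.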


2010-03-08 01:31:54

D'oh! The error is always mine, never in the implementation. – 2010-03-08 05:50:57
