Multiple pointers into a single allocated array in device memory in CUDA

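I would like to know whether it is possible to set up multiple pointers into a single block of data that has already been allocated in memory. The reason I ask is that I am implementing a lexicographical sort on the GPU with the help of Thrust vectors (which has failed miserably in terms of timing).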

For example, I am trying to achieve the equivalent of these C++ statements:

unsigned int * pword;  //setting up the array of memory for permutations of word 
pword = new unsigned int [N*N]; 

unsigned int* * p_pword; //pointers to permutation words 
p_pword = new unsigned int* [N]; 

//setting up the pointers on the locations such that if N=4 then 0,4,8,12,... 
int count; 
for(count=0;count<N;count++) 
     p_pword[count]=&pword[count*N];

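I am not asking anyone to provide me with code; I just want to know whether there is any way to set up pointers into a single array of data. PS: I have tried the following approach: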

int * raw_ptr = thrust::raw_pointer_cast(&d_Data[0]); //doing same with multiple pointers

It did not achieve any speedup at all, but my guess is that, because I am pointing into a device_vector, access through it may be slow.

Any help in this regard is highly appreciated.

Source

2013-04-08 Asif Ali

Well, this doesn't make any sense:

int * raw_ptr = thrust::raw_pointer_cast([0]); 
             ^what is this??

I don't think that line will compile correctly.

But in Thrust you can certainly do something like this:

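#include <cstdio>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/device_ptr.h>
#include <thrust/sequence.h>
#include <thrust/copy.h>

int main(){

    const int N = 16;  // compile-time constant, so p_A is not a variable-length array
    thrust::device_vector<int> d_A(4*N);
    thrust::sequence(d_A.begin(), d_A.end());   // d_A = 0, 1, 2, ..., 4*N-1
    thrust::device_ptr<int> p_A[N];             // N pointers into the one allocation
    for (int i=0; i<N; i++)
        p_A[i] = &(d_A[4*i]);                   // p_A[i] points at element 4*i
    thrust::host_vector<int> h_A(N);
    thrust::copy(p_A[4], p_A[8], h_A.begin());  // copies elements 16..31 of d_A
    for (int i=0; i<N; i++)
        printf("h_A[%d] = %d\n", i, h_A[i]);
    return 0;
}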

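I don't know what to say about speedup. Speedup in the context of the small snippet of code you have posted doesn't make much sense to me.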

Source

2013-04-08 06:49:12

Thanks again, Robert Crovella, for the reply. What I was actually trying to do is this: int * raw_ptr = thrust::raw_pointer_cast(&d_Data[0]); – 2013-04-08 19:13:19

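Nice answer, by the way (though I am already using Thrust device_ptrs this way), and sorry if my question was confusing. What I wanted to ask is whether there is any way to make an array of pointers hold addresses inside a single array in CUDA memory (say, unsigned int *d_Data). I have implemented the same logic as in your example above, but I was looking for multiple pointers into a single plain array (not a device_vector). – 2013-04-08 19:20:12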

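That should work. However, it creates a pointer that is not convenient to use in Thrust algorithms. You can, though, use that pointer in ordinary CUDA code. – 2013-04-08 19:20:43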
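For readers who want the plain-array variant discussed in the comments above, here is a minimal sketch, not code from the thread. It assumes a single `cudaMalloc` allocation of N*N elements and builds a table of N row pointers into it, mirroring the C++ snippet in the question. The names `fill_rows`, `run_demo`, `d_data` and `d_rows` are made up for illustration. The row addresses are computed on the host with pointer arithmetic and then copied to the device so that a kernel can index `rows[i][j]`. The last step wraps the raw pointer with `thrust::device_pointer_cast`, which is one way to pass a raw device pointer to a Thrust algorithm.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/reduce.h>

// Hypothetical kernel: thread i writes the value i into every element of
// row i, reaching the row through the device-resident pointer table.
__global__ void fill_rows(unsigned int **rows, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        for (int j = 0; j < n; ++j)
            rows[i][j] = i;
    }
}

// Builds the pointer table, runs the kernel, and returns the sum of all
// n*n elements (computed with Thrust), or -1 if a CUDA call fails.
long long run_demo(int n)
{
    unsigned int *d_data = nullptr;    // the single n*n allocation
    unsigned int **d_rows = nullptr;   // table of n pointers into d_data

    if (cudaMalloc(&d_data, sizeof(unsigned int) * n * n) != cudaSuccess)
        return -1;
    if (cudaMalloc(&d_rows, sizeof(unsigned int *) * n) != cudaSuccess) {
        cudaFree(d_data);
        return -1;
    }

    // Device addresses may be computed on the host with pointer arithmetic;
    // they just must not be dereferenced there.
    std::vector<unsigned int *> h_rows(n);
    for (int i = 0; i < n; ++i)
        h_rows[i] = d_data + i * n;    // offsets 0, n, 2n, ...
    cudaError_t err = cudaMemcpy(d_rows, h_rows.data(),
                                 sizeof(unsigned int *) * n,
                                 cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        fill_rows<<<(n + 127) / 128, 128>>>(d_rows, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();

    long long sum = -1;
    if (err == cudaSuccess) {
        // Wrap the raw pointer so Thrust treats it as device memory.
        thrust::device_ptr<unsigned int> first =
            thrust::device_pointer_cast(d_data);
        sum = thrust::reduce(first, first + n * n, 0LL);
    }

    cudaFree(d_rows);
    cudaFree(d_data);
    return sum;
}

int main()
{
    long long sum = run_demo(4);
    printf("sum = %lld\n", sum);
    return sum < 0 ? 1 : 0;
}
```

Whether a pointer table like this speeds anything up depends on the access pattern. Indexing `d_data[i * n + j]` directly avoids the extra memory load of `rows[i]` and is often the simpler choice.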
