Printing the elements of a string array with cuPrintf in a kernel function

I am trying to use the cuPrintf function to print the elements of a string array that is passed to a kernel function as an argument.

Kernel:

__global__ void testKernel(string wordList[10000]) 
{ 
    //access thread id 
    const unsigned int bid = blockIdx.x; 
    const unsigned int tid = threadIdx.x; 
    const unsigned int index = bid * blockDim.x + tid; 


    cuPrintf("wordList[%d]: %s \n", index, wordList[index]); 
}

Code from the main function that sets up the execution parameters and launches the kernel:

//Allocate device memory for word list
string* d_wordList;
cudaMalloc((void**)&d_wordList, sizeof(string)*number_of_words);

//Copy word list from host to device
cudaMemcpy(d_wordList, wordList, sizeof(string)*number_of_words, cudaMemcpyHostToDevice);

//Setup execution parameters
int n_blocks = (number_of_words + 255)/256;
int threads_per_block = 256;

dim3 grid(n_blocks, 1, 1);
dim3 threads(threads_per_block, 1, 1);

cudaPrintfInit();
testKernel<<<grid, threads>>>(d_wordList);
cudaDeviceSynchronize();
cudaPrintfDisplay(stdout,true);
cudaPrintfEnd();

I get this error: "Error 44 error: calling a host function("std::basic_string<char, std::char_traits<char>, std::allocator<char>>::~basic_string") from a __global__ function("testKernel") is not allowed D:\...\kernel.cu 44 1 CUDA_BF_large_word_list"

What am I missing? Thanks in advance.

2014-09-22 Alex Iacob

In general, you cannot use functions from the C++ standard library (including <string>) in CUDA device code. Use arrays of char for your strings instead.

Here is an example of handling "strings" as null-terminated, C-style char arrays and passing them to a kernel.
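A std::string object holds a pointer to character data in host memory, and its member functions (including the destructor named in the error) are host-only. Copying sizeof(string) * number_of_words bytes to the device therefore copies host pointers, not the characters. The sketch below shows the char-array alternative on the host side: every word is packed into one contiguous buffer of fixed-width, null-terminated slots. The helper name packWords and the truncation policy are illustrative assumptions, not part of the linked example.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: packs host-side std::string words into one contiguous
// buffer of fixed-width slots (width bytes each). The buffer is zero-filled,
// so every slot is null-terminated; words longer than width - 1 characters
// are truncated to keep room for the terminator.
std::vector<char> packWords(const std::vector<std::string>& words, std::size_t width)
{
    if (width == 0) return {};                           // no room even for a terminator
    std::vector<char> flat(words.size() * width, '\0');
    for (std::size_t i = 0; i < words.size(); ++i) {
        std::size_t len = words[i].size() < width - 1 ? words[i].size() : width - 1;
        std::memcpy(&flat[i * width], words[i].data(), len);  // copy characters only
    }
    return flat;
}
```

The packed buffer can then be copied in one call, for example cudaMemcpy(d_wordList, flat.data(), flat.size(), cudaMemcpyHostToDevice). On the device, word i starts at d_wordList + i * width.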

2014-09-22 12:59:52

I am reading the words from a text file like this:

//Build array of strings containing the words from the text file
string wordList[10000];
if (file.is_open())
{
    for (int i = 0; i < ...; i++)
    {
        file >> wordList[i];
        //cout << endl << wordList[i] << endl;
    }
}

What would change if I used char arrays? – 2014-09-22 13:30:43

My answer links to sample code that shows how to work with C-style strings. I assume you can handle the file I/O; that part is not CUDA-specific. – 2014-09-22 15:20:56

Yes, file I/O is no problem. Thanks! – 2014-09-23 06:50:21

I modified the code to use an array of char arrays instead of std::string.

Updated version of the kernel:

__global__ void testKernel(char* d_wordList)
{
    //access thread id
    const unsigned int bid = blockIdx.x;
    const unsigned int tid = threadIdx.x;
    const unsigned int index = bid * blockDim.x + tid;

    //cuPrintf("Hello World from kernel! \n");

    cuPrintf("!! %c%c%c%c%c%c%c%c%c%c \n", d_wordList[index * 20 + 0],
             d_wordList[index * 20 + 1],
             d_wordList[index * 20 + 2],
             d_wordList[index * 20 + 3],
             d_wordList[index * 20 + 4],
             d_wordList[index * 20 + 5],
             d_wordList[index * 20 + 6],
             d_wordList[index * 20 + 7],
             d_wordList[index * 20 + 8],
             d_wordList[index * 20 + 9]);
}

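I would also like to know whether there is a simpler way to print the words from the char array. (Later on, each kernel thread will need to work with exactly one word.)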

The code from the main function is:

const int text_length = 20;

char (*wordList)[text_length] = new char[10000][text_length];
char *dev_wordList;

for(int i=0; i<number_of_words; i++)
{
    file>>wordList[i];
    cout<<wordList[i]<<endl;
}

cudaMalloc((void**)&dev_wordList, 20*number_of_words*sizeof(char));
cudaMemcpy(dev_wordList, &(wordList[0][0]), 20 * number_of_words * sizeof(char), cudaMemcpyHostToDevice);

char (*resultWordList)[text_length] = new char[10000][text_length];

cudaMemcpy(resultWordList, dev_wordList, 20 * number_of_words * sizeof(char), cudaMemcpyDeviceToHost);

for(int i=0; i<number_of_words; i++)
    cout<<resultWordList[i]<<endl;

//Setup execution parameters
int n_blocks = (number_of_words + 255)/256;
int threads_per_block = 256;

dim3 grid(n_blocks, 1, 1);
dim3 threads(threads_per_block, 1, 1);

cudaPrintfInit();
testKernel<<<grid, threads>>>(dev_wordList);
cudaDeviceSynchronize();
cudaPrintfDisplay(stdout,true);
cudaPrintfEnd();

If I use smaller values for the number of blocks/threads, such as:

dim3 grid(20, 1, 1); 
dim3 threads(100, 1, 1);

the kernel launch works correctly and prints one word per thread. But I need to process 10,000 words. What am I missing?

2014-09-23 12:07:06

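Posting an answer to your own question and using it to ask a new question is probably not a good idea; that is not really how this works. If you have a new question, I suggest you ask a new question. Note that your last question is not clear to me. What specifically is not working? Are you aware of limits such as the maximum number of threads per block? Are you aware that in-kernel printf is limited in the amount of output it can produce? What actually doesn't work? (Post a new question.) – 2014-09-23 21:04:55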

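OK, thanks for the advice. I know about the threads-per-block limit; in my case it is 512 threads per block. The problem is that the kernel produces no output for larger grid/thread-count parameters, which may be a limitation of the cuPrintf function. – 2014-09-24 07:06:19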

I looked into the problem: the cause is that cuPrintf is limited to grids of at most 2048 threads. – 2014-09-24 07:53:47
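The updated code has a separate bug, unrelated to cuPrintf's output limit. n_blocks = (number_of_words + 255)/256 rounds up, so for 10,000 words it launches 40 × 256 = 10,240 threads. The kernel reads d_wordList[index * 20 + ...] for all of them, including the 240 threads with no word, and those reads fall past the end of the allocation. The usual fix is to pass the word count to the kernel and return early with if (index >= number_of_words) return;. The host-side sketch below, with hypothetical names, computes the launch size the same way and counts the threads that would pass that guard. On output limits: a 512-threads-per-block maximum suggests a compute capability 1.x device, where CUDA's built-in printf is not available. On compute capability 2.0 or later, built-in printf's buffer can be enlarged with cudaDeviceSetLimit(cudaLimitPrintfFifoSize, bytes) before the launch.

```cpp
// Round-up division used for the block count: enough blocks to cover n items.
int blocksFor(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical helper: walks the indices a 1-D launch would create
// (index = bid * blockDim.x + tid, as in the kernel) and counts those that
// pass the bounds guard `if (index >= n) return;`.
int activeThreads(int n, int threadsPerBlock)
{
    const int blocks = blocksFor(n, threadsPerBlock);
    int active = 0;
    for (int bid = 0; bid < blocks; ++bid)
        for (int tid = 0; tid < threadsPerBlock; ++tid) {
            const int index = bid * threadsPerBlock + tid;
            if (index < n) ++active;  // threads with index >= n must not touch d_wordList
        }
    return active;
}
```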