Using Thrust with printf/cout

I am trying to learn how to use CUDA with Thrust, and I have seen some code in which the printf function appears to be called from the device.

Consider the following code:

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/for_each.h>
#include <cstdio>

struct functor
{
    __host__ __device__
    void operator()(int val)
    {
        printf("Call for value : %d\n", val);
    }
};

int main()
{
    thrust::host_vector<int> cpu_vec(100);
    for (int i = 0; i < 100; ++i)
        cpu_vec[i] = i;
    thrust::device_vector<int> cuda_vec = cpu_vec; // transfer to GPU
    thrust::for_each(cuda_vec.begin(), cuda_vec.end(), functor());
}

This seems to work fine and prints the message "Call for value : " followed by a number, 100 times.

Now, if I include iostream and replace the printf line with the C++ stream-based equivalent

std::cout << "Call for value : " << val << std::endl;

I get compilation warnings from nvcc, and the compiled program prints nothing.

warning: address of a host variable "std::cout" cannot be directly taken in a device function 
warning: calling a __host__ function from a __host__ __device__ function is not allowed 
warning: calling a __host__ function("std::basic_ostream<char, std::char_traits<char> >::operator <<") from a __host__ __device__ function("functor::operator()") is not allowed

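Why does it work with printf?
Why does it not work with cout?
What is actually run on the GPU? I would guess that, at the very least, sending output to stdout requires some CPU work.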

Source

2016-04-26 bct

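'printf' is "overloaded" as a '__device__' function, while 'cout' is not. You need an explicit "overload" of the printing function because the output buffer has to be handled properly. Have a look at the 'simplePrintf' example and you will get a feeling for why explicit overloading is needed and how it could be done. Since 'cout' is a '__host__'-only function, 'nvcc' cannot compile it for the device. – JackOLantern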

Why does it work with printf?

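Because NVIDIA added runtime support for in-kernel printf on all hardware that supports the device ABI (compute capability >= 2.0). In device code there is a template overload of the host printf that provides (almost) standard C-style printf functionality. You must include cstdio or stdio.h in your device code for this mechanism to work.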
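As a minimal sketch of the above, assuming a device of compute capability 2.0 or later, the following program calls printf directly from a kernel. The kernel name hello_kernel and the launch configuration are hypothetical choices for illustration only.

```cuda
#include <cstdio>          // needed so that printf is available in device code
#include <cuda_runtime.h>

// Hypothetical kernel: each thread prints its own index.
__global__ void hello_kernel()
{
    int tid = threadIdx.x;
    printf("Call for value : %d\n", tid);
}

int main()
{
    hello_kernel<<<1, 4>>>();

    // Wait for the kernel to finish; this is also a point at which
    // buffered device-side printf output is flushed to stdout.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

The order in which the threads' lines appear should not be relied upon.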

Why does it not work with cout?

Because NVIDIA has not implemented any form of C++ iostream-style I/O support within the CUDA device runtime.
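If the same functor has to work on both sides, one possible workaround (not part of the original question, and only a sketch) is to branch on the __CUDA_ARCH__ macro, which nvcc defines only while compiling device code. The device path then uses printf and the host path remains free to use std::cout. The vector size of 4 is an arbitrary choice for illustration.

```cuda
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/for_each.h>
#include <cstdio>
#include <iostream>

struct functor
{
    __host__ __device__
    void operator()(int val) const
    {
#ifdef __CUDA_ARCH__
        // Device compilation pass: only printf is supported here.
        printf("Call for value : %d\n", val);
#else
        // Host compilation pass: iostream works as usual.
        std::cout << "Call for value : " << val << std::endl;
#endif
    }
};

int main()
{
    thrust::host_vector<int> cpu_vec(4);
    for (int i = 0; i < 4; ++i)
        cpu_vec[i] = i;
    thrust::device_vector<int> cuda_vec = cpu_vec; // transfer to GPU

    // Device iterators: the functor runs on the GPU and takes the printf branch.
    thrust::for_each(cuda_vec.begin(), cuda_vec.end(), functor());
    cudaDeviceSynchronize(); // flush device-side printf output

    // Host iterators: the functor runs on the CPU and takes the std::cout branch.
    thrust::for_each(cpu_vec.begin(), cpu_vec.end(), functor());
    return 0;
}
```

Another option is simply to copy the results back into a thrust::host_vector and print them from ordinary host code.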

What is actually run on the GPU?

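The device runtime maintains a FIFO buffer that kernel code writes to via printf calls during kernel execution. The device buffer is copied by the CUDA driver and echoed to stdout at the end of kernel execution. The exact heuristics and mechanisms are not documented, but I would assume that the format strings and the output are stored in the FIFO buffer, then parsed by the CPU-side driver, and then printed via some sort of callback from the kernel launch API. The runtime API provides a function for controlling the size of the printf FIFO.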
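The answer does not name the function here; the sketch below assumes it is cudaDeviceSetLimit used with the cudaLimitPrintfFifoSize limit, and reads the current value back with cudaDeviceGetLimit. The 8 MiB figure is an arbitrary value chosen for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Query the current size of the device-side printf FIFO.
    size_t fifo_bytes = 0;
    cudaError_t err = cudaDeviceGetLimit(&fifo_bytes, cudaLimitPrintfFifoSize);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetLimit failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Current printf FIFO size: %zu bytes\n", fifo_bytes);

    // Request a larger buffer (8 MiB, an arbitrary choice) before
    // launching any kernel that calls printf.
    err = cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 8u * 1024u * 1024u);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Read the value back to see what the runtime actually granted.
    cudaDeviceGetLimit(&fifo_bytes, cudaLimitPrintfFifoSize);
    printf("New printf FIFO size: %zu bytes\n", fifo_bytes);
    return 0;
}
```

Enlarging the buffer may help when a kernel produces more output than the FIFO can hold between flushes, since the buffer is of fixed size.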

Source

2016-04-26 07:56:59 talonmies
