OpenCL start-end profiling time is larger than the actual duration

I wrote an OpenCL program, and I execute my kernels like this:

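for (int i = 0; i < commandQueues.length; i++) { // one command queue per GPU
    // The wait-list count must match the array length (1), so each kernel waits on userEvent
    clEnqueueNDRangeKernel(commandQueues[i], kernel[i], 1, null,
        global_work_size, local_work_size, 1, new cl_event[]{userEvent}, events[i]);
    clFlush(commandQueues[i]);
}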

long before = System.nanoTime(); 

// Set UserEvent = Complete so all kernel can start executing 
clSetUserEventStatus(userEvent, CL_COMPLETE); 

// Wait until the work is finished on all command queues 
clWaitForEvents(events.length, events); 

long after = System.nanoTime(); 

float totalDurationMs = (after - before)/1e6f; 

// ...profile each event with CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END...

The userEvent ensures that the kernels start running at the same time. Source: [Reima's Answer]: How do I know if the kernels are executing concurrently?

This is the result I get on a system with 2 Tesla K20M GPUs in it:

Total duration :37.800076ms 
Duration on device 1 of 2: 38.037186 
Duration on device 2 of 2: 37.85744

Can someone explain to me why the start-end profiling time is larger than the total duration?

Thanks

Source

2013-10-16 aelias

Reason: timer accuracy.

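You should not trust those system calls to give you the time. They are typically accurate only to about ±1 ms unless you go down to CPU cycle counters (and that is hard). GPU timing, however, is very precise (on the order of a few nanoseconds), so use that.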
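The resolution of the host timer on a given machine can be probed directly. The following is a minimal sketch, not part of the original answer: the class and method names are made up for illustration, and it only reports the smallest step observed between consecutive System.nanoTime() readings. The Java documentation states that nanoTime offers nanosecond precision but not necessarily nanosecond resolution, so the observed step varies by platform. It says nothing about how well the host and device clocks agree.

```java
public class NanoTimeGranularity {

    /**
     * Returns the smallest non-zero step (in nanoseconds) observed between
     * consecutive System.nanoTime() readings over the given number of samples.
     * (Hypothetical helper, for illustration only.)
     */
    static long smallestObservedStepNs(int samples) {
        long smallest = Long.MAX_VALUE;
        for (int i = 0; i < samples; i++) {
            long t0 = System.nanoTime();
            long t1 = System.nanoTime();
            // Spin until the clock visibly advances.
            while (t1 == t0) {
                t1 = System.nanoTime();
            }
            smallest = Math.min(smallest, t1 - t0);
        }
        return smallest;
    }

    public static void main(String[] args) {
        long stepNs = smallestObservedStepNs(100_000);
        System.out.println("Smallest observed nanoTime step: " + stepNs + " ns");
    }
}
```

If the reported step is far below the roughly 0.06 ms to 0.24 ms gap between the numbers in the question, timer granularity alone may not account for the whole difference on that machine.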

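EDIT: If you want to test it (just for fun): enqueue the kernel 1000 times and sum the execution time of each run, then compare that sum with the system time. In that case the profiled time should not come out higher, because over the whole execution time (about 38 seconds) the timer's inaccuracy matters far less.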
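The arithmetic behind that suggestion can be sketched as follows. This is an illustrative calculation only: it assumes a fixed absolute host-timer error of 1 ms (the figure quoted in the answer, not a measured value) and a kernel run of roughly 38 ms (rounded from the question). The class and method names are hypothetical.

```java
public class RelativeTimerError {

    /** Relative error, in percent, of a fixed absolute timer error over a measured span. */
    static double relativeErrorPercent(double timerErrorMs, double measuredSpanMs) {
        return 100.0 * timerErrorMs / measuredSpanMs;
    }

    public static void main(String[] args) {
        double timerErrorMs = 1.0; // assumption: the ±1 ms figure quoted in the answer
        double singleRunMs = 38.0; // roughly one kernel run, rounded from the question
        int runs = 1000;           // number of runs suggested in the answer

        double oneRun = relativeErrorPercent(timerErrorMs, singleRunMs);
        double manyRuns = relativeErrorPercent(timerErrorMs, singleRunMs * runs);

        System.out.printf("1 run:     %.4f%%%n", oneRun);
        System.out.printf("%d runs: %.4f%%%n", runs, manyRuns);
    }
}
```

Under these assumptions the same 1 ms error is about 2.6% of a single 38 ms run but only about 0.0026% of 1000 runs, which is why the comparison becomes more meaningful over the longer span.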

Source

2013-10-16 14:40:06 DarkZeros
