How can I launch another thread from OpenCL code?

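1. Data generation. In this step I generate a data array in a loop, where each element is the result of some function.
2. Data processing. For this step I wrote an OpenCL kernel that processes the data array generated in the previous step.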

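Currently the first step runs on the CPU because it is hard to parallelize. I would like to run it on the GPU, because generating each element takes some time, and I want to start the second step immediately for the data that has already been generated.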

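Can I launch another OpenCL kernel in a separate thread from a kernel that is currently running? Or will it run in the same thread as the kernel that calls it?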

Some pseudocode to illustrate my point:

__kernel void second(__global int *data, int index) {
    // Work on data[index]. This process takes a lot of time.
}

__kernel void first(__global int *data, const int length) {
    for (int i = 0; i < length; i++) {
        // Generate data and store it in data[i].

        // Will this kernel run in the caller's thread or in a new thread?
        // If in the same thread, is there a way to launch it in a separate thread?
        second(data, i);
    }
}
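
To make the question concrete, here is a minimal sketch of what the pseudocode above does if the call is treated as an ordinary function call, which is how OpenCL 1.x handles a `__kernel` function called from another kernel. It is a plain C model, not device code: the `__kernel` and `__global` qualifiers are stubbed out with empty macros, and the bodies (fill with `i + 1`, then square) are hypothetical stand-ins for the real generation and processing steps.

```c
/* Host-side C model only. The OpenCL qualifiers are stubbed out (an
   assumption made for illustration) so an ordinary C compiler accepts this. */
#define __kernel
#define __global

/* Hypothetical stand-in for the expensive per-element processing step. */
__kernel void second(__global int *data, int index)
{
    data[index] = data[index] * data[index];
}

/* Hypothetical stand-in for data generation, followed by a direct call. */
__kernel void first(__global int *data, const int length)
{
    for (int i = 0; i < length; i++) {
        data[i] = i + 1;   /* generate element i */
        second(data, i);   /* ordinary call: returns before the next iteration */
    }
}
```

Calling `first(data, 4)` on a four-element array in this model runs generation and processing strictly one after the other for each element, so nothing overlaps. If that is also what happens on the device, overlapping the two steps would have to be arranged some other way, for example from the host.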

Source

2011-03-15 Eugene Burtsev