Using the host as the device

-1

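I am working with OpenCL and I only have a single CPU, an i3 Core Duo, so I have just one device (my CPU). So basically I assume my HOST (the CPU) will also be the DEVICE. I tried to launch a kernel, but the task assigned to the DEVICE (which is also the HOST) never terminates. Thinking about it, it seems obvious that a HOST waiting for the DEVICE (itself) to finish cannot work. Does anyone know a way around this? Perhaps using clCreateSubDevices to split my only device into a host and a real device?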

Source

2016-03-01 Algernon2

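Hard to say without any code, but in general you should be able to use the CPU as a device without jumping through any special hoops. – Aderstedt

There is no such thing as an i3 Core Duo; there are Intel Core i3 processors with two cores, and there are Intel Core Duo processors. Please be more specific about which CPU you are using. I also think your code may have a different problem. Many Core i3 chips also contain a GPU, in which case host and device use different parts of the chip. Even if you use the CPU as the device, the kernel code runs in a separate thread. –

Regardless of the actual CPU and core count, most OpenCL CPU drivers use threads for DEVICE work, so the HOST thread can keep doing its own thing while compute and other device activity proceeds. You do not need to call a blocking API for work to happen. – Dithermaster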
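The point made in the comments above, that device work runs on its own thread even when host and device share the same CPU, can be illustrated without OpenCL at all. The following is only an analogy in plain Java, not the OpenCL runtime: the class `HostDeviceThreads`, the method `runOnWorker`, and the single-thread executor standing in for the "device" are all hypothetical names chosen for this sketch.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class HostDeviceThreads {

    // Hypothetical stand-in for the vectorAdd kernel: element-wise sum.
    static float[] vectorAdd(float[] a, float[] b) {
        float[] c = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i] + b[i];
        }
        return c;
    }

    // Hands the work to a worker thread (the "device" in this analogy),
    // lets the calling thread (the "host") continue, and blocks only
    // when the result is actually needed.
    static float[] runOnWorker(float[] a, float[] b) {
        ExecutorService device = Executors.newSingleThreadExecutor();
        try {
            // submit() returns immediately; the work runs on the worker thread.
            Future<float[]> pending = device.submit(() -> vectorAdd(a, b));
            System.out.println("Host thread keeps running while the worker computes");
            // get() is the blocking call: it waits for the worker to finish.
            return pending.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            device.shutdown();
        }
    }

    public static void main(String[] args) {
        float[] c = runOnWorker(new float[]{1f, 2f, 3f}, new float[]{10f, 20f, 30f});
        System.out.println(Arrays.toString(c)); // prints [11.0, 22.0, 33.0]
    }
}
```

Both threads may well be scheduled on the same physical core; the operating system switches between them, so the host waiting on the worker does not deadlock.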

-1

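I think my idea was not that bad, because you really do need to programmatically force the HOST to switch over to the DEVICE work when HOST and DEVICE are the same hardware. It is in fact possible to use the HOST as the DEVICE, but for the DEVICE to do its work you need to call at least one blocking function (clFinish() or clEnqueueReadBuffer(..., CL_TRUE, ...)). Otherwise the host keeps working and never switches to the DEVICE work. I tried adding a sleep() call, but it does not help; you really need to add a blocking OpenCL function.

Thanks a lot!

Source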

2016-03-02 15:31:17 Algernon2

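That is not true; the enqueue functions are not blocking. For example, the 'clEnqueueNDRangeKernel' function only puts the kernel into a queue. The host thread continues executing even if the kernel has not finished yet. –

Thank you, Martin, but please look at the code in my previous reply and tell me where my mistake is. With the clFinish(...) call commented out I get Task INCOMPLETE as output, and Task COMPLETE when clFinish(...) is uncommented. Can you tell me why, and where my mistake is? Thanks – Algernon2

Yes, 'clFinish' waits for the commands to complete and therefore blocks the host thread. However, 'clEnqueueNDRangeKernel' is still non-blocking. You can do whatever you want between the two calls. –

Below you will find my Java-like code, so you can tell me where my mistake is. When I run the code below without clFinish(commandQueue); (near the bottom of the code), I get the following output:

    I use the platform Intel(R) OpenCL
    Enqueueing kernels...
    Pause for 15000 ms.
    Task INCOMPLETE

If I add clFinish(commandQueue), I get this output and my task completes:

    I use the platform Intel(R) OpenCL
    Enqueueing kernels...
    Event kernel status: CL_COMPLETE event ID: 10 runtime: 2.631 ms.
    Pause for 15000 ms.
    Task COMPLETE

So why does a single clFinish() call let my task complete? Thanks in advance for your explanation.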

import static org.jocl.CL.*;

import java.util.Arrays;

import org.jocl.*;

public class Test_CPU 
{ 

    private static String programSource0 = 
     "__kernel void vectorAdd(" + 
     "  __global const float *a,"+ 
     "  __global const float *b, " + 
     "  __global float *c)"+ 
     "{"+ 
     " int gid = get_global_id(0);"+ 
     " c[gid] = a[gid]+b[gid];"+ 
     "}"; 

    /** 
    * The entry point of this sample 
    * 
    * @param args Not used 
    */ 
    public static void main(String args[]) 
    { 
     /** 
     * Callback that is called when the event reaches the requested status. 
     * It prints the elapsed time of the kernel command in milliseconds, 
     * measured from the moment the command was queued to the moment it ended. 
     * @param event:  the event 
     * @param event_status: status of the event 
     * @param user_data: data given by the user; here an integer tag that can be used to match profiling output to the associated kernel 
     * @return:    none 
     */ 
     EventCallbackFunction kernelCommandEvent = new EventCallbackFunction() 
     { 
      @Override 
      public void function(cl_event event, int event_status, Object user_data) 
      { 
       int evID = (int)user_data; 
       long[] ev_start_time = new long[1]; 
       Arrays.fill(ev_start_time, 0); 
       long[] ev_end_time = new long[1]; 
       Arrays.fill(ev_end_time, 0); 
       long[] return_bytes = new long[1]; 
       double run_time = 0.0; 

       clGetEventProfilingInfo (event, CL_PROFILING_COMMAND_QUEUED, Sizeof.cl_long, Pointer.to(ev_start_time), return_bytes); 
       clGetEventProfilingInfo (event, CL_PROFILING_COMMAND_END , Sizeof.cl_long, Pointer.to(ev_end_time), return_bytes); 

       run_time = (double)(ev_end_time[0] - ev_start_time[0]); 
       System.out.println("Event kernel status: " + CL.stringFor_command_execution_status(event_status) + " event ID: " + evID + " runtime: " + String.format("%8.3f", (run_time*1.0e-6)) + " ms."); 
      } 
     }; 

     // Initialize the input data 
     int n = 1000000; 
     float srcArrayA[] = new float[n]; 
     float srcArrayB[] = new float[n]; 
     float dstArray0[] = new float[n]; 

     for (int i=0; i<srcArrayA.length; i++) 
     { 
      srcArrayA[i] = i; 
      srcArrayB[i] = i; 
     } 
     Pointer srcA = Pointer.to(srcArrayA); 
     Pointer srcB = Pointer.to(srcArrayB); 
     Pointer dst0 = Pointer.to(dstArray0); 

     // The platform, device type and device number that will be used 
     final int platformIndex = 1; 
     final long deviceType = CL_DEVICE_TYPE_CPU; 
     final int deviceIndex = 0; 

     // Enable exceptions and subsequently omit error checks in this sample 
     CL.setExceptionsEnabled(true); 

     // Obtain the number of platforms 
     int numPlatformsArray[] = new int[1]; 
     clGetPlatformIDs(0, null, numPlatformsArray); 
     int numPlatforms = numPlatformsArray[0]; 

     // Obtain a platform ID 
     cl_platform_id platforms[] = new cl_platform_id[numPlatforms]; 
     clGetPlatformIDs(platforms.length, platforms, null); 
     cl_platform_id platform = platforms[platformIndex]; 

     long size[] = new long[1]; 
     clGetPlatformInfo(platform, CL_PLATFORM_NAME, 0, null, size); 
     // Create a buffer of the appropriate size and fill it with the info 
     byte buffer[] = new byte[(int)size[0]]; 
     clGetPlatformInfo(platform, CL_PLATFORM_NAME, buffer.length, Pointer.to(buffer), null); 
     // Create a string from the buffer (excluding the trailing \0 byte) 
     System.out.println("I use the platform " + new String(buffer, 0, buffer.length-1)); 

     // Initialize the context properties 
     cl_context_properties contextProperties = new cl_context_properties(); 
     contextProperties.addProperty(CL_CONTEXT_PLATFORM, platform); 

     // Obtain the number of devices for the platform 
     int numDevicesArray[] = new int[1]; 
     clGetDeviceIDs(platform, deviceType, 0, null, numDevicesArray); 
     int numDevices = numDevicesArray[0]; 

     // Obtain a device ID 
     cl_device_id devices[] = new cl_device_id[numDevices]; 
     clGetDeviceIDs(platform, deviceType, numDevices, devices, null); 
     cl_device_id device = devices[deviceIndex]; 

     // Create a context for the selected device 
     cl_context context = clCreateContext(contextProperties, 1, new cl_device_id[]{device}, null, null, null); 

     // Create a command-queue, with profiling info enabled 
     long properties = 0; 
     properties |= CL.CL_QUEUE_PROFILING_ENABLE; 
     cl_command_queue commandQueue = CL.clCreateCommandQueue(context, devices[0], properties, null); 

     // Allocate the buffer memory objects 
     cl_mem srcMemA = CL.clCreateBuffer(context, CL.CL_MEM_READ_ONLY | CL.CL_MEM_COPY_HOST_PTR, Sizeof.cl_float * n, srcA, null); 
     cl_mem srcMemB = CL.clCreateBuffer(context, CL.CL_MEM_READ_ONLY | CL.CL_MEM_COPY_HOST_PTR, Sizeof.cl_float * n, srcB, null); 
     cl_mem dstMem0 = CL.clCreateBuffer(context, CL.CL_MEM_READ_WRITE, Sizeof.cl_float * n, null, null); 

     // Create the program from the kernel source 
     cl_program program0 = CL.clCreateProgramWithSource(context, 1, new String[]{ programSource0 }, null, null); 

     // Build the programs 
     CL.clBuildProgram(program0, 0, null, null, null, null); 

     // Create the kernels 
     cl_kernel kernel0 = CL.clCreateKernel(program0, "vectorAdd", null); 

     // Set the arguments 
     CL.clSetKernelArg(kernel0, 0, Sizeof.cl_mem, Pointer.to(srcMemA)); 
     CL.clSetKernelArg(kernel0, 1, Sizeof.cl_mem, Pointer.to(srcMemB)); 
     CL.clSetKernelArg(kernel0, 2, Sizeof.cl_mem, Pointer.to(dstMem0)); 

     // Set work-item dimensions and execute the kernels 
     long globalWorkSize[] = new long[]{n}; 

     System.out.println("Enqueueing kernels..."); 
     cl_event[] myEventID = new cl_event[1]; 
     myEventID[0] = new cl_event(); 
     clEnqueueNDRangeKernel(commandQueue, kernel0, 1, null, globalWorkSize, null, 0, null, myEventID[0]); 

     int ID[] = new int[1]; 
     ID[0] = 10; 
     clSetEventCallback(myEventID[0], CL_COMPLETE, kernelCommandEvent, ID[0]); 

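     // Block until all queued commands have completed. 
     // With this call commented out, the program prints "Task INCOMPLETE". 
     clFinish(commandQueue); 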
     System.out.println("Pause for 15000 ms."); 
     try 
     { 
      Thread.sleep(15000); 
     } 
     catch(InterruptedException iEx) 
     { 
      iEx.printStackTrace(); 
     } 

     // See if task completed 
     int[] ok = new int[1]; 
     Arrays.fill(ok, 0); 
     clGetEventInfo(myEventID[0], CL_EVENT_COMMAND_EXECUTION_STATUS, Sizeof.cl_int, Pointer.to(ok), null); 
     if (ok[0] == CL_COMPLETE) 
     { 
      System.out.println("Task COMPLETE"); 
     } 
     else 
     { 
      System.out.println("Task INCOMPLETE"); 
     } 
    } 
}

Source

2016-03-03 09:04:52 Algernon2
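One reading that fits both outputs above: enqueueing a command only places it in the command queue, and an implementation is not required to hand it to the device until the queue is flushed. OpenCL provides clFlush for exactly that, and clFinish both issues the queued commands and waits for them. Whether the Intel runtime used here actually deferred submission is an assumption, not something the thread confirms. The toy model below is plain Java, not OpenCL; `ToyCommandQueue` and its methods are hypothetical names that only mimic the enqueue / flush / finish distinction.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

public class ToyCommandQueue {
    // Commands that were enqueued but not yet handed to the worker.
    private final Queue<Runnable> pending = new ArrayDeque<>();
    // Commands already handed to the worker thread.
    private final List<Future<?>> submitted = new ArrayList<>();
    // The worker thread plays the role of the device in this model.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Models an enqueue call: record the command and return immediately.
    public void enqueue(Runnable command) {
        pending.add(command);
    }

    // Models a flush: hand every pending command to the worker, without waiting.
    public void flush() {
        while (!pending.isEmpty()) {
            submitted.add(worker.submit(pending.poll()));
        }
    }

    // Models a finish: flush, then block until every submitted command is done.
    public void finish() {
        flush();
        for (Future<?> f : submitted) {
            try {
                f.get();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
        submitted.clear();
    }

    public void shutdown() {
        worker.shutdown();
    }

    // Returns { completed before finish(), completed after finish() }.
    static boolean[] demo() {
        ToyCommandQueue queue = new ToyCommandQueue();
        AtomicBoolean done = new AtomicBoolean(false);
        queue.enqueue(() -> done.set(true));
        boolean before = done.get(); // still false: nothing was submitted yet
        queue.finish();
        boolean after = done.get();  // true: the command ran on the worker
        queue.shutdown();
        return new boolean[]{before, after};
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] ? "Task COMPLETE" : "Task INCOMPLETE"); // prints Task INCOMPLETE
        System.out.println(r[1] ? "Task COMPLETE" : "Task INCOMPLETE"); // prints Task COMPLETE
    }
}
```

In this model, sleeping after enqueue() would change nothing, because the command never reaches the worker until flush() or finish() is called. If the same holds for the real program, calling clFlush(commandQueue) before the sleep would be a non-blocking alternative to clFinish worth trying.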
