Why does cvtColor use a lot of CPU when processing a UMat in OpenCL GPU mode?

I'm using OpenCV 3.3 on Windows 10.

My GPU is a GTX 750. I'm sure cvtColor is called in OpenCL mode, because I set


and print the device name using the following code:

        char        *value;
        size_t      valueSize;
        //print the device name
            CL_DEVICE_NAME, 0, NULL, &valueSize);
        value = (char*)malloc(valueSize);
            CL_DEVICE_NAME, valueSize, value, NULL);
        OutputDebugStringA("Device Name: ");

The result is:

Device Name: GeForce GTX 750

I'm also sure that cvtColor calls ocl_cvtColor to do the work.

When I call

cv::cvtColor(*nv12_frame_, *src_frame_, cv::COLOR_YUV420p2BGR);

30 times per second, the app's CPU usage in Task Manager is 8%. But if I use the OpenCL API and a kernel function directly, the CPU usage in Task Manager is almost 0.

Why does cvtColor use so much CPU in OpenCL mode, when using OpenCL directly uses almost none?
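To compare the two paths without relying on Task Manager, the CPU cost can also be measured inside the process. The following is only a minimal sketch: `process_cpu_seconds` and `cpu_fraction` are hypothetical helper names, not part of OpenCV or OpenCL, and the loop assumes the work is paced at a fixed call rate such as the 30 calls per second above. It returns process CPU time divided by wall time, which is relative to one core, whereas Task Manager typically normalizes by all logical cores, so the two numbers are not directly comparable.

```cpp
#include <chrono>
#include <ctime>
#include <thread>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: CPU seconds (user + kernel) used by this process so far.
static double process_cpu_seconds() {
#ifdef _WIN32
    // MSVC's std::clock() reports wall time, so ask the OS directly.
    FILETIME creation, exit_time, kernel, user;
    if (!GetProcessTimes(GetCurrentProcess(), &creation, &exit_time, &kernel, &user))
        return 0.0;
    ULARGE_INTEGER k, u;
    k.LowPart = kernel.dwLowDateTime;  k.HighPart = kernel.dwHighDateTime;
    u.LowPart = user.dwLowDateTime;    u.HighPart = user.dwHighDateTime;
    return static_cast<double>(k.QuadPart + u.QuadPart) * 1e-7;  // 100 ns ticks
#else
    return static_cast<double>(std::clock()) / CLOCKS_PER_SEC;
#endif
}

// Hypothetical helper: calls `work` `iterations` times at about `fps` calls
// per second (fps must be > 0) and returns CPU time / wall time.
template <typename Work>
double cpu_fraction(Work work, int iterations, int fps) {
    using steady = std::chrono::steady_clock;
    const auto period = std::chrono::microseconds(1000000 / fps);
    const double cpu_start = process_cpu_seconds();
    const auto wall_start = steady::now();
    for (int i = 0; i < iterations; ++i) {
        const auto next = steady::now() + period;
        work();                               // the call being measured
        std::this_thread::sleep_until(next);  // pace the loop
    }
    const double cpu = process_cpu_seconds() - cpu_start;
    const double wall =
        std::chrono::duration<double>(steady::now() - wall_start).count();
    return wall > 0.0 ? cpu / wall : 0.0;
}
```

Passing a lambda that wraps the `cv::cvtColor` call, and then one that wraps the hand-written kernel launch, to `cpu_fraction(..., 300, 30)` would give two comparable figures. Note that a `UMat` call may return before the GPU has finished, so each lambda should probably also wait for completion or read the result back, otherwise the two paths are not doing the same amount of work.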

