UMat fast downloading GPU >> CPU
I am working on implementing some OpenCL GPU kernels using OpenCV. This is all working fine, but I am wondering what the fastest way is to download the images from the device back to the host (GPU >> CPU). I have a total of 10 channels that I want to copy back to CPU memory.
cv::UMat gpu_luvm_u8 = cv::UMat(cpuInputImage.size(), CV_8UC4);
cv::UMat gpu_angle1_u8 = cv::UMat(cpuInputImage.size(), CV_8UC4);
cv::UMat gpu_angle2_u8 = cv::UMat(cpuInputImage.size(), CV_8UC4); // last 2 channels unused
Of course, after processing them on the GPU I can download each of them to a CV_8UC4 cv::Mat using .copyTo(). However, since I am using 10 channels and would like to end up with a std::vector<cv::Mat>, I am wondering what the fastest method would be. Also, the GPU matrices gpu_angle1_u8 and gpu_angle2_u8 contain the same sort of data, but since 4 channels is the maximum number of channels in OpenCV (?), I have split them over two matrices. Is this a correct assumption, or is there a better way of getting all the data into a single UMat? Any feedback is more than welcome!
Btw, I am using OpenCV 3.0.0