I am trying to speed up a few of my functions by taking advantage of asynchronous memory copies. I have a cv::Mat that I apply a number of operations to on the CPU and then upload to a GpuMat. If I create it as a cv::gpu::CudaMem rather than a cv::Mat, the CPU functions stop working because they expect a cv::Mat. Is there a way to use a CudaMem as the input to normal CPU functions, so that I can then copy the results asynchronously to the GPU when required? I know I could do an intermediate copy from Mat.data to CudaMem.data, but that seems less efficient.