OpenCV Displaying UMat Efficiently

asked 2014-08-06 15:18:52 -0500

protonmesh

I'm excited about OpenCV's transparent API design and the ability to hardware accelerate image processing on a UMat on platforms that support it.

But how do we go about efficiently displaying a UMat?

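// (Naive) Approach 1: Displaying UMat with imshow + non-OpenGL namedWindow
#include <string>
#include <tchar.h>
#include <opencv2/opencv.hpp>
using namespace cv;

int _tmain(int argc, _TCHAR* argv[])
{
    std::string window_name = "Displaying UMat";
    Mat img_host = imread("Resources/win7.jpg");
    if (img_host.empty())
        return 1; // image could not be loaded
    UMat img_device;

    // Upload the host image into the UMat
    img_host.copyTo(img_device);

    imshow(window_name, img_device);
    waitKey();
    return 0;
}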

In the naive approach where imshow uses Win32 GDI display, the UMat must be copied from the OpenCL device (GPU) to the host (CPU), correct?

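// Approach 2: Displaying UMat with imshow + namedWindow(OPENGL)
#include <string>
#include <tchar.h>
#include <opencv2/opencv.hpp>
using namespace cv;

int _tmain(int argc, _TCHAR* argv[])
{
    std::string window_name = "Displaying UMat";
    Mat img_host = imread("Resources/win7.jpg");
    if (img_host.empty())
        return 1; // image could not be loaded
    UMat img_device;

    // Upload the host image into the UMat
    img_host.copyTo(img_device);

    // WINDOW_OPENGL requires an OpenCV build with OpenGL support
    namedWindow(window_name, WINDOW_OPENGL | WINDOW_AUTOSIZE);

    imshow(window_name, img_device);
    waitKey();
    return 0;
}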

I would have assumed that calling imshow with a UMat behaves much like it does for a GpuMat: copy to a separate buffer -> bind the buffer to GL_PIXEL_UNPACK_BUFFER -> create a texture from the buffer -> render the texture.

But when displaying a UMat, it seems that getMat() is called on the UMat, which effectively maps the OpenCL device memory into host (CPU) memory. Then glTexSubImage2D is called with a pointer to the mapped OpenCL buffer.

I don't know the mechanics of how such a texture upload is executed. It would be great if the driver recognized that the data pointer passed to glTexSubImage2D is a mapped pointer (to GPU memory) and performed a DMA copy from the mapped region to the texture object's data store.

Or does the less efficient alternative occur, i.e. the CPU copies the UMat's mapped memory into CPU memory, and the OpenGL driver then uploads the data back to a texture object? Does the data make a round trip GPU -> CPU -> GPU?
