Window size limit in GPU accelerated LK pyramid

I am performing image stabilization on a real-time feed so that I can run some vision algorithms on the stabilized images (emphasis on "real-time"). Currently this process, which uses the CPU implementation of pyramidal LK, is too slow, even when building the pyramid beforehand (the reference image and "previous" features are only ever calculated once). I thought I might speed things up with the GPU, since OpenCV implements the same LK approach for CUDA-capable devices in the cv::gpu::PyrLKOpticalFlow class. I'm using its sparse() method with a set of previous features.

My main issue is that there seems to be a limit on the window size, and mine is too large. The limit occurs in the pyrlk.cpp file as an assertion:

CV_Assert(patch.x > 0 && patch.x < 6 && patch.y > 0 && patch.y < 6);

The patch dimensions are determined just above it:

void calcPatchSize(cv::Size winSize, dim3& block, dim3& patch)
{
    if (winSize.width > 32 && winSize.width > 2 * winSize.height)
    {
        block.x = deviceSupports(FEATURE_SET_COMPUTE_12) ? 32 : 16;
        block.y = 8;
    }
    else
    {
        block.x = 16;
        block.y = deviceSupports(FEATURE_SET_COMPUTE_12) ? 16 : 8;
    }

    patch.x = (winSize.width  + block.x - 1) / block.x;
    patch.y = (winSize.height + block.y - 1) / block.y;

    block.z = patch.z = 1;
}
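To check which window sizes pass the assertion, the arithmetic can be reproduced on the host without OpenCV or CUDA. This is only a sketch: `Size2`, `Dim3`, `calcPatchSizeHost`, `windowAccepted` and the `hasCompute12` flag are hypothetical stand-ins I introduced for `cv::Size`, `dim3`, `calcPatchSize`, the `CV_Assert` condition and `deviceSupports(FEATURE_SET_COMPUTE_12)`; they are not part of OpenCV.

```cpp
#include <cassert>

// Hypothetical host-side stand-ins for cv::Size and CUDA's dim3.
struct Size2 { int width, height; };
struct Dim3  { unsigned x, y, z; };

// Same arithmetic as calcPatchSize above; hasCompute12 replaces the
// deviceSupports(FEATURE_SET_COMPUTE_12) query (an assumption of this sketch).
inline void calcPatchSizeHost(Size2 winSize, bool hasCompute12, Dim3& block, Dim3& patch)
{
    assert(winSize.width > 0 && winSize.height > 0);

    if (winSize.width > 32 && winSize.width > 2 * winSize.height)
    {
        // Wide windows: wider thread block, fewer rows.
        block.x = hasCompute12 ? 32 : 16;
        block.y = 8;
    }
    else
    {
        block.x = 16;
        block.y = hasCompute12 ? 16 : 8;
    }

    // Ceiling division: pixels each thread handles along each axis.
    patch.x = (winSize.width  + block.x - 1) / block.x;
    patch.y = (winSize.height + block.y - 1) / block.y;

    block.z = patch.z = 1;
}

// Mirrors the condition inside the CV_Assert quoted earlier.
inline bool windowAccepted(Size2 winSize, bool hasCompute12)
{
    Dim3 block, patch;
    calcPatchSizeHost(winSize, hasCompute12, block, patch);
    return patch.x > 0 && patch.x < 6 && patch.y > 0 && patch.y < 6;
}
```

Under these assumptions a 100x100 window gives a 7x7 patch and is rejected. On a device that reports compute 1.2 support, 80x80 is the largest square window that passes, and a wide window such as 160x40 also passes because it takes the first branch.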

My problem is that I need a window size of about 100x100 pixels (with block.x = 16, a width of 100 gives patch.x = 7, which fails the assertion). That is both why I want GPU acceleration and why it does not seem to work in OpenCV. :)

I have no experience implementing GPU code, so I am wondering whether someone can explain why this limit exists in OpenCV: is it imposed by the hardware or by the OpenCV implementation, and are there ways to work around it? It seems odd that this would be a hardware limitation, since large windows are exactly the situation where you'd want to use a GPU. I can get reasonable speed with smaller search windows, but the stabilization is then not good enough for the application.

I need such a large search window size because I'm calculating the motion to the first (reference) frame. The motion is cyclical plus some small random drift so this method works well, but requires a bit more space to search at the peaks of the cycle when the matching features might be around 30-40 pixels away (at original resolution).
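To make the 30-40 pixel figure concrete, here is a rough sketch of how a full-resolution displacement shrinks at each pyramid level, assuming each level halves the image size. The helper names and the `reach` parameter are hypothetical (mine, not OpenCV's). How far LK can actually track for a given window is a tuning question that this arithmetic does not settle.

```cpp
#include <cassert>
#include <cmath>

// Displacement in pixels as seen at pyramid level `level`, assuming every
// level halves the resolution (level 0 is the original image).
inline double displacementAtLevel(double fullResDisplacement, int level)
{
    assert(level >= 0);
    return std::ldexp(fullResDisplacement, -level);  // d / 2^level
}

// Smallest pyramid level at which the displacement is no larger than `reach`.
// `reach` is a hypothetical tuning value: how many pixels of motion one is
// willing to ask a single LK window to absorb at the coarsest level.
inline int minLevelForReach(double fullResDisplacement, double reach)
{
    assert(fullResDisplacement >= 0.0 && reach > 0.0);
    int level = 0;
    while (displacementAtLevel(fullResDisplacement, level) > reach)
        ++level;
    return level;
}
```

For example, a 40 pixel displacement at the original resolution corresponds to 5 pixels at level 3, so more pyramid levels could in principle trade against a smaller window. Whether that keeps the stabilization accurate enough is something I would have to test.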

This is using OpenCV version 2.4.10 on Linux, built from source for CUDA support.