I am performing image stabilization on a real-time feed in order to run some vision algorithms on the stabilized images (emphasis on "real-time"). Currently this process, which uses the CPU implementation of pyramidal Lucas-Kanade (LK) optical flow, is too slow, even when building the pyramid beforehand (the reference image and "previous" features are only ever calculated once). I thought I might speed things up by using the GPU, since OpenCV implements the same LK approach for CUDA-capable devices in the cv::gpu::PyrLKOpticalFlow class. I'm using the ::sparse call with a set of previous features.
My main issue is that there seems to be a limit on the window size, and mine is too large. The limit occurs in the pyrlk.cpp file as an assertion:
CV_Assert(patch.x > 0 && patch.x < 6 && patch.y > 0 && patch.y < 6);
Where the patch dimensions are determined right above:
void calcPatchSize(cv::Size winSize, dim3& block, dim3& patch)
{
    if (winSize.width > 32 && winSize.width > 2 * winSize.height)
    {
        block.x = deviceSupports(FEATURE_SET_COMPUTE_12) ? 32 : 16;
        block.y = 8;
    }
    else
    {
        block.x = 16;
        block.y = deviceSupports(FEATURE_SET_COMPUTE_12) ? 16 : 8;
    }
    patch.x = (winSize.width + block.x - 1) / block.x;
    patch.y = (winSize.height + block.y - 1) / block.y;
    block.z = patch.z = 1;
}
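My problem is that I need a window size of about 100x100 pixels. That is both why I want GPU acceleration and why the OpenCV GPU implementation doesn't work for me. :) With a 16x16 block (the non-wide case on a device supporting compute capability 1.2), a 100x100 window gives patch.x = patch.y = 7, which fails the assertion.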
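To make the limit concrete, here is a minimal standalone sketch of the same arithmetic with no CUDA or OpenCV dependency. The names Dim2, blockFor, patchFor and passesAssert are made up for illustration, and the hasCompute12 flag is a hypothetical stand-in for deviceSupports(FEATURE_SET_COMPUTE_12); only the formulas are taken from the calcPatchSize code and the CV_Assert quoted above.

```cpp
// Standalone re-creation of the arithmetic in calcPatchSize() plus the
// CV_Assert check quoted above. Illustrative only: Dim2 and all function
// names here are hypothetical, not part of OpenCV.
struct Dim2 { int x; int y; };

// Same branch condition as calcPatchSize(): a "wide" window.
constexpr bool isWide(int w, int h) { return w > 32 && w > 2 * h; }

// Thread-block size chosen for a given window.
// hasCompute12 stands in for deviceSupports(FEATURE_SET_COMPUTE_12).
constexpr Dim2 blockFor(int w, int h, bool hasCompute12) {
    return isWide(w, h) ? Dim2{hasCompute12 ? 32 : 16, 8}
                        : Dim2{16, hasCompute12 ? 16 : 8};
}

// Integer ceiling division, as in (winSize + block - 1) / block.
constexpr int ceilDiv(int a, int b) { return (a + b - 1) / b; }

// Number of block-sized tiles needed to cover the window in x and y.
constexpr Dim2 patchFor(int w, int h, bool hasCompute12) {
    return Dim2{ceilDiv(w, blockFor(w, h, hasCompute12).x),
                ceilDiv(h, blockFor(w, h, hasCompute12).y)};
}

// Mirrors: CV_Assert(patch.x > 0 && patch.x < 6 && patch.y > 0 && patch.y < 6)
constexpr bool passesAssert(int w, int h, bool hasCompute12) {
    return patchFor(w, h, hasCompute12).x > 0 &&
           patchFor(w, h, hasCompute12).x < 6 &&
           patchFor(w, h, hasCompute12).y > 0 &&
           patchFor(w, h, hasCompute12).y < 6;
}

// 100x100 with compute >= 1.2: block 16x16, patch 7x7, so the assert fails.
static_assert(!passesAssert(100, 100, true), "100x100 is rejected");
// 80x80 gives patch 5x5, the largest square window that passes.
static_assert(passesAssert(80, 80, true), "80x80 is accepted");
static_assert(!passesAssert(81, 81, true), "81x81 is rejected");
```

Under these formulas the largest square window that passes is 80x80 when the device supports compute capability 1.2, since patch must be at most 5 and the block is 16x16. Without that support the block is 16x8, so the height is capped at 40.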
I'm not familiar with actually implementing GPU acceleration so I am wondering if someone can explain why this limitation exists in OpenCV, if it's a real limitation imposed by the hardware or by the OpenCV implementation, and if there are ways to work around it. It seems odd that this would be a hardware limitation, since these are the situations when you'd want to use a GPU. I can get reasonable speed with smaller search windows but the stabilization is not good enough for the application.
I need such a large search window size because I'm calculating the motion to the first (reference) frame. The motion is cyclical plus some small random drift so this method works well, but requires a bit more space to search at the peaks of the cycle when the matching features might be around 30-40 pixels away (at original resolution).
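For reference, the arithmetic relating that displacement to the pyramid levels can be sketched as below. It assumes each pyramid level halves the resolution, as in the standard LK pyramid. The function name displacementAtLevel is made up for illustration. Whether a smaller window combined with more levels would stabilize well enough for this application is not something the sketch can show.

```cpp
// Apparent displacement of a feature at a given pyramid level, assuming
// every level halves the image resolution (level 0 = original image).
// displacementAtLevel is a hypothetical helper, not an OpenCV function.
constexpr double displacementAtLevel(double pixelsAtLevel0, int level) {
    return pixelsAtLevel0 / static_cast<double>(1 << level);
}

// A 40-pixel shift at original resolution shrinks to 5 pixels at level 3.
static_assert(displacementAtLevel(40.0, 3) == 5.0, "40 px -> 5 px at level 3");
// At level 0 nothing changes.
static_assert(displacementAtLevel(40.0, 0) == 40.0, "level 0 is unscaled");
```

So the 30-40 pixel peaks correspond to roughly 4-5 pixels at level 3, which is the sense in which the coarsest level, rather than the window alone, determines how far the search can reach.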
This is using OpenCV version 2.4.10 on Linux, built from source for CUDA support.