2015-02-05 09:49:41 -0500 asked a question

Window size limit in GPU accelerated LK pyramid

I am performing image stabilization on a real-time feed in order to run some vision algorithms on the stabilized images (emphasis on "real-time"). Currently this process, which uses the CPU implementation of the LK pyramids, is too slow, even when building the pyramid beforehand (the reference image and "previous" features are only ever calculated once).

I thought I might speed things up by using the GPU, since OpenCV implements the same LK approach for CUDA-capable devices in the cv::gpu::PyrLKOpticalFlow class. I'm using the ::sparse call with a set of previous features.

My main issue is that there seems to be a limit on the window size, and mine is too large. The limit occurs in the pyrlk.cpp file as an assertion:

    CV_Assert(patch.x > 0 && patch.x < 6 && patch.y > 0 && patch.y < 6);

The patch dimensions are determined right above it:

    void calcPatchSize(cv::Size winSize, dim3& block, dim3& patch)
    {
        if (winSize.width > 32 && winSize.width > 2 * winSize.height)
        {
            block.x = deviceSupports(FEATURE_SET_COMPUTE_12) ? 32 : 16;
            block.y = 8;
        }
        else
        {
            block.x = 16;
            block.y = deviceSupports(FEATURE_SET_COMPUTE_12) ? 16 : 8;
        }

        patch.x = (winSize.width + block.x - 1) / block.x;
        patch.y = (winSize.height + block.y - 1) / block.y;

        block.z = patch.z = 1;
    }

My problem is that I need a window size of about 100x100 pixels, which is (a) why I want to employ GPU acceleration and (b) why it doesn't seem to work in OpenCV. :)

I'm not familiar with implementing GPU acceleration myself, so I am wondering if someone can explain why this limitation exists in OpenCV, whether it is a real limitation imposed by the hardware or one imposed by the OpenCV implementation, and whether there are ways to work around it. It seems odd that this would be a hardware limitation, since large windows are exactly the situation where you'd want to use a GPU.

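
To make the limit concrete, the arithmetic quoted above can be checked in isolation. The following is only a sketch, not OpenCV code: `Dim2`, `calcPatchSizeSketch` and `windowAccepted` are hypothetical names, and the `deviceSupports(FEATURE_SET_COMPUTE_12)` query is replaced by a plain `cc12` flag so that it compiles without OpenCV or CUDA.

```cpp
#include <cassert>

// Hypothetical stand-in for dim3 (x and y only), so no CUDA headers are needed.
struct Dim2 { int x; int y; };

// Same arithmetic as the quoted calcPatchSize; `cc12` replaces the
// deviceSupports(FEATURE_SET_COMPUTE_12) query (an assumption of this sketch).
void calcPatchSizeSketch(int winWidth, int winHeight, bool cc12,
                         Dim2& block, Dim2& patch)
{
    assert(winWidth > 0 && winHeight > 0);
    if (winWidth > 32 && winWidth > 2 * winHeight) {
        block.x = cc12 ? 32 : 16;
        block.y = 8;
    } else {
        block.x = 16;
        block.y = cc12 ? 16 : 8;
    }
    // Ceiling division: number of block-sized tiles needed to cover the window.
    patch.x = (winWidth + block.x - 1) / block.x;
    patch.y = (winHeight + block.y - 1) / block.y;
}

// Mirrors the condition inside the quoted CV_Assert.
bool windowAccepted(int winWidth, int winHeight, bool cc12)
{
    Dim2 block, patch;
    calcPatchSizeSketch(winWidth, winHeight, cc12, block, patch);
    return patch.x > 0 && patch.x < 6 && patch.y > 0 && patch.y < 6;
}
```

With `cc12` set to true, a 100x100 window gives patch.x = patch.y = (100 + 15) / 16 = 7, which fails the check. By this arithmetic the largest square window that passes is 80x80 (five 16-pixel tiles per side), and the wide branch accepts up to 160x40.
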
I can get reasonable speed with smaller search windows, but the stabilization is not good enough for the application. I need such a large search window because I'm calculating the motion relative to the first (reference) frame. The motion is cyclical plus some small random drift, so this method works well, but it needs more room to search at the peaks of the cycle, when the matching features might be around 30-40 pixels away (at original resolution).

This is using OpenCV version 2.4.10 on Linux, built from source with CUDA support.
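
For reference, the 30-40 pixel figure can be related to pyramid depth with some simple arithmetic, since each pyramid level halves the resolution and therefore the apparent displacement. This is a rough sketch under assumptions: `displacementAtLevel` and `levelsNeeded` are hypothetical helpers, and `reach` (how far a single window can be expected to track, for example about half the window width) is a rule of thumb, not something taken from the OpenCV source.

```cpp
#include <cassert>

// Displacement, in pixels, that a motion of `pixels` at full resolution
// appears as at pyramid level `level` (each level halves the resolution).
double displacementAtLevel(double pixels, int level)
{
    assert(level >= 0 && level < 31);
    return pixels / static_cast<double>(1 << level);
}

// Smallest pyramid level at which the displacement shrinks to at most
// `reach` pixels. `reach` is a heuristic chosen by the caller.
int levelsNeeded(double pixels, double reach)
{
    assert(pixels >= 0.0 && reach > 0.0);
    int level = 0;
    while (displacementAtLevel(pixels, level) > reach) {
        ++level;
    }
    return level;
}
```

For example, a 40 pixel displacement appears as 5 pixels at level 3, and `levelsNeeded(40.0, 10.0)` evaluates to 2. Whether tracking at such coarse levels is accurate enough for this stabilization problem would have to be tested on the actual footage.
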