You can also use a Kalman filter on your bounding box to "smooth" the position/size of your ROI. That way, you don't have to process the entire sequence to produce the output; you can do it in real time.
If you don't want to use Kalman, you can also take the previous ROI and mix it with the new ROI:
newROI.x = cvRound(prevROI.x * 0.3 + newROI.x * 0.7); // cv::Rect stores ints, so round the blend
newROI.y = cvRound(prevROI.y * 0.3 + newROI.y * 0.7);
newROI.width = cvRound(prevROI.width * 0.3 + newROI.width * 0.7);
newROI.height = cvRound(prevROI.height * 0.3 + newROI.height * 0.7);
Of course, 0.3 and 0.7 can be adjusted as you want (the two weights should sum to 1).
In case you want to use Kalman, you should first init the filter (where ROI is a cv::Rect):
cv::KalmanFilter kalFilter;
kalFilter.init(8, 4); // 8 state values (x, y, w, h + their velocities), 4 measured
// Constant-velocity model: each of x, y, w, h is advanced by its own velocity.
kalFilter.transitionMatrix = (cv::Mat_<float>(8, 8) <<
1,0,0,0,1,0,0,0,
0,1,0,0,0,1,0,0,
0,0,1,0,0,0,1,0,
0,0,0,1,0,0,0,1,
0,0,0,0,1,0,0,0,
0,0,0,0,0,1,0,0,
0,0,0,0,0,0,1,0,
0,0,0,0,0,0,0,1);
kalFilter.statePre.setTo(0);
kalFilter.statePre.at<float>(0) = static_cast<float>(ROI.x);
kalFilter.statePre.at<float>(1) = static_cast<float>(ROI.y);
kalFilter.statePre.at<float>(2) = static_cast<float>(ROI.width);
kalFilter.statePre.at<float>(3) = static_cast<float>(ROI.height);
kalFilter.statePost = kalFilter.statePre.clone(); // start predict/correct from the same state
cv::setIdentity(kalFilter.measurementMatrix);                          // we only measure x, y, w, h
cv::setIdentity(kalFilter.processNoiseCov, cv::Scalar::all(1e-4));     // trust the model...
cv::setIdentity(kalFilter.measurementNoiseCov, cv::Scalar::all(1e-1)); // ...more than the detector
cv::setIdentity(kalFilter.errorCovPost, cv::Scalar::all(0.1));
Then, at each new frame, you will filter the new positions:
kalFilter.predict(); // predict first, so correct() has a valid prior
cv::Mat values = (cv::Mat_<float>(4, 1) << newROI.x, newROI.y, newROI.width, newROI.height);
cv::Mat kalmanEstim = kalFilter.correct(values); // corrected (filtered) state
filteredROI.x = static_cast<int>(kalmanEstim.at<float>(0));
filteredROI.y = static_cast<int>(kalmanEstim.at<float>(1));
filteredROI.width = static_cast<int>(kalmanEstim.at<float>(2));
filteredROI.height = static_cast<int>(kalmanEstim.at<float>(3));
I hope you will find this useful!
I think you could do some kind of distortion detection and warp the image you are using to replace the ROI so that it looks more like the one in the real video, but I have no idea how to do it; perhaps with some features, based on their positions relative to one another.
Hmm, that 'jitter' is indeed caused by the cascade classifier detector. To solve this problem, first store all your detections, then perform temporal smoothing, which means you average detections over a window of time/frames. That will reduce the errors drastically.
Hi Steve, thanks for your suggestion. After posting this question, I started buffering my video for x frames (say, amounting to 10 secs). I then find the average ROI over those x frames and overlay within that, which reduces the jitter for those x frames. But then I start buffering from frame x+1 to 2x and overlay again, and the transition from the first clip to the second shows an obvious JUMP, as the ROI of the (x+1)th frame is higher than the average of the first x frames. So I'm trying to see how best to smooth that out. In the source video, the camera zooms in on my ROI, so the matching rectangle size keeps increasing. In general, using OpenCV, if I overlay an image (that zooms/grows every frame), is this behavior expected? Would using OpenGL to render the overlay image be any better?
The solution beneath will solve that!