speeding things up [closed]
Hi all! I've developed an algorithm, but it seems to be too slow to keep up with 60 FPS. It is a background subtraction with computation of average color of around 100 moving objects.
What has already been done to increase speed:
1. no unnecessary imshow();
2. cv::waitKey(1) only every 5th or 10th iteration;
3. the background subtractor is CNT, which is known to be faster than MOG and MOG2;
4. no std::cout;
5. compiling with the -O3 flag.
Still, with 1280*720 video @ 60 FPS I get around 40 FPS, 45 at most. The hardware is a Core i3 @ 3.3 GHz, loaded at about 20%.
Please note that for now I use a video file as the source, but later I'll switch to a camera capable of 60 FPS capture.
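(When that happens, I expect to request the camera mode roughly like this; a small, untested sketch, and the device index and property support are just assumptions on my side:)

#include <opencv2/opencv.hpp>
#include <cstdio>

int main()
{
    // Open the first camera and request the target mode; whether the request is
    // honoured depends on the camera/driver, so the actual values are read back.
    cv::VideoCapture cam(0);
    cam.set(cv::CAP_PROP_FRAME_WIDTH, 1280);
    cam.set(cv::CAP_PROP_FRAME_HEIGHT, 720);
    cam.set(cv::CAP_PROP_FPS, 60);

    std::printf("running at %dx%d @ %.0f fps\n",
                (int)cam.get(cv::CAP_PROP_FRAME_WIDTH),
                (int)cam.get(cv::CAP_PROP_FRAME_HEIGHT),
                cam.get(cv::CAP_PROP_FPS));
    return 0;
}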
Here is a simplified version of the code, just so you get the idea:
int main(int argc, char *argv[])
{
    cv::VideoCapture capture("test_video.avi");
    cv::Ptr<cv::BackgroundSubtractor> BackgroundSub;
    BackgroundSub = cv::bgsegm::createBackgroundSubtractorCNT(500, true, 500*50, 0);

    int morph_size = 3;
    cv::Mat element = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(morph_size + 1, morph_size + 1), cv::Point(morph_size, morph_size));

    int counter = 0;
    // (Mat buffers, contour vectors, str, info_fps etc. are declared in the full code.)
    while (1)
    {
        capture >> img_in;
        img_output = img_in.clone();
        img_wholemask = Mat::zeros(img_in.size(), CV_8UC3);

        cvtColor(img_in, img_gray, COLOR_BGR2GRAY);
        blur(img_gray, img_gray, Size(3,3));
        BackgroundSub->apply(img_gray, img_mog_output, 0.05);
        Canny(img_mog_output, img_canny_output, 200.0, 200.0*2);
        cv::morphologyEx(img_canny_output, img_morph_out, cv::MORPH_CLOSE, element);
        cv::findContours(img_morph_out, contours, hierarchy, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

        for (int i = 0; i < contours.size(); i++)
        {
            cv::drawContours(img_wholemask, contours, i, cv::Scalar(255,255,255), cv::FILLED);

            vector<Point2f> mc(contours.size());
            vector<Moments> mu(contours.size());
            vector<Point2f> centers(contours.size());

            mu[i] = moments(contours[i], false);                        // contour moments
            mc[i] = Point2f(mu[i].m10/mu[i].m00, mu[i].m01/mu[i].m00);  // contour center

            Rect roi = boundingRect(contours[i]);                       // take single contour
            img_in(roi).copyTo(img_roi, img_wholemask(roi));            // copy to mat with mask
            Scalar avg_color = Scalar(mean(img_roi)) / PartOfColor;     // average color in contour

            cv::drawContours(cv::FILLED ...); // fill contour
            cv::drawContours(...);            // draw contour
            rectangle(...);                   // contour bounding box
            drawMarker(...);                  // contour center
        }

        sprintf(str, "%2.1f FPS", info_fps);
        putText(img_output, str, Point(10,30), FONT_HERSHEY_SIMPLEX, 1.0, Scalar(0,0,255), 1);
        sprintf(str, "%d contours", (int)contours.size());
        putText(img_output, str, Point(10,50), FONT_HERSHEY_SIMPLEX, 0.6, Scalar(0,0,255), 1);
        if (!img_output.empty()) imshow("img_output", img_output);

        counter++;
        if (counter == 5)
        {
            int key = cv::waitKey(1);
            counter = 0;
        }
    }
}
So, is there a way to speed things up a bit? Algorithm optimizations, or something else? I've been thinking about multithreaded capture and processing, which AFAIK is not strongly recommended here on the forum. Is it applicable in this situation? What should I read on multithreaded C++ OpenCV applications? PS: CUDA is not an option, since I can't get it to compile and it's poorly compatible with MinGW and Code::Blocks.
Thanks in advance!
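For context, this is the kind of capture/processing split I have in mind (a rough, untested sketch with a simple mutex-protected queue; processFrame() and the file name are placeholders, not code from my project):

#include <opencv2/opencv.hpp>
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

std::queue<cv::Mat> frame_queue;
std::mutex queue_mutex;
std::condition_variable queue_cv;
std::atomic<bool> capturing(true);

// Capture thread: only grabs frames and pushes them into the queue.
void captureLoop(cv::VideoCapture &capture)
{
    cv::Mat frame;
    while (capturing)
    {
        capture >> frame;
        if (frame.empty()) { capturing = false; break; }
        {
            std::lock_guard<std::mutex> lock(queue_mutex);
            frame_queue.push(frame.clone());   // clone so the capture buffer can be reused
        }
        queue_cv.notify_one();
    }
    queue_cv.notify_all();                     // wake the consumer so it can exit
}

// Processing thread: pops frames and runs the heavy pipeline on them.
void processLoop()
{
    while (true)
    {
        cv::Mat frame;
        {
            std::unique_lock<std::mutex> lock(queue_mutex);
            queue_cv.wait(lock, [] { return !frame_queue.empty() || !capturing; });
            if (frame_queue.empty()) break;    // capture finished and queue is drained
            frame = frame_queue.front();
            frame_queue.pop();
        }
        // processFrame(frame);                // placeholder for the subtraction/contour code above
        (void)frame;
    }
}

int main()
{
    cv::VideoCapture capture("test_video.avi");
    std::thread producer(captureLoop, std::ref(capture));
    std::thread consumer(processLoop);
    producer.join();
    consumer.join();
    return 0;
}

The idea would be that grabbing frames never waits on the heavy per-frame processing; whether that actually helps on an i3 that is only ~20% loaded is exactly what I'm unsure about.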
Could you upload some sample frames? I think sometimes there is no need to process all frames.
sturkmen, unfortunately I can't get real sample frames right now, since no real device has been built yet. It's a free-fall sorter machine, so a frame is a high-contrast image of falling particles. And IMO, more FPS means more attempts to analyze a particle's color, which means more trustworthy results.
I already did some time measurements: depending on the frame, the Canny filter takes about as much time as the background subtraction, and most of the time is spent in the for loop over the contours.
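For reference, the measurements were done roughly like this with cv::TickMeter (a stripped-down, self-contained sketch on a dummy frame, not the actual project code):

#include <opencv2/opencv.hpp>
#include <cstdio>

int main()
{
    cv::Mat gray(720, 1280, CV_8UC1);
    cv::randu(gray, cv::Scalar::all(0), cv::Scalar::all(255));  // dummy frame instead of the real video

    cv::Mat blurred, edges;
    cv::TickMeter tm_blur, tm_canny;

    tm_blur.start();
    cv::blur(gray, blurred, cv::Size(3, 3));
    tm_blur.stop();

    tm_canny.start();
    cv::Canny(blurred, edges, 200.0, 400.0);
    tm_canny.stop();

    // The same start()/stop() pair wraps apply(), findContours() and the contour loop.
    std::printf("blur: %.3f ms  canny: %.3f ms\n",
                tm_blur.getTimeMilli(), tm_canny.getTimeMilli());
    return 0;
}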
yea, simply impossible ;(
Do you really need 1280x720 resolution for your processing? (A simple pyrDown() might help, see the sketch below.)
Why do you even need such a high framerate? (The human eye stops processing at 25 FPS already.)
Then, your window will only get updated IF you call waitKey(), so as long as you don't, there's no need to draw or imshow() anything.
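For example, roughly like this (a minimal, untested sketch; the file name is just a placeholder):

#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture capture("test_video.avi");   // placeholder source
    cv::Mat frame, half, gray;
    while (capture.read(frame))
    {
        cv::pyrDown(frame, half);                 // 1280x720 -> 640x360
        cv::cvtColor(half, gray, cv::COLOR_BGR2GRAY);
        // ... run the subtraction / contour pipeline on the quarter-size image;
        // contour coordinates can be multiplied by 2 to map back to the full frame.
    }
    return 0;
}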
berak, actually I'm not sure about the sufficient resolution; I'm just trying to make the code better, faster, stronger (hehe), so that when it's time to build the real machine operated by this code, the CV algorithms won't bottleneck anything. With 1024*768 it runs at a good 65 FPS, but still, is there any chance to gain speed without pulling down the resolution? Should multithreading be avoided at all costs?
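One idea along those lines (a rough, unverified sketch on dummy data, not my actual code): compute each contour's average color with a masked cv::mean() on its bounding-box ROI instead of copyTo() into a temporary Mat:

#include <opencv2/opencv.hpp>
#include <vector>

int main()
{
    // Dummy data standing in for a video frame and the background-subtraction mask.
    cv::Mat img(720, 1280, CV_8UC3, cv::Scalar(40, 40, 40));
    cv::Mat fg_mask = cv::Mat::zeros(img.size(), CV_8UC1);
    cv::circle(fg_mask, cv::Point(300, 300), 50, cv::Scalar(255), cv::FILLED);

    std::vector<std::vector<cv::Point>> contours;
    cv::Mat mask_copy = fg_mask.clone();   // findContours may modify its input in older OpenCV
    cv::findContours(mask_copy, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    for (size_t i = 0; i < contours.size(); i++)
    {
        cv::Rect roi = cv::boundingRect(contours[i]);

        // Fill only this contour into a small single-channel mask the size of its
        // bounding box, then let mean() do the masked average in place --
        // no copyTo() of pixel data into a temporary Mat.
        cv::Mat contour_mask = cv::Mat::zeros(roi.size(), CV_8UC1);
        cv::drawContours(contour_mask, contours, (int)i, cv::Scalar(255), cv::FILLED,
                         cv::LINE_8, cv::noArray(), 0, cv::Point(-roi.x, -roi.y));
        cv::Scalar avg_color = cv::mean(img(roi), contour_mask);
        (void)avg_color;                   // the value the real pipeline would use
    }
    return 0;
}

Whether this actually saves time in the contour loop would still have to be measured.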