Speeding things up

asked 2018-12-17 09:09:31 -0500

ub0baa

Hi all! I've developed an algorithm, but it seems to be too slow to keep up with 60 FPS. It performs background subtraction and computes the average color of around 100 moving objects.

What has already been done to increase speed:
1. no unnecessary imshow();
2. cv::waitKey(1) only every 5th or 10th frame;
3. the background subtractor is CNT, which is known to be faster than MOG and MOG2;
4. no std::cout;
5. compiling with the -O3 flag.

Still, with a 1280x720 video at 60 FPS I get around 40 FPS, with a maximum of 45. The hardware is a 3.3 GHz Core i3, loaded at about 20%.

Please note that I currently use a video file as the source, but later I'll switch to a camera capable of capturing at 60 FPS.

Here is simplified code, just to give you the idea:

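#include <opencv2/opencv.hpp>
#include <opencv2/bgsegm.hpp>   // createBackgroundSubtractorCNT (opencv_contrib)
#include <vector>

int main(int argc, char *argv[])
{
    cv::VideoCapture capture("test_video.avi");
    if (!capture.isOpened()) return -1;

    cv::Ptr<cv::BackgroundSubtractor> BackgroundSub;
    // args: minPixelStability, useHistory, maxPixelStability, isParallel (0 = false, single-threaded)
    BackgroundSub = cv::bgsegm::createBackgroundSubtractorCNT(500, true, 500*50, 0);

    int morph_size = 3;
    cv::Mat element = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(morph_size + 1, morph_size + 1), cv::Point(morph_size, morph_size));

    cv::Mat img_in, img_output, img_wholemask, img_gray, img_mog_output;
    cv::Mat img_canny_output, img_morph_out, img_roi;
    std::vector<std::vector<cv::Point>> contours;
    std::vector<cv::Vec4i> hierarchy;
    std::vector<cv::Moments> mu;    // declared once, not re-allocated for every contour
    std::vector<cv::Point2f> mc;
    double PartOfColor = 1.0;       // placeholder: not defined in the simplified code
    double info_fps = 0.0;          // placeholder: FPS measurement not shown
    int counter = 0;

    while (true)
    {
        capture >> img_in;
        if (img_in.empty()) break;                      // end of the video
        img_output = img_in.clone();
        img_wholemask = cv::Mat::zeros(img_in.size(), CV_8UC3);

        cv::cvtColor(img_in, img_gray, cv::COLOR_BGR2GRAY);
        cv::blur(img_gray, img_gray, cv::Size(3, 3));
        BackgroundSub->apply(img_gray, img_mog_output, 0.05);
        cv::Canny(img_mog_output, img_canny_output, 200.0, 200.0*2);
        cv::morphologyEx(img_canny_output, img_morph_out, cv::MORPH_CLOSE, element);
        cv::findContours(img_morph_out, contours, hierarchy, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

        mu.resize(contours.size());
        mc.resize(contours.size());
        for (size_t i = 0; i < contours.size(); i++)
        {
            cv::drawContours(img_wholemask, contours, static_cast<int>(i), cv::Scalar(255, 255, 255), cv::FILLED);
            mu[i] = cv::moments(contours[i], false);                        // contour moments
            if (mu[i].m00 == 0) continue;                                    // zero area: center undefined
            mc[i] = cv::Point2f(mu[i].m10/mu[i].m00, mu[i].m01/mu[i].m00);   // contour center
            cv::Rect roi = cv::boundingRect(contours[i]);                    // take single contour
            img_roi = cv::Mat::zeros(roi.size(), img_in.type());            // clear pixels left from the previous contour
            img_in(roi).copyTo(img_roi, img_wholemask(roi));                 // copy to mat with mask
            cv::Scalar avg_color = cv::mean(img_roi) / PartOfColor;          // average color in contour
            // cv::drawContours(cv::FILLED ...);   // fill contour (arguments elided in the question)
            // cv::drawContours(...);              // draw contour
            // cv::rectangle(...);                 // contour bounding box
            // cv::drawMarker(...);                // contour center
        }

        cv::putText(img_output, cv::format("%2.1f FPS", info_fps), cv::Point(10, 30), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 1);
        cv::putText(img_output, cv::format("%u contours", static_cast<unsigned>(contours.size())), cv::Point(10, 50), cv::FONT_HERSHEY_SIMPLEX, 0.6, cv::Scalar(0, 0, 255), 1);

        if (!img_output.empty()) cv::imshow("img_output", img_output);

        if (++counter == 5)                             // poll the GUI only every 5th frame
        {
            int key = cv::waitKey(1);
            counter = 0;
        }
    }
    return 0;
}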

So is there a way to speed things up a bit? Algorithm optimizations or something else? I've been thinking about multithreaded capture and processing, which, as far as I know, is not much recommended here on the forum. Is it applicable in this situation? What should I read about multithreaded C++ OpenCV applications?

PS: CUDA is not an option, since I can't get it to compile, and it's poorly compatible with MinGW and CodeBlocks.

Thanks in advance!



Could you upload some sample frames? I think sometimes there is no need to process every frame.

sturkmen (2018-12-17 09:21:37 -0500)

sturkmen, unfortunately I can't get real sample frames yet, since no real device has been built. It's a free-fall sorting machine, so each frame is a high-contrast image of falling particles. And IMO, more FPS means more attempts to analyze each particle's color, which means more trustworthy results.

I already did some timing measurements: depending on the frame, the Canny filter takes as much time as the background subtraction, and most of the time is spent in the for (contours[i]) loop.

ub0baa (2018-12-17 09:58:54 -0500)

CUDA is not an option, since I can't get it to compile, and it's poorly compatible with MinGW and CodeBlocks.

Yeah, simply impossible ;(

Do you really need 1280x720 resolution for your processing? (A simple pyrDown() might help.)

Why do you even need such a high frame rate? (The human eye stops processing at 25 FPS already.)

Then, your window only gets updated IF you call waitKey(), so as long as you don't, there's no need to draw or imshow() anything.

berak (2018-12-17 10:02:18 -0500)

berak, actually I'm not sure what resolution is sufficient; I'm just trying to make the code better, faster, stronger (hehe), so that when it's time to build the real machine operated by this code, the CV algorithms won't be a bottleneck. At 1024x768 it runs at a good 65 FPS, but still: is there any chance to gain speed without lowering the resolution? Should multithreading be avoided at all costs?

ub0baa (2018-12-17 10:29:02 -0500)