
OpenCV: two algorithms running in parallel

asked 2019-02-24 09:18:24 -0500

Samuel123

Thanks for reading this post. My program is written in C++ and built with Visual Studio 2017, using OpenCV 4.0.1 built with TBB and MKL. I am trying to run two instances of a similar combination of OpenCV functions: resize, thresholding, morphological open and close, and finally findContours. In my application I capture two frames from two cameras and try to process them in parallel.

Processing one frame on its own takes about 9 ms, and processing two frames sequentially takes 17 ms. When I run the processing in parallel using std::thread, the processing time does not improve; it actually adds about 1 ms of thread-creation overhead, so it takes 18 ms. I have tried the Boost library, but the results were similar to std::thread. With TBB task groups there is no task-creation overhead, but the processing time still stays at 17 ms.

I have provided the code below. I am wondering whether I am doing something wrong or whether this behavior is normal for this kind of processing, because my expectation was that the processing time would decrease to roughly 9-12 ms when running the code in parallel, but it doesn't work that way.

Using std::thread

    // Fragment: assumes <opencv2/opencv.hpp>, <iostream>, and using namespace cv / std.
    void find_c(const Mat &im, Mat &im_c)
    {
        Mat imm0;
        resize(im, imm0, Size(), 0.3, 0.3, INTER_NEAREST); // scale to 30% on both x and y
        Mat d1 = Mat::zeros(Size(imm0.cols, imm0.rows), CV_8UC1);
        int pic_width = imm0.cols; // width of the resized image for further calculation
        d1(Range(470, 470 + 100), Range(0, d1.cols)) = 255; // mask: keep rows 470-569 only
        im_c = imm0.clone();
        Mat imm1;
        bitwise_and(imm0, d1, imm1);
        Mat imm2;
        threshold(imm1, imm2, 100, 255, THRESH_BINARY);
        Mat imm3;
        morphologyEx(imm2, imm3, MORPH_CLOSE, Mat(), Point(-1, -1), 2);
        Mat imm4;
        morphologyEx(imm3, imm4, MORPH_OPEN, Mat(), Point(-1, -1), 2);
        vector<vector<Point> > contours;
        vector<Vec4i> hierarchy;
        findContours(imm4, contours, hierarchy, RETR_LIST, CHAIN_APPROX_SIMPLE, Point(0, 0));
        // for the circle rectangle and other info
        vector<vector<Point> > contours_poly(contours.size());
        vector<Rect> boundRect(contours.size());
        vector<Point2f> center(contours.size()); // was used below but never declared
        vector<float> radius(contours.size());   // was used below but never declared

        for (size_t i = 0; i < contours.size(); i++)
        {
            approxPolyDP(Mat(contours[i]), contours_poly[i], 3, true);
            //boundRect[i] = boundingRect(Mat(contours_poly[i]));
            minEnclosingCircle((Mat)contours_poly[i], center[i], radius[i]);
        }

        /// Draw contours
        //Mat drawing = Mat::zeros(gray.size(), CV_8UC3);
        for (size_t i = 0; i < contours.size(); i++)
        {
            circle(im_c, center[i], (int)(radius[i] * 1.5), Scalar(255, 255, 255), 2, 8, 0);
            cout << "center of the contour No." << i + 1 << "=" << center[i] << endl;
        }
    }


    void find_cx(const Mat &immc, Mat &im_x)
    {
        Mat imp0;
        resize(immc, imp0, Size(), 0.3, 0.3, INTER_NEAREST);

        Mat d2 = Mat::zeros(Size(imp0.cols, imp0.rows), CV_8UC1);
        int pic_width = imp0.cols; // width of the resized image for further calculation
        d2(Range(270, 270 + 200), Range(0, d2.cols)) = 255;

        im_x = imp0.clone();

        Mat imp1;
        bitwise_and(imp0, d2, imp1);
        Mat imp2;
        threshold(imp1, imp2, 100, 255, THRESH_BINARY);
        Mat imp3;
        morphologyEx(imp2, imp3, MORPH_CLOSE ...


"Because my expectations were that the process time will decrease to somewhat 9-12 ms while running the code in parallel. But this doesn't work that way."

OpenCV code is already highly parallelized internally, and creating threads is not free either.

berak (2019-02-25 01:25:36 -0500)

Posting a minimal code example would make it easier to read and to understand what you are trying to achieve...

kbarni (2019-02-25 04:05:17 -0500)

1 answer


answered 2019-02-25 04:01:51 -0500

kbarni

As defined by the C standard, the clock() function reports the processor time consumed by the program, in clock ticks; dividing by CLOCKS_PER_SEC converts it to seconds.

CPU time is accumulated over all threads, so the result is about the same for parallel and serial execution. It does not measure the real elapsed (wall-clock) time on multi-core systems.
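The difference between the two kinds of time can be made visible with a small sketch. This is not from the original post: `Timing`, `spin_for` and `measure_two_threads` are hypothetical names, and the busy-wait loop is only a stand-in for real image processing. It runs the same busy loop on two threads and reads both std::clock() and std::chrono::steady_clock around them.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

// Hypothetical container for the two measurements, both in seconds.
struct Timing {
    double cpu;   // processor time reported by std::clock()
    double wall;  // elapsed time reported by std::chrono::steady_clock
};

// Hypothetical stand-in for real work: spins until `ms` milliseconds have passed.
void spin_for(int ms) {
    const auto deadline =
        std::chrono::steady_clock::now() + std::chrono::milliseconds(ms);
    while (std::chrono::steady_clock::now() < deadline) {
        // busy-wait so the thread keeps consuming CPU time
    }
}

// Runs spin_for(ms) on two threads at once and measures with both clocks.
Timing measure_two_threads(int ms) {
    const std::clock_t c0 = std::clock();
    const auto w0 = std::chrono::steady_clock::now();

    std::thread a(spin_for, ms);
    std::thread b(spin_for, ms);
    a.join();
    b.join();

    const std::clock_t c1 = std::clock();
    const auto w1 = std::chrono::steady_clock::now();

    Timing t;
    t.cpu = static_cast<double>(c1 - c0) / CLOCKS_PER_SEC;
    t.wall = std::chrono::duration<double>(w1 - w0).count();
    return t;
}
```

Calling `measure_two_threads(30)` on a machine with at least two free cores would typically give a `cpu` value well above `wall` where clock() follows the C standard. Microsoft documents that its clock() implementation returns wall-clock time instead, so the gap may not show up with Visual Studio.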

Use the standard chrono library to measure the processing time precisely (this requires C++11 or later).

#include <chrono>
#include <iostream>

std::chrono::time_point<std::chrono::system_clock> start, end;
std::chrono::duration<double> elapsed;
start = std::chrono::system_clock::now();
// the processing here...
end = std::chrono::system_clock::now();
elapsed = end - start; // elapsed wall-clock time, in seconds
std::cout << "Processing time: " << elapsed.count() << " s" << std::endl;
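To compare the serial and the threaded variants on equal terms, the same measurement can be wrapped in a small helper. The following is only a sketch under assumptions: `elapsed_ms`, `serial_ms` and `parallel_ms` are hypothetical names, and it uses std::chrono::steady_clock rather than system_clock because steady_clock is not affected by system time adjustments. Thread creation and joining happen inside the timed region, so their overhead is part of the result.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper: runs `work` once and returns the wall-clock time in ms.
template <typename F>
double elapsed_ms(F work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Runs the two jobs one after the other on the calling thread.
template <typename A, typename B>
double serial_ms(A a, B b) {
    return elapsed_ms([&a, &b] {
        a();
        b();
    });
}

// Runs the two jobs on two new threads and waits for both to finish.
// Thread start-up and join cost are included in the measured time.
template <typename A, typename B>
double parallel_ms(A a, B b) {
    return elapsed_ms([&a, &b] {
        std::thread ta([&a] { a(); });
        std::thread tb([&b] { b(); });
        ta.join();
        tb.join();
    });
}
```

In the question's setting the two jobs would be lambdas that call find_c and find_cx on their own frames, for example `parallel_ms([&] { find_c(frame1, out1); }, [&] { find_cx(frame2, out2); })`, where the frame and output variable names are placeholders.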


Seen: 415 times

Last updated: Feb 25 '19