OpenCV: two algorithms running in parallel

asked 2019-02-24 09:18:24 -0600

Thanks for reading this post. My program is written in C++ and built with Visual Studio 2017, using OpenCV 4.0.1 built with TBB and MKL. I am trying to run two instances of a similar chain of OpenCV functions: resize, threshold, morphological open and close, and finally findContours. In my application I capture two frames from two cameras and try to process them in parallel.

Processing one frame on its own takes about 9 ms, and processing the two frames sequentially takes 17 ms. When I run the same code in parallel using std::thread, the processing time does not improve; it actually gains about 1 ms of thread-creation overhead and takes 18 ms. I have tried the Boost library, and the results were similar to std::thread. With TBB task groups there is no task-creation overhead, but the processing time still stays at 17 ms.

I have provided the code below. Am I doing something wrong, or is this behavior normal for this kind of processing? I expected the processing time to drop to around 9-12 ms when running the code in parallel, but that is not what happens.

Using std::thread

    void find_c(const Mat &im, Mat &im_c)
    {
        Mat imm0;
        resize(im, imm0, Size(), 0.3, 0.3, INTER_NEAREST); // downscale to 30% on both x and y
        Mat d1 = Mat::zeros(Size(imm0.cols, imm0.rows), CV_8UC1);
        int pic_width = imm0.cols; // width of the resized image for further calculation
        // mask that keeps only rows 470..569; the resized image must have at least 570 rows
        d1(Range(470, 470 + 100), Range(0, d1.cols)) = 255;
        im_c = imm0.clone();
        Mat imm1;
        bitwise_and(imm0, d1, imm1);
        Mat imm2;
        threshold(imm1, imm2, 100, 255, THRESH_BINARY);
        Mat imm3;
        morphologyEx(imm2, imm3, MORPH_CLOSE, Mat(), Point(-1, -1), 2);
        Mat imm4;
        morphologyEx(imm3, imm4, MORPH_OPEN, Mat(), Point(-1, -1), 2);
        vector<vector<Point> > contours;
        vector<Vec4i> hierarchy;
        findContours(imm4, contours, hierarchy, RETR_LIST, CHAIN_APPROX_SIMPLE, Point(0, 0));
        // for the circle, rectangle and other info
        vector<vector<Point> > contours_poly(contours.size());
        vector<Rect> boundRect(contours.size());
        vector<Point2f> center(contours.size());
        vector<float> radius(contours.size());

        for (size_t i = 0; i < contours.size(); i++)
        {
            approxPolyDP(Mat(contours[i]), contours_poly[i], 3, true);
            //boundRect[i] = boundingRect(Mat(contours_poly[i]));
            minEnclosingCircle((Mat)contours_poly[i], center[i], radius[i]);
        }

        /// Draw contours
        //Mat drawing = Mat::zeros(gray.size(), CV_8UC3);
        for (size_t i = 0; i < contours.size(); i++)
        {
            // scale the radius first, then convert to int (the original cast truncated before scaling)
            circle(im_c, center[i], static_cast<int>(radius[i] * 1.5), Scalar(255, 255, 255), 2, 8, 0);
            cout << "center of the contour No." << i + 1 << "=" << center[i] << endl;
        }
    }

    void find_cx(const Mat &immc, Mat &im_x)
    {
        Mat imp0;
        resize(immc, imp0, Size(), 0.3, 0.3, INTER_NEAREST);

        Mat d2 = Mat::zeros(Size(imp0.cols, imp0.rows), CV_8UC1);
        int pic_width = imp0.cols; // width of the resized image for further calculation
        d2(Range(270, 270 + 200), Range(0, d2.cols)) = 255;

        im_x = imp0.clone();

        Mat imp1;
        bitwise_and(imp0, d2, imp1);
        Mat imp2;
        threshold(imp1, imp2, 100, 255, THRESH_BINARY);
        Mat imp3;
        morphologyEx(imp2, imp3, MORPH_CLOSE ...


Comments

"Because my expectations were that the processing time would decrease to about 9-12 ms while running the code in parallel, but it doesn't work that way."

OpenCV code is already highly parallelized internally, and creating threads is not free either.

berak (2019-02-25 01:25:36 -0600)

Posting minimal code would make it easier to read and understand what you are trying to achieve...

kbarni (2019-02-25 04:05:17 -0600)


answered 2019-02-25 04:01:51 -0600

kbarni


The clock() function measures the CPU time (processor ticks) spent on the processing; dividing the result by CLOCKS_PER_SEC converts it to seconds.

That value is typically about the same for parallel and serial execution, because the CPU time of all threads is added together, so it does not reflect the real elapsed time on multi-core systems.

Use the standard chrono library to measure the elapsed processing time precisely (this requires C++11 or later).

    #include <chrono>
    #include <iostream>

    //....
    std::chrono::time_point<std::chrono::system_clock> start, end;
    std::chrono::duration<double> elapsed; // duration<double> counts seconds
    start = std::chrono::system_clock::now();
    //...do the processing here...
    end = std::chrono::system_clock::now();
    elapsed = end - start;
    std::cout << "Processing time: " << elapsed.count() << " s" << std::endl;
