OpenMP with sections directive

asked 2016-12-28 09:50:18 -0600

I have a tracking program with two main parts: 1. the tracking algorithm, and 2. the video overlay.

A lot of content needs to be overlaid and it takes a lot of time. I was thinking of parallelizing the two parts using OpenMP with minimal effort, so I thought of using the sections directive available in OpenMP. The following code is just a crude form of what I am trying to achieve:

#include "opencv2/highgui/highgui.hpp"
#include "opencv2/core/core.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include <iostream>
#include <omp.h>
#include "Timer.h"

using namespace std;
using namespace cv;

int main()
{
    VideoCapture cap(0);        //start the webcam
    Mat frame, roi;
    Timer t;                    //timer class
    int frameNo = 0;
    double summ = 0;

    while (true)
    {
        cap >> frame;
        roi = frame(Rect(100, 100, 300, 300)).clone();  //extract a deep copy of region of interest; for tracking purposes
        t.start();                                      //start the timer
#pragma omp parallel sections
        {
#pragma omp section         //first section: tracking algorithm
            {
                //some tracking algorithm below which uses only "roi" variable
                GaussianBlur(roi, roi, Size(5, 5), 0, 0, BORDER_REPLICATE);
            }
#pragma omp section         //second section: overlay video
            {
                //a lot of overlay in different video parts which uses only "frame" variable
                putText(frame, "string 1", Point(10, 10), 1, 1, Scalar(1));
                putText(frame, "string 2", Point(20, 20), 1, 1, Scalar(1));
                putText(frame, "string 3", Point(30, 30), 1, 1, Scalar(1));
                putText(frame, "string 4", Point(40, 40), 1, 1, Scalar(1));
                putText(frame, "string 5", Point(50, 50), 1, 1, Scalar(1));
                putText(frame, "string 6", Point(60, 60), 1, 1, Scalar(1));
                putText(frame, "string 7", Point(70, 70), 1, 1, Scalar(1));
                putText(frame, "string 8", Point(80, 80), 1, 1, Scalar(1));
                putText(frame, "string 9", Point(90, 90), 1, 1, Scalar(1));
                putText(frame, "string 10", Point(100, 100), 1, 1, Scalar(1));
            }
        }
        t.stop();               //stop the timer

        summ += t.getElapsedTimeInMilliSec();
        frameNo++;                  //count the frame so the average is printed once every 10 frames
        if (frameNo % 10 == 0)      //average total time over 10 frames
        {
            cout << summ / 10 << endl;
            summ = 0;
        }
        imshow("frame", frame);
        if (waitKey(10) == 27)      //exit on Esc
            break;
    }
    return 0;
}

Timing analysis shows no performance boost, and in some cases the timing with OpenMP gets worse, even though I am using different variables in my sections.

My question is whether I am using the right approach (the sections directive) for my case, or is there a better way to parallelize my existing code using OpenMP with minimal effort?
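For readers who do not have the Timer.h class used above, the per-frame averaging can be approximated with std::chrono. This is only a minimal sketch: the helper name averageMilliseconds is hypothetical and does not come from the original post.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not from the original post): calls `work` `runs`
// times and returns the mean wall-clock time per call in milliseconds.
double averageMilliseconds(const std::function<void()>& work, int runs)
{
    using Clock = std::chrono::steady_clock;
    if (runs <= 0)
        return 0.0;  // nothing to measure

    const auto begin = Clock::now();
    for (int i = 0; i < runs; ++i)
        work();
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - begin;

    return elapsed.count() / runs;
}
```

One way to use it is to time the tracking part alone, the overlay part alone, and then both inside the parallel region. If the combined figure is not clearly below the sum of the two separate figures, the cost of starting and synchronizing threads is probably comparable to the work itself.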


What do you want to parallelize in your code?

LBerger ( 2016-12-28 10:53:13 -0600 )

@LBerger I want the overlay section and the tracking section to run in parallel. Is this not a good approach to reduce the total time spent on a given frame? Would you rather suggest that I parallelize my tracking algorithm itself, e.g. the correlation part in my tracking part?

hyder ( 2016-12-28 10:57:33 -0600 )

That's not in your code? Don't forget that OpenCV code is already parallelized. Parallelizing code that is already parallel is not a good idea.

LBerger ( 2016-12-28 10:59:07 -0600 )

@LBerger, as I said this is just a crude form of my code. I am afraid I can't print the entire tracking algorithm code here. Therefore, I have replaced it with a simple instruction (to give an idea about the structure of my code) because I just want the tracking part and the overlay part to run in parallel on different cores rather than them running sequentially on a single core. I am not parallelizing anything within these sections though.

hyder ( 2016-12-28 11:06:56 -0600 )
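The structure described in this comment, two independent parts that each touch only their own data, can be sketched without OpenCV as below. The functions smooth, overlay and processFrame are hypothetical stand-ins for the tracking and overlay code, which the thread does not show. Note the braces: the sections construct needs one enclosing block, and each section needs its own block if it holds more than one statement.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the tracking part: reads and writes only `roi`.
void smooth(std::vector<double>& roi)
{
    const std::vector<double> in = roi;  // copy so neighbours are read unmodified
    for (std::size_t i = 1; i + 1 < in.size(); ++i)
        roi[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;  // 3-tap average
}

// Hypothetical stand-in for the overlay part: writes only `frame`.
void overlay(std::vector<int>& frame)
{
    for (std::size_t i = 0; i < frame.size(); i += 10)
        frame[i] = 255;  // mark every tenth element
}

void processFrame(std::vector<double>& roi, std::vector<int>& frame)
{
    // If the compiler is not run with OpenMP enabled, the pragmas are ignored
    // and the two blocks run one after the other with the same result.
#pragma omp parallel sections
    {
#pragma omp section
        {
            smooth(roi);
        }
#pragma omp section
        {
            overlay(frame);
        }
    }
}
```

Because the two sections share no data, no locking is needed. Whether this is faster than running them sequentially depends on how much work each section does relative to the thread start-up cost, and on whether the functions called inside already use all cores.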

Before parallelizing, you have to check whether the code is already parallelized.

For example, I would never divide an image into four parts to parallelize GaussianBlur; that is already done.

LBerger ( 2016-12-28 11:12:50 -0600 )

@LBerger, okay. Is there a list of the OpenCV functions that are already parallelized, so I can check whether I am using any of them in my code?

hyder ( 2016-12-28 11:16:29 -0600 )

No, but I think all functions in OpenCV are parallelized. In opencv_contrib you have to check.

In OpenCV, the class ParallelLoopBody (together with parallel_for_) is used to parallelize code. Don't forget that OpenCV also uses OpenCL and CUDA if you have built OpenCV from GitHub.

LBerger ( 2016-12-28 11:52:37 -0600 )
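To illustrate the general idea behind a loop-body class, here is a plain C++ analogy in which a range is split into stripes and each stripe is handed to a body object. This is an assumption-laden sketch, not OpenCV's implementation or API: RangeBody, runInStripes and DoubleValues are hypothetical names invented for this example.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical interface: a body that processes the half-open range [begin, end).
struct RangeBody
{
    virtual ~RangeBody() = default;
    virtual void operator()(int begin, int end) const = 0;
};

// Splits [0, total) into at most `stripes` contiguous chunks and runs the
// body on each chunk. With OpenMP enabled the chunks may run concurrently;
// otherwise the pragma is ignored and they run in order.
void runInStripes(int total, int stripes, const RangeBody& body)
{
    if (total <= 0 || stripes <= 0)
        return;
    stripes = std::min(stripes, total);
    const int chunk = (total + stripes - 1) / stripes;  // ceiling division
#pragma omp parallel for
    for (int s = 0; s < stripes; ++s)
    {
        const int begin = s * chunk;
        const int end = std::min(total, begin + chunk);
        if (begin < end)
            body(begin, end);
    }
}

// Example body: doubles every element in its stripe. Stripes never overlap,
// so concurrent calls write to disjoint elements.
struct DoubleValues : RangeBody
{
    explicit DoubleValues(std::vector<int>& data) : data_(data) {}
    void operator()(int begin, int end) const override
    {
        for (int i = begin; i < end; ++i)
            data_[i] *= 2;
    }
    std::vector<int>& data_;
};
```

A call such as runInStripes(10, 4, DoubleValues(data)) processes the ranges [0,3), [3,6), [6,9) and [9,10). If a library function already splits its work like this internally, wrapping the call in another parallel region adds threads without adding useful work.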

@LBerger, after your suggestions I tried to measure the performance difference with and without OpenMP. Using cv::getBuildInformation(), I can see that the Use OpenMP, Use Concurrency and Use OpenCL options are YES, and Use IPP has a path listed against it. All other options under third-party libraries are NO. I have also enabled the OpenMP option in VS2013. The problem is that when I call OpenCV functions (e.g. cv::matchTemplate(), cv::GaussianBlur()) with #pragma omp parallel, performance drops instead of improving: these functions take approximately double the time with the omp pragma. Platform: 64-bit Windows with VS2013 Ultimate. Any idea what I am doing wrong? Thanks.

hyder ( 2017-01-04 22:03:17 -0600 )

I don't understand "Use openMP, Use concurrency and Use openCL options are YES". These are all parallel libraries, and you can use only one of them.

Try setting the thread number to 0 with cv::setNumThreads(0) (only one thread) and disabling OpenCL with cv::ocl::setUseOpenCL(false) to test OpenCV without optimization.

LBerger ( 2017-01-05 12:06:19 -0600 )