OpenMP with sections directive

asked 2016-12-28 09:50:18 -0600

hyder

Hi,

My tracking code has two main parts: 1. the tracking algorithm and 2. the video overlay.

A lot of content needs to be overlaid, and that takes a lot of time. I was thinking of running the two parts in parallel using OpenMP with minimal effort, so I tried the sections directive available in OpenMP. The following code is just a crude form of what I am trying to achieve:

#include "opencv2/highgui/highgui.hpp"
#include "opencv2/core/core.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include <iostream>
#include <omp.h>
#include "Timer.h"

using namespace std;
using namespace cv;

int main()
{
    VideoCapture cap(0);        //start the webcam
    Mat frame, roi;
    Timer t;                    //timer class
    int frameNo = 0;
    double summ = 0;

    while (true)
    {
        cap >> frame;
        if (frame.empty())                              //stop if the camera returned no frame
            break;
        frameNo++;
        roi = frame(Rect(100, 100, 300, 300)).clone();  //extract a deep copy of region of interest; for tracking purposes
        t.start();                                      //start the timer
#pragma omp parallel sections
        {
#pragma omp section         //first section: tracking algorithm
            {
                //some tracking algorithm below which uses only "roi" variable
                GaussianBlur(roi, roi, Size(5, 5), 0, 0, BORDER_REPLICATE);
            }
#pragma omp section         //second section: overlay video
            {
                //a lot of overlay in different video parts which uses only "frame" variable
                putText(frame, "string 1", Point(10, 10), 1, 1, Scalar(1));
                putText(frame, "string 2", Point(20, 20), 1, 1, Scalar(1));
                putText(frame, "string 3", Point(30, 30), 1, 1, Scalar(1));
                putText(frame, "string 4", Point(40, 40), 1, 1, Scalar(1));
                putText(frame, "string 5", Point(50, 50), 1, 1, Scalar(1));
                putText(frame, "string 6", Point(60, 60), 1, 1, Scalar(1));
                putText(frame, "string 7", Point(70, 70), 1, 1, Scalar(1));
                putText(frame, "string 8", Point(80, 80), 1, 1, Scalar(1));
                putText(frame, "string 9", Point(90, 90), 1, 1, Scalar(1));
                putText(frame, "string 10", Point(100, 100), 1, 1, Scalar(1));
            }
        }
        t.stop();               //stop the timer

        summ += t.getElapsedTimeInMilliSec();
        if (frameNo % 10 == 0)      //average total time over 10 frames
        {
            cout << summ / 10 << endl;
            summ = 0;
        }
        imshow("frame", frame);
        if (waitKey(10) == 27)
            break;
    }
    return 0;
}

My timing analysis shows no performance boost, and in some cases the timing with OpenMP gets worse, even though the two sections use different variables.

My question is: is the sections directive the right approach for my case, or is there a better way to parallelize my existing code using OpenMP with minimal effort?

Thanks.
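
One way to check whether sections pay off at all is to time the same two pieces of work sequentially and inside sections, without OpenCV in the picture. The sketch below does that in plain C++. fakeTrack, fakeOverlay and runOnce are hypothetical stand-ins, not part of the code above, and the buffers are ordinary std::vector objects rather than cv::Mat. It assumes OpenMP is enabled in the compiler (/openmp in Visual Studio, -fopenmp in GCC/Clang); without that flag the pragmas are ignored and both paths run sequentially.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the tracking step: touches only `roi`.
void fakeTrack(std::vector<float>& roi) {
    for (std::size_t i = 1; i + 1 < roi.size(); ++i)
        roi[i] = (roi[i - 1] + roi[i] + roi[i + 1]) / 3.0f;
}

// Hypothetical stand-in for the overlay step: touches only `frame`.
void fakeOverlay(std::vector<unsigned char>& frame) {
    for (std::size_t i = 0; i < frame.size(); i += 7)
        frame[i] = 255;
}

// Runs both steps once, either back to back or as two OpenMP sections,
// and returns the elapsed wall-clock time in milliseconds.
double runOnce(std::vector<float>& roi, std::vector<unsigned char>& frame,
               bool useSections) {
    const auto t0 = std::chrono::steady_clock::now();
    if (useSections) {
        #pragma omp parallel sections
        {
            #pragma omp section
            { fakeTrack(roi); }      // first section: uses only roi
            #pragma omp section
            { fakeOverlay(frame); }  // second section: uses only frame
        }
    } else {
        fakeTrack(roi);
        fakeOverlay(frame);
    }
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Calling runOnce(roi, frame, false) and runOnce(roi, frame, true) on identical buffers should leave identical contents; only the elapsed time may differ. If each section takes well under a millisecond, the cost of starting and joining the thread team can easily outweigh the gain.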


Comments

What do you want to parallelize in your code?

LBerger (2016-12-28 10:53:13 -0600)

@LBerger I want the overlay section and the tracking section to run in parallel. Is this not a good approach to reduce the total time spent on a given frame? Would you rather suggest that I parallelize my tracking algorithm itself, e.g. the correlation part in my tracking part?

hyder (2016-12-28 10:57:33 -0600)

That's not in your code? Don't forget that OpenCV code is already parallelized. Parallelizing code that is already parallel is not a good thing.

LBerger (2016-12-28 10:59:07 -0600)

@LBerger, as I said this is just a crude form of my code. I am afraid I can't print the entire tracking algorithm code here. Therefore, I have replaced it with a simple instruction (to give an idea about the structure of my code) because I just want the tracking part and the overlay part to run in parallel on different cores rather than them running sequentially on a single core. I am not parallelizing anything within these sections though.

hyder (2016-12-28 11:06:56 -0600)

Before parallelizing, you have to check whether the code is already parallelized.

For example, I would never divide an image into four parts to parallelize GaussianBlur; that is already done.

LBerger (2016-12-28 11:12:50 -0600)
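
A rough illustration of the point about not parallelizing code that is already parallel: a routine with its own OpenMP loop can check whether it is being called from inside an active parallel region and stay serial in that case. sumOfSquares and insideParallelRegion are hypothetical names for this sketch; omp_in_parallel() and the if clause are standard OpenMP. This only covers OpenMP inside OpenMP and says nothing about which backend OpenCV itself uses.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// True when the caller is already inside an active OpenMP parallel region.
// Without OpenMP support compiled in, there is never such a region.
bool insideParallelRegion() {
#ifdef _OPENMP
    return omp_in_parallel() != 0;
#else
    return false;
#endif
}

// Hypothetical worker with its own parallel loop. The `if` clause keeps the
// loop serial when an outer parallel region is already active, so the two
// levels do not compete for the same cores.
long long sumOfSquares(const std::vector<int>& v) {
    long long total = 0;
    const bool nested = insideParallelRegion();
    const int n = static_cast<int>(v.size());
    #pragma omp parallel for reduction(+:total) if(!nested)
    for (int i = 0; i < n; ++i)
        total += static_cast<long long>(v[i]) * v[i];
    return total;
}
```

Called from ordinary serial code, sumOfSquares may use the whole thread team; called from inside an omp section, it runs on the calling thread only.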

@LBerger, okay. Is there a list of the OpenCV functions that are already parallelized, so that I can check whether I am using any of them in my code?

hyder (2016-12-28 11:16:29 -0600)

No, but I think all functions in OpenCV are parallelized. For opencv_contrib you have to check.

To parallelize code, OpenCV uses the ParallelLoopBody class (with cv::parallel_for_). Don't forget that OpenCV can also use OpenCL and CUDA if you built it from the GitHub sources.

LBerger (2016-12-28 11:52:37 -0600)

@LBerger, following your suggestions I tried to measure the performance difference with and without OpenMP. According to cv::getBuildInformation(), the Use OpenMP, Use Concurrency and Use OpenCL options are all YES, and Use IPP has a path listed against it. All other options under third-party libraries are NO. I have also enabled OpenMP support in VS2013. The problem is that when I wrap OpenCV functions (e.g. cv::matchTemplate(), cv::GaussianBlur()) in #pragma omp parallel, performance drops instead of improving: these functions take approximately twice as long with the omp pragma. This is on a 64-bit Windows platform with VS2013 Ultimate. Any idea what I am doing wrong? Thanks.

hyder (2017-01-04 22:03:17 -0600)

I don't understand "Use OpenMP, Use Concurrency and Use OpenCL options are YES". All of these are parallel libraries, and you can use only one of them.

Try setting the number of threads to 0 (only one thread) with cv::setNumThreads(0) and disabling OpenCL with cv::ocl::setUseOpenCL(false) to test OpenCV without optimization.

LBerger (2017-01-05 12:06:19 -0600)