# Running 2 algorithms simultaneously

I want to capture a video and run 2 algorithms on it, where each algorithm takes approximately 2 seconds. How can I do that? Note: I've tried running the 2 algorithms in 2 different processes, but the second program reports that it can't capture from the camera because the camera is locked by the first process. Thanks


What do you mean by simultaneously? If you mean to process the same frame in two different and independent ways, then it is as easy as doing this:

VideoCapture cap(*whatever*);
Mat frame1, frame2;
while (true) {
    cap >> frame1;
    frame2 = frame1.clone();
    // run first algorithm over frame1
    // run second algorithm over frame2
}


You can also run the algorithms in parallel rather than sequentially. I think the key to your problem is avoiding reading from the camera twice.

( 2015-10-23 02:42:46 -0500 )

Thanks for your comment. So it is not possible to access the USB camera from 2 different processes at the same time? If not, any idea how to run the algorithms in a parallel way?

( 2015-10-23 05:20:13 -0500 )

I'm not sure you can't do it, but why would you want to? Accessing the camera from two different processes will probably lead to grabbing different frames, and I guess you want to compare the algorithms or something like that. Maybe if you describe your scenario in more detail we can suggest better solutions.

( 2015-10-23 05:40:49 -0500 )


Although pklab's answer is correct, I still want to add some comments (and because comment length is limited, I put my comments in the answer section). Firstly, parallel techniques require more work to reach the performance goal (high speed); hence they should be applied in situations where performance is a must and the computation is heavy. Secondly, your benchmark results are only for small images captured from a camera, and you did not mention the images' resolution, so it may not be a fair comparison. Here are my results (on a Core Duo machine 2x2.4 GHz, Windows 64-bit, OpenCV 3, VS 2013):

30% CPU used
Video width: 640
Video height: 480
Frame count: 200


With a big video file:

34% CPU used
Video width: 1280
Video height: 720
Frame count: 200


And the last thing: I rewrote @pklab's code to use a Mat reference (instead of a Mat pointer), and it can be run with both a file and a camera, as follows:

#include <thread>
#include <iostream>
#include <string>
#include <opencv2/opencv.hpp>
using namespace cv;
using namespace std;

// here we use Canny
void Algo1(const cv::Mat &src, cv::Mat &dst)
{
    cvtColor(src, dst, CV_BGR2GRAY);
    GaussianBlur(dst, dst, Size(7, 7), 1.5, 1.5);
    Canny(dst, dst, 0, 30, 3);
}

// here we use morphology gradient
void Algo2(const cv::Mat &src, cv::Mat &dst)
{
    int morph_size = 1;
    cv::Size sz(2 * morph_size + 1, 2 * morph_size + 1);
    cv::Point anchor(morph_size, morph_size);
    Mat element = getStructuringElement(MORPH_RECT, sz, anchor);
    morphologyEx(src, dst, MORPH_GRADIENT, element);
}

// empty function to measure the threading overhead
void Test()
{
    return;
}

int main(int argc, char *argv[])
{
    if (argc < 2)
        return -1;
    VideoCapture cap;
    if (string(argv[1]) == "0")
        cap.open(0);       // open the default camera
    else
        cap.open(argv[1]); // open a video file
    if (!cap.isOpened())   // check if we succeeded
        return -1;

    cout << "Video width: " << cap.get(CV_CAP_PROP_FRAME_WIDTH) << endl;
    cout << "Video height: " << cap.get(CV_CAP_PROP_FRAME_HEIGHT) << endl;

    clock_t parallel = 0, sequential = 0, testParallel = 0, testSequential = 0;
    clock_t start, stop;
    int cnt = 0;
    while (cnt < 200)
    {
        Mat src, dst1, dst2;
        cap >> src; // get a new frame
        if (src.empty())
            break;
        imshow("src", src);

        // try it the sequential way
        start = clock();
        Algo1(src, dst1);
        Algo2(src, dst2);
        stop = clock();
        sequential += (stop - start);

        imshow("Sequential Algo1", dst1);
        imshow("Sequential Algo2", dst2);

        // try the simple parallel processing way: one thread per algorithm
        start = clock();
        thread th1(Algo1, cref(src), ref(dst1));
        thread th2(Algo2, cref(src), ref(dst2));
        th1.join();
        th2.join();
        stop = clock();
        parallel += (stop - start);

        imshow("Parallel Algo1", dst1);
        imshow("Parallel Algo2", dst2);

        // repeat both patterns with empty functions to isolate the overhead
        start = clock();
        Test();
        Test();
        stop = clock();
        testSequential += (stop - start);

        start = clock();
        thread thTest1(Test);
        thread thTest2(Test);
        thTest1.join();
        thTest2.join();
        stop = clock();
        testParallel += (stop - start);

        cnt++;
        if (waitKey(30) >= 0)
            break;
    }

    double parTime = 1000.0 * parallel / cnt / (double)CLOCKS_PER_SEC;
    double seqTime = 1000.0 * sequential / cnt / (double)CLOCKS_PER_SEC;
    double overHead = 1000.0 * (testParallel - testSequential) / cnt / (double)CLOCKS_PER_SEC;

    std::cout << std::endl << "Average processing time (2 algorithms, " << cnt << " frames):"
              << std::endl << "Parallel: " << parTime << " ms  Sequential: " << seqTime
              << " ms  Thread overhead: " << overHead << " ms" << std::endl;
    return 0;
}

1- Amazing new performance! What is the reason for these new benchmarks showing that parallel is much faster? 2- Just a simple question: what do you mean by the overhead and the empty function?

( 2015-10-24 09:57:34 -0500 )

1- For showing that parallel is the better approach where it is applicable. 2- That is for showing the overhead (time) you have to pay when applying parallel techniques (in this case, it is just the time for initializing the threads).

( 2015-10-24 10:10:59 -0500 )

@tuannhtn, I agree with you; in fact my answer focuses on this: sequential is better than "simple threading", as stated in the 2nd point of my answer. I would suggest this answer on better thread design.

Regarding Mat as a reference: in my test, when I use dst as a reference I sometimes catch a memory exception.

@thexnightmare, threads can be faster if they are used well, but they will quickly become a nightmare if you don't have a strong threading background.

( 2015-10-24 11:15:27 -0500 )

I'm sorry to have lost my accepted answer... Anyway, just to say that threading isn't a silver bullet... on my i3/Win64/OCV 2.4.10/VS2013:

Video width: 640
Video height: 480
Frame count: 200


On my PC the results show there is no real difference in performance between the two. A performance improvement isn't guaranteed; it depends on many factors, from the algorithms to the hardware architecture. In addition, if in a real case the 2 algorithms have to share some variables, the required mutexes will introduce more delay.

( 2015-10-24 12:31:01 -0500 )

@pklab: with OCV 2.4 you may not get good results, since it was not built with Intel's IPP package. Hence there is no difference between the two approaches in your test.

( 2015-10-24 12:36:23 -0500 )

@tuannhtn Why should IPP improve the parallel version and not the sequential one? For info... on the same machine using OCV 3.0.0:

Video width: 640
Video height: 480
Frame count: 200

( 2015-10-24 13:23:33 -0500 )

I am sorry @pklab, but I did not mean "IPP should improve the parallel version and not the sequential one". What I meant was: "I think that with IPP support, the difference between the two approaches is more evident". Thanks for your new results.

( 2015-10-25 02:00:38 -0500 )

Thank you guys, I really appreciate your efforts, which helped me a lot. Just a quick question: if I want to go deeper into threading, which library do you prefer: IPP, OpenMP, or Boost?

( 2015-10-26 15:04:45 -0500 )

From my experience, I suggest Threading Building Blocks (TBB) from Intel.

( 2015-10-26 22:35:26 -0500 )

Everything worked perfectly. Just one thing: I wanted to watch the output of the 2 algorithms, so I added imshow to both parallel functions, but then a compilation error appears. Any idea?

( 2015-12-06 11:29:07 -0500 )

You could just use threads to run your algorithms, but don't expect faster performance, because:

1. OpenCV has a lot of internal parallelization;
2. To go at full speed using threads you need a well-designed threading architecture (like producer/consumers), and maybe that is out of your scope.

Below is a simple example in which I compare a sequential and a parallel implementation, using a stream from a webcam as input.

I'm showing how to apply 2 different algorithms to the same frame, using 2 sequential calls and simple threading. The example below suffers from a poor threading implementation, because thread construction introduces a big overhead.

On my computer the results show that the sequential way is faster than simple threading; depending on the background computer load, sequential can be up to 2 times faster.

• webcam @320x240, OCV 2.4.10:
• Debug ver within MS VisualStudio 2013: Parallel:16.3ms Sequential:12.8ms Overhead:3.5ms
• Release ver within MS VisualStudio 2013: Parallel:8.1ms Sequential:4.3ms Overhead:4.9ms
• Release ver from command line: Parallel:3.6ms Sequential:2.7ms Overhead:0.6ms
• webcam @640x480, OCV 2.4.10:Parallel:11.65ms Sequential:11.48ms Overhead:0.67ms
• webcam @640x480, OCV 3.0.0:Parallel:8.67ms Sequential:8.37ms Overhead:0.69ms

EDIT2: Considering tuannhtn's answer, it looks interesting to investigate the different results a bit.

For sure, advanced parallel programming in IPP improves overall performance, but on an Intel i3 I really can't see any improvement of the parallel over the sequential approach. I suppose the difference is due to different processor architectures.

The Core Duo 2x2.4 and the Intel i3 2x2.53 both have 2 cores, but the Core Duo doesn't have Hyper-Threading and Smart Cache.

When Hyper-Threading is available, some operations automatically share the execution resources (I/O, cache, bus interface...) across more logical processors. Hyper-Threading and Smart Cache make more efficient use of the available execution resources, boosting the sequential approach.

On the Core Duo, load balancing is left to the developer, so the parallel approach gets better results.

This can explain why the parallel approach is better on the Core Duo but close to the sequential approach on the Intel i3. Looking at the performance with 640x480 video:

• CoreDuo/Ocv3.0.0/Win7/64: Parallel:8.66ms Sequential:13.47ms Overhead:0.6ms
• i3/Ocv3.0.0/Win7/64: Parallel:8.67ms Sequential:8.37ms Overhead:0.69ms

the code:

#include <thread>
#include <opencv2/opencv.hpp>
using namespace cv;

// here we use Canny
void Algo1(const cv::Mat &src, cv::Mat *dst)
{
    cvtColor(src, *dst, CV_BGR2GRAY);
    GaussianBlur(*dst, *dst, Size(7, 7), 1.5, 1.5);
    Canny(*dst, *dst, 0, 30, 3);
}

// here we use morphology gradient
void Algo2(const cv::Mat &src, cv::Mat *dst)
{
    int morph_size = 1;
    cv::Size sz(2 * morph_size + 1, 2 * morph_size + 1);
    cv::Point anchor(morph_size, morph_size);
    Mat element = getStructuringElement(MORPH_RECT, sz, anchor);
    morphologyEx(src, *dst, MORPH_GRADIENT, element);
}

// empty function to measure the threading overhead
void Test()
{
    return;
}

int main()
{
    VideoCapture cap ...

Just a simple question: what do you mean by the overhead and the empty function?

( 2015-10-24 07:00:10 -0500 )

The overhead is due to thread creation and join. I use an empty function to measure just this time.

( 2015-10-24 11:17:58 -0500 )

Since @pklab updated his results and answer, I ran a test on my PC, which has a Core i3; below are my results:

First: Webcam input

Video width: 640
Video height: 480
Frame count: 200


Second: Video file input

Video width: 1280
Video height: 720
Frame count: 200

( 2015-10-25 08:59:19 -0500 )

@tuannhtn Thank you for the nice discussion. Just for info: avoid passing non-const parameters by reference to a std::thread: see here

( 2015-10-26 06:54:58 -0500 )

Thanks @pklab, I am indeed aware of that. The code is OK because the two threads are independent (only src is shared, and it is accessed read-only), so there is no race condition. As you emphasized, this situation was very simple; more complicated contexts will need other tools to control access to shared data.

( 2015-10-26 09:06:27 -0500 )
