OpenCV 3.0 parallel_for_

Hello,

I've implemented a parallel image descriptor extractor using OpenCV 3.0's parallel_for_ and the BRISK descriptor extractor.

I split the image into horizontal stripes and compute the descriptors for each stripe in parallel. This is the outer call:

_extractor.init(img, featureCount);
cv::parallel_for_(cv::Range(0, _processorCount), _extractor); // _processorCount == 4
_extractor.buildFinal(keyPoints, features);
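
For context, _extractor is an instance of a cv::ParallelLoopBody subclass structured roughly like this (a simplified sketch; the class and member names are illustrative, not my exact code):

#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <vector>

class StripeExtractor : public cv::ParallelLoopBody
{
public:
    void init(const cv::Mat& img, int featureCount)
    {
        _image = img;
        _featureCount = featureCount;
        _featureExtractor = cv::BRISK::create();
        _keyPointsPerStripe.resize(_processorCount);
        _featuresPerStripe.resize(_processorCount);
    }

    // parallel_for_ calls this with sub-ranges of [0, _processorCount);
    // each index corresponds to one horizontal stripe of the image.
    virtual void operator()(const cv::Range& range) const
    {
        for (int stripe = range.start; stripe < range.end; ++stripe)
        {
            // Build the mask for this stripe and run detectAndCompute
            // (see the mask sketch below), writing only into
            // _keyPointsPerStripe[stripe] / _featuresPerStripe[stripe].
        }
    }

    // Merges the per-stripe keypoints/descriptors into the final output.
    void buildFinal(std::vector<cv::KeyPoint>& keyPoints, cv::Mat& features);

private:
    static const int _processorCount = 4;
    int _featureCount;
    cv::Mat _image;
    cv::Ptr<cv::Feature2D> _featureExtractor;
    mutable std::vector<std::vector<cv::KeyPoint> > _keyPointsPerStripe;
    mutable std::vector<cv::Mat> _featuresPerStripe;
};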

Then, for each thread I call _featureExtractor->detectAndCompute(_image, mask, keyPoints, features); with a properly initialized horizontal mask stripe.
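
In essence, each stripe does something along these lines inside operator() (continuing the sketch above; stripe is the loop index, and the even division of rows is illustrative):

// Enable one horizontal band of the image and mask out everything else.
int stripeHeight = _image.rows / _processorCount;
int y0 = stripe * stripeHeight;
int y1 = (stripe == _processorCount - 1) ? _image.rows : y0 + stripeHeight;

cv::Mat mask = cv::Mat::zeros(_image.size(), CV_8UC1);
mask(cv::Range(y0, y1), cv::Range::all()).setTo(255);

std::vector<cv::KeyPoint> keyPoints;
cv::Mat features;
_featureExtractor->detectAndCompute(_image, mask, keyPoints, features);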

However, this implementation doesn't run faster than the serial one; it actually runs about 2x slower (23 ms for serial and 40+ ms for parallel). I did some debugging and saw that OpenCV uses the Microsoft concurrency framework. Moreover, when printing the called range and cv::getThreadNum() I get this:

range[0,1] thread [2]
range[3,4] thread [2]
range[2,3] thread [2]
range[1,2] thread [2]
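
That output comes from a debug print at the top of operator(), along these lines (simplified, assuming <iostream> is included):

std::cout << "range[" << range.start << "," << range.end
          << "] thread [" << cv::getThreadNum() << "]" << std::endl;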

That means my inner code runs on a single thread, four times in a row, just like a serial implementation. Do you know what's wrong with my approach?

Thank you, Alin