# Parallel implementation of per-pixel calculations

Hi, I am using OpenCV 3.0.

In my serial implementation I often use Mat_::iterator to access and edit each pixel of an image (as in the example below).

What is the best practice for accelerating these calculations? Unfortunately, I cannot find a way to access individual pixels in a UMat matrix.

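    Mat createFGMask(Mat &depthMap){

        // obtain iterator at initial position (assumes an 8-bit, single-channel image)
        Mat_<uchar>::iterator it = depthMap.begin<uchar>();
        // obtain end position
        Mat_<uchar>::iterator itend = depthMap.end<uchar>();

        // loop over all pixels
        for ( ; it != itend; ++it) {
            // if the value is greater than 120, set it to foreground (255), else to background (0)
            if( (*it) > 120 ) (*it) = 255;
            else (*it) = 0;
        }

        // the mask is written in place, so the modified input is returned
        return depthMap;
    }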

A more complex scenario, where the value of each pixel is required:

    Mat pixelShifting(Mat &refImage, Mat &depthValue){

        Mat warpedImage = Mat::zeros( 768, 1024, CV_8UC3 );

        int height = refImage.rows;
        int width = refImage.cols;

        double newX, newY;
        Mat newCoord;

        for( int col = 0; col < refImage.cols; ++col ){
            for( int row = 0; row < refImage.rows; ++row ){

                // first shift background pixels (depth is read as 8-bit throughout)
                if( depthValue.at<uchar>(row,col) <= 120 ){

                    // algorithm that computes the new pixel coordinates
                    newCoord = calcNewCoord(col, row, depthValue.at<uchar>(row,col));

                    newX = newCoord.at<double>(0, 0);
                    newY = newCoord.at<double>(1, 0);

                    // test each bound separately; C++ does not chain comparisons
                    if( newY >= 0 && newY < height && newX >= 0 && newX < width ) {
                        warpedImage.at<Vec3b>((int)newY, (int)newX) = refImage.at<Vec3b>(row, col);
                    }
                }
            }
        }
        return warpedImage;
    }

This code basically takes 2 images, and populates a 3rd image with pixels of image1 by calculating new coordinates using values from image 2.
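A bounds check on the computed coordinates needs some care in C++: a chained comparison such as `0 <= newY < height` is parsed as `(0 <= newY) < height`, so the upper bound is never actually tested. The sketch below is library-independent and only illustrates the difference; `chainedCheck` and `rangeCheck` are hypothetical helper names, not OpenCV functions.

```cpp
// What the compiler effectively evaluates for "0 <= v < limit":
// the first comparison yields a bool, which is promoted to 0 or 1
// and only then compared with limit.
bool chainedCheck(double v, int limit) {
    bool first = (0 <= v);
    return static_cast<int>(first) < limit;
}

// The intended test: v lies in the half-open range [0, limit).
bool rangeCheck(double v, int limit) {
    return v >= 0.0 && v < limit;
}
```

With `v = 5000` and `limit = 768`, `chainedCheck` accepts the coordinate while `rangeCheck` rejects it, which is why the two bounds should be tested separately before writing into the destination image.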

My favorite method for parallelized pixel operations is the TBB library. I think it's also the simplest way (as GPGPU code is hard to implement and to debug).

First write your code in the classical way, using row pointer access:

    for(int y=0;y<src.rows;y++){
        const uchar* ps=src.ptr(y);   // pointer to row y of the source
        uchar* pd=dest.ptr(y);        // pointer to row y of the destination
        for(int x=0;x<src.cols;x++)
        {
            pd[x]=doSomethingWith(ps[x]);
        }
    }


Then, change the outer loop to tbb::parallel_for:

    #include <tbb/tbb.h>
    //...
    // capture by reference so that dest stays writable inside the lambda
    tbb::parallel_for(0, src.rows, 1, [&](int y) { //changed line
        const uchar* ps=src.ptr(y);
        uchar* pd=dest.ptr(y);
        for(int x=0;x<src.cols;x++)
        {
            pd[x]=doSomethingWith(ps[x]);
        }
    }); //added ); at the end


Now the rows are processed in parallel. Each iteration can still read pixels from other rows of src; it should only write to its own row of dest, so that no two threads write to the same pixel.

Don't forget to compile it using C++11 (-std=c++11) and link it to libtbb2 (-ltbb).

It won't give you the same boost as pure GPGPU code (CUDA or OpenCL), but it will use the CPUs to the max and it's much easier to implement and debug. It will also work when there is no GPGPU support.
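For readers without TBB, the same row-wise split can be sketched with the standard library alone. This is a minimal illustration, assuming an 8-bit single-channel image stored row by row in a `std::vector`; `invertPixel` and `processRowsInParallel` are hypothetical names, and `invertPixel` merely stands in for `doSomethingWith`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical per-pixel operation, standing in for doSomethingWith().
inline std::uint8_t invertPixel(std::uint8_t v) {
    return static_cast<std::uint8_t>(255 - v);
}

// Applies invertPixel to a rows x cols image stored row by row.
// Each thread gets a contiguous band of rows and writes only to that band.
void processRowsInParallel(const std::vector<std::uint8_t>& src,
                           std::vector<std::uint8_t>& dst,
                           int rows, int cols, int numThreads) {
    dst.resize(src.size());
    numThreads = std::max(1, std::min(numThreads, rows));
    const int rowsPerThread = (rows + numThreads - 1) / numThreads;

    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        const int begin = t * rowsPerThread;
        const int end = std::min(rows, begin + rowsPerThread);
        if (begin >= end) break;  // no rows left for this thread
        workers.emplace_back([&src, &dst, cols, begin, end] {
            for (int y = begin; y < end; ++y) {
                const std::uint8_t* ps = src.data() + static_cast<std::size_t>(y) * cols;
                std::uint8_t* pd = dst.data() + static_cast<std::size_t>(y) * cols;
                for (int x = 0; x < cols; ++x) {
                    pd[x] = invertPixel(ps[x]);
                }
            }
        });
    }
    for (std::thread& w : workers) w.join();  // wait for every band to finish
}
```

A scheduler such as TBB usually balances the load better than fixed bands, so this is meant to show the idea rather than to replace `tbb::parallel_for`.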

This is a plain threshold operation.

Stop worrying, and throw away your loop in favor of either:

    Mat binary = fgmask > 120;

or:

    Mat binary;

In other words, with OpenCV, the last thing on earth you should do is write your own per-pixel loops and then worry about optimizing them. Your current approach actively defeats OpenCV's built-in parallelization.

Thanks for the very beneficial tip.

But not all my operations are this simple.

I have cases where I need a pixel (x, y) from Mat A and the pixel (x, y) from Mat B as inputs to a specific algorithm. I then store the result in pixel (x, y) of Mat C.

I don't know how to proceed with such complex situations.

Would it be more helpful if I add some example code?

Thanks
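The two-input case described here (A and B in, C out, all at the same (x, y)) has the same structure as a single-input row loop, because every output pixel depends only on the input pixels at its own position. A minimal library-independent sketch, assuming 8-bit single-channel images stored row by row; `combinePixels` and `combineRow` are hypothetical names, and the absolute difference is only a placeholder for the real algorithm:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-pixel algorithm: absolute difference of the two inputs.
inline std::uint8_t combinePixels(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a > b ? a - b : b - a);
}

// Fills row y of C from row y of A and row y of B.
// C must already have the same size as A and B.
void combineRow(const std::vector<std::uint8_t>& A,
                const std::vector<std::uint8_t>& B,
                std::vector<std::uint8_t>& C,
                int cols, int y) {
    const std::size_t offset = static_cast<std::size_t>(y) * cols;
    for (int x = 0; x < cols; ++x) {
        C[offset + x] = combinePixels(A[offset + x], B[offset + x]);
    }
}
```

Since a call for row y touches only row y of C, the calls for different rows are independent of each other and could serve as the body of a parallel loop over rows.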

"Would it be more helpful if I add some example code?" - yes, definitely.

I have edited my question to include a more complex example.
