Parallel implementation of per pixel calculations

asked 2016-03-17 12:11:15 -0600

mark92m

updated 2016-03-17 13:12:36 -0600

Hi, I am using OpenCV 3.0.

In my serial implementation I often use Mat_::iterator to access and edit each pixel of an image (as in the example below).

What is the best practice for accelerating these calculations? Unfortunately, I cannot find a way to access individual pixels in a UMat matrix.

Mat createFGMask(Mat &depthMap){
    Mat fgMask = depthMap.clone();

    // obtain iterator at initial position
    Mat_<uchar>::iterator it = fgMask.begin<uchar>();
    // obtain end position
    Mat_<uchar>::iterator itend = fgMask.end<uchar>();

    // loop over all pixels
    for ( ; it != itend; ++it) {
        // if the value is greater than 120, set to foreground (i.e. 255), else to background (0)
        if( (*it) > 120 ) (*it) = 255;
        else (*it) = 0;
    }

    return fgMask;
}

A more complex scenario, where the value of each pixel is required:

Mat pixelShifting(Mat &refImage, Mat &depthValue){

    Mat warpedImage = Mat::zeros( 768, 1024, CV_8UC3 );

    int height = refImage.rows;
    int width = refImage.cols;

    double newX, newY;
    Mat newCoord;

    for( int col = 0; col < refImage.cols; ++col ){
        for( int row = 0; row < refImage.rows; ++row ){

            // first shift background pixels:
            if( depthValue.at<uchar>(row,col) <= 120 ){

                // algorithm that computes the new pixel coordinates
                // (the depth map is read as uchar, matching the check above)
                newCoord = calcNewCoord(col, row, depthValue.at<uchar>(row,col));

                newX = newCoord.at<double>(0, 0);
                newY = newCoord.at<double>(1, 0);

                // a chained comparison such as 0 <= newY < height does not
                // test a range in C++, so each bound is checked separately
                if( newY >= 0 && newY < height && newX >= 0 && newX < width ) {

                    warpedImage.at<Vec3b>(static_cast<int>(newY), static_cast<int>(newX)) = refImage.at<Vec3b>(row, col);

                }
            }
        }
    }
    return warpedImage;
}

This code takes two images and populates a third with the pixels of the first, placed at new coordinates computed from the values of the second.


2 answers


answered 2016-03-17 16:43:16 -0600

kbarni


updated 2016-03-17 16:47:08 -0600

My favorite method for parallelized pixel operations is the TBB library. I think it's also the simplest way (as GPGPU code is hard to implement and to debug).

First write your code in the classical way, using row pointer access:

for(int y=0;y<src.rows;y++){
    const uchar* ps=src.ptr(y);   // read-only pointer to source row y
    uchar* pd=dest.ptr(y);        // pointer to destination row y
    for(int x=0;x<src.cols;x++)
    {
        pd[x]=doSomethingWith(ps[x]);
    }
}

Then, change the outer loop to tbb::parallel_for:

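#include <tbb/tbb.h>
//...
// capture by reference: with [=] the captured Mat copies are const inside
// the lambda, so dest.ptr(y) could not be assigned to a non-const uchar*
tbb::parallel_for(0, src.rows, 1, [&](int y) { //changed line
    const uchar* ps=src.ptr(y);
    uchar* pd=dest.ptr(y);
    for(int x=0;x<src.cols;x++)
    {
        pd[x]=doSomethingWith(ps[x]);
    }
}); //added ); at the end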

Now the rows are processed in parallel. Each iteration can also read pixels from other rows of the source, as long as it writes only to its own row of the destination.
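To illustrate reading from other rows, here is a minimal sketch that does not depend on OpenCV or TBB. It is not the code from this answer: a plain std::vector stands in for cv::Mat, an OpenMP pragma stands in for tbb::parallel_for, and the function name verticalDiff is made up for the example. Each output row reads the rows above and below it in the source but writes only to itself, which is what makes the outer loop safe to parallelize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Row-major 8-bit single-channel image in a plain buffer (stand-in for cv::Mat).
// Output row y holds |src(y+1, x) - src(y-1, x)|; the first and last rows stay 0.
// verticalDiff is a hypothetical name used only for this illustration.
std::vector<std::uint8_t> verticalDiff(const std::vector<std::uint8_t>& src,
                                       int rows, int cols)
{
    assert(rows >= 0 && cols >= 0);
    assert(src.size() == static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols));
    std::vector<std::uint8_t> dst(src.size(), 0);

    // With GCC or Clang and -fopenmp, the rows are distributed across threads.
    // Without the flag the pragma is ignored and the loop runs serially,
    // producing the same result.
    #pragma omp parallel for
    for (int y = 1; y < rows - 1; ++y) {
        const std::uint8_t* above = src.data() + static_cast<std::size_t>(y - 1) * cols;
        const std::uint8_t* below = src.data() + static_cast<std::size_t>(y + 1) * cols;
        std::uint8_t* out = dst.data() + static_cast<std::size_t>(y) * cols;
        for (int x = 0; x < cols; ++x) {
            // Reads come from neighbouring rows; the write goes to row y only.
            out[x] = static_cast<std::uint8_t>(std::abs(int(below[x]) - int(above[x])));
        }
    }
    return dst;
}
```

If two iterations wrote to the same destination row, the loop would need synchronization or a different decomposition.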

Don't forget to compile with C++11 or later (-std=c++11), since the lambda requires it, and to link against TBB (-ltbb).

It won't give you the same boost as pure GPGPU code (CUDA or OpenCL), but it will use the CPU cores to the max, and it's much easier to implement and debug. It also works when there is no GPGPU support.
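The same row-parallel pattern covers the case where a pixel from one image and the pixel at the same position in a second image feed a computation whose result goes to that position in a third image. The sketch below is an illustration under assumptions, not part of the original answer: plain buffers replace cv::Mat, an OpenMP pragma replaces tbb::parallel_for, and combinePixels is a made-up name. It assumes the per-pixel operation has no side effects. It does not cover the pixelShifting example from the question, which writes to computed coordinates: there, two source pixels can land on the same destination pixel, so parallel iterations are no longer independent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Applies op(a, b) to each pair of corresponding pixels of two equally sized
// 8-bit single-channel images and stores the result at the same position.
// combinePixels is a hypothetical helper; op must be free of side effects.
template <typename Op>
std::vector<std::uint8_t> combinePixels(const std::vector<std::uint8_t>& a,
                                        const std::vector<std::uint8_t>& b,
                                        int rows, int cols, Op op)
{
    assert(rows >= 0 && cols >= 0);
    const std::size_t n = static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols);
    assert(a.size() == n && b.size() == n);
    std::vector<std::uint8_t> c(n);

    // Every iteration writes to a distinct row of c, so rows can run in parallel.
    // Without -fopenmp the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (int y = 0; y < rows; ++y) {
        const std::size_t offset = static_cast<std::size_t>(y) * static_cast<std::size_t>(cols);
        for (int x = 0; x < cols; ++x) {
            c[offset + x] = op(a[offset + x], b[offset + x]);
        }
    }
    return c;
}
```

For example, calling it with a lambda such as [](std::uint8_t pixel, std::uint8_t depth) { return static_cast<std::uint8_t>(depth <= 120 ? pixel : 0); } keeps only the pixels whose depth value marks them as background.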


answered 2016-03-17 12:22:12 -0600

berak

updated 2016-03-17 12:30:25 -0600

This is a plain threshold operation.

Stop worrying, and throw away your loop in favor of either:

Mat binary = depthMap > 120;

or:

Mat binary;
threshold(depthMap, binary, 120, 255, THRESH_BINARY);

In other words, with OpenCV, the last thing on earth you should do is write your own per-pixel loops and then worry about optimizing them. Your current way actively defeats OpenCV's built-in parallelization.
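For reference, both forms compute the same thing as the loop in the question: on 8-bit data, every value strictly greater than the threshold becomes 255 and everything else becomes 0. The sketch below spells out that rule on a plain buffer. It is not OpenCV's implementation, and binaryThreshold is a made-up name for the example.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Reference semantics of a binary threshold on 8-bit data:
// values strictly greater than thresh become maxval, all others become 0.
// binaryThreshold is a hypothetical name, not an OpenCV function.
std::vector<std::uint8_t> binaryThreshold(const std::vector<std::uint8_t>& src,
                                          std::uint8_t thresh,
                                          std::uint8_t maxval = 255)
{
    std::vector<std::uint8_t> dst(src.size());
    std::transform(src.begin(), src.end(), dst.begin(),
                   [thresh, maxval](std::uint8_t v) {
                       return static_cast<std::uint8_t>(v > thresh ? maxval : 0);
                   });
    return dst;
}
```

Note that a value equal to the threshold (120 here) maps to 0, just as in the original loop.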


Comments

Thanks for the very beneficial tip.

But not all my operations are this simple.

I have cases where I need pixel (x, y) from Mat A and pixel (x, y) from Mat B as inputs to a specific algorithm, and then store the result in pixel (x, y) of Mat C.

I don't know how to proceed in such complex situations.

Would it be more helpful if I added some example code?

Thanks

mark92m ( 2016-03-17 12:43:13 -0600 )edit

"Would it be more helpful if I add some example code ?" - yes, definitely.

berak ( 2016-03-17 12:53:22 -0600 )edit

I have edited my question to include a more complex example.

mark92m ( 2016-03-17 13:10:25 -0600 )edit
