Revision history - OpenCV Q&A Forum

My favorite method for parallelized pixel operations is the TBB library. I think it's also the simplest (as GPGPU code is very hard to implement and to debug).

First write your code in the classical way, using row pointer access:

for(int y=0;y<image.rows;y++){
    uchar* p=image.ptr(y);
    for(int x=0;x<image.cols;x++)
    {
        doSomethingWith(p[x]);
    }
}

Then, change the outer loop to tbb::parallel_for:

#include <tbb/tbb.h>
//...
tbb::parallel_for(0, image.rows, 1, [&](int y) { //changed line ([&]: a by-copy capture would make image const inside the lambda)
    uchar* p=image.ptr(y);
    for(int x=0;x<image.cols;x++)
    {
        doSomethingWith(p[x]);
    }
}); //added ); at the end

Now the rows are processed in parallel: TBB distributes the row indices among its worker threads.

Don't forget to compile with C++11 or later (-std=c++11, needed for the lambda) and to link against libtbb2 (-ltbb).

It won't give you the same speedup as pure GPGPU code (CUDA or OpenCL), but it will use all the CPU cores and it is much easier to implement and to debug. It also works when there is no GPGPU support.

My favorite method for parallelized pixel operations is the TBB library. I think it's also the simplest way (as GPGPU code is hard to implement and to debug).

First write your code in the classical way, using row pointer access:

for(int y=0;y<src.rows;y++){
    const uchar* ps=src.ptr(y);  //source row, read only
    uchar* pd=dest.ptr(y);       //destination row
    for(int x=0;x<src.cols;x++)
    {
        pd[x]=doSomethingWith(ps[x]);
    }
}

Then, change the outer loop to tbb::parallel_for:

#include <tbb/tbb.h>
//...
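tbb::parallel_for(0, src.rows, 1, [&](int y) { //changed line ([&]: a by-copy capture would make dest const inside the lambda)
    const uchar* ps=src.ptr(y);  //source row, read only
    uchar* pd=dest.ptr(y);       //destination row
    for(int x=0;x<src.cols;x++)
    {
        pd[x]=doSomethingWith(ps[x]);
    }
}); //added ); at the end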

Now the rows are processed in parallel: TBB distributes the row indices among its worker threads. You can also read pixels from other rows of src, as long as each iteration writes only to its own row of dest.
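To illustrate reading pixels from other lines, here is a minimal sketch of a per-row body that averages each pixel with the ones directly above and below it. The helper name verticalAverageRow and the flat std::vector buffers (single channel, row-major, no padding) are my own assumptions to keep the example free of dependencies; they are not part of OpenCV or TBB. With cv::Mat you would take the row pointers from src.ptr() and dest.ptr() instead.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only): writes output row y as the average
// of source rows y-1, y and y+1, repeating the border row at the top and bottom.
// It reads several rows of src but writes only row y of dest, so calls for
// different y values never write to the same bytes.
void verticalAverageRow(const std::vector<unsigned char>& src,
                        std::vector<unsigned char>& dest,
                        int rows, int cols, int y)
{
    const int yUp   = (y > 0)        ? y - 1 : y;  // clamp at the top border
    const int yDown = (y < rows - 1) ? y + 1 : y;  // clamp at the bottom border
    const std::size_t stride = static_cast<std::size_t>(cols);

    const unsigned char* pUp   = src.data() + yUp   * stride;
    const unsigned char* pMid  = src.data() + y     * stride;
    const unsigned char* pDown = src.data() + yDown * stride;
    unsigned char* pd = dest.data() + y * stride;

    for (int x = 0; x < cols; ++x) {
        pd[x] = static_cast<unsigned char>((pUp[x] + pMid[x] + pDown[x]) / 3);
    }
}

// Serial driver:
//   for (int y = 0; y < rows; ++y) verticalAverageRow(src, dest, rows, cols, y);
// With TBB the same loop would become, following the pattern above:
//   tbb::parallel_for(0, rows, 1, [&](int y) { verticalAverageRow(src, dest, rows, cols, y); });
```

This only stays safe because src and dest are separate buffers. If the result were written back into src, one row could be overwritten while a neighbouring iteration is still reading it.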

Don't forget to compile with C++11 or later (-std=c++11, needed for the lambda) and to link against libtbb2 (-ltbb).

It won't give you the same speedup as pure GPGPU code (CUDA or OpenCL), but it will use all the CPU cores and it is much easier to implement and to debug. It also works when there is no GPGPU support.
