OpenCV much slower in multithreading

asked 2015-08-04 14:36:18 -0600

Dum

updated 2015-08-05 04:59:05 -0600

I'm writing a console application that uses OpenCV and multithreading. I'm testing it on a CPU with 4 physical cores (8 logical cores with HT enabled) and 12 GB of RAM. Each thread executes a function that makes OpenCV calls. The run time of each thread is much longer when several threads run in parallel than when a single thread runs alone. I would expect the per-thread time to stay roughly the same regardless of the number of threads, or to increase by about 10%, but instead it grows with the number of threads: the more threads, the longer each one takes.
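The expectation stated above can be checked in isolation with a baseline that uses only the C++ standard library. The following is a minimal sketch, not the code from the question: `busyWork` and `timePerThread` are hypothetical names, and the xorshift loop is just a stand-in for a CPU-bound workload that touches no shared state.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <future>
#include <vector>

// Hypothetical CPU-bound stand-in for the per-thread image processing.
// It touches no shared state, so threads cannot block one another.
std::uint64_t busyWork(std::uint64_t iterations)
{
    std::uint64_t x = 88172645463325252ULL;  // arbitrary non-zero seed
    for (std::uint64_t i = 0; i < iterations; ++i) {
        x ^= x << 13;  // xorshift64 steps: cheap, data-dependent arithmetic
        x ^= x >> 7;
        x ^= x << 17;
    }
    return x;
}

// Runs `threads` copies of busyWork concurrently and returns the wall-clock
// time in milliseconds that each copy measured for itself.
std::vector<double> timePerThread(int threads, std::uint64_t iterations)
{
    assert(threads > 0);
    using Clock = std::chrono::steady_clock;
    std::vector<std::future<double>> jobs;
    for (int i = 0; i < threads; ++i) {
        jobs.push_back(std::async(std::launch::async, [iterations] {
            const auto start = Clock::now();
            volatile std::uint64_t sink = busyWork(iterations);  // keep the result alive
            (void)sink;
            return std::chrono::duration<double, std::milli>(Clock::now() - start).count();
        }));
    }
    std::vector<double> elapsed;
    for (auto& job : jobs) elapsed.push_back(job.get());
    return elapsed;
}
```

Calling `timePerThread(n, iterations)` for n from 1 to 8 and comparing the returned values gives a reference curve for the machine. Past the number of physical cores, per-thread times may rise even for this workload, since hyperthreaded cores share execution resources.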

I have tested cv::setNumThreads(8) and cv::setNumThreads(0) with the same result. If the function is replaced by another function that applies some dummy filters of my own to the data, the behaviour is as expected: all threads finish with the same run time regardless of the number of threads. Do OpenCV functions block the threads, or perform some sequential operations that block them? The picture below shows the times obtained in the application: [image]

"Time in file No. 3" is the total time, in milliseconds, to process a sequence of 768 images.

I attach a sample C++ project to test this behaviour. The application runs 1 to 8 threads in sequence. Does anybody know what is happening? I don't know what else to do...

Thanks.

EDIT: Here is the code. I cannot attach a ready-to-use project to the post; it is available from the link in my earlier comment. The test image is Test.jpg; the original I used is in BMP format.

#include "stdafx.h"
#include <future>
#include <chrono>
#include "Filter.h"
#include <iostream>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>

long long Ticks();
int WithOpencv(cv::Mat img);
int With_OUT_Opencv(cv::Mat img);
int TestThreads (char *buffer,std::string file);
#define Blur3x3(matrix,f,c) ((matrix[(f-1)*1600+(c-1)] + matrix[(f-1)*1600+c] + matrix[(f-1)*1600+(c+1)] + matrix[f*1600+(c-1)] + matrix[f*1600+c] + matrix[f*1600+(c+1)] + matrix[(f+1)*1600+(c-1)] + matrix[(f+1)*1600+c] + matrix[(f+1)*1600+(c+1)])/9)


int _tmain(int argc, _TCHAR* argv[])
{

    std::string file="Test.bmp";

    auto function = [&](char *buffer){return TestThreads(buffer,file);};
    char *buffers[12];
    std::future<int> frames[12];
    DWORD tid;
    int i,j;
    int nframes = 0;
    int ncores;

    cv::setNumThreads(8);

    for (i=0;i<8;i++) buffers[i] = new char[1000*1024*1024];
    for (j=1;j<9;j++)
    {
        ncores = j;
        long long t = Ticks();
        for (i=0;i<ncores;i++) frames[i] = std::async(std::launch::async,function,buffers[i]);
        for (i=0;i<ncores;i++) nframes += frames[i].get();
        t = Ticks() - t;

        std::cout << "Mean time using " << ncores << " cores is: " << t/nframes << "ms" << std::endl << std::endl;
        nframes = 0;
        Sleep(2000);
    }
    for (int i=0;i<8;i++) delete[] buffers[i];  // buffers were allocated with new[]

    return 0;
}


int TestThreads (char *buffer,std::string file)
{

    long long ta;
    int res ...

Comments

Where is the attached project?

LBerger ( 2015-08-04 15:19:52 -0600 )

The attached project is not shown... You can download it from this link. Thanks!

Dum ( 2015-08-04 15:51:54 -0600 )

I cannot compile your code because your OpenCV configuration is not the same as mine. So I have read your code to try to understand it, and I may be making some mistakes.

Your function is TestThreads and you launch it ncores times. This function reads a file from disk. That takes time; could it block the other threads?
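One way to test the disk-read hypothesis is to time the loading step and the processing step separately inside each worker. The sketch below uses only the standard library; `PhaseTimes`, `timeMs` and `measurePhases` are hypothetical names, and the `load` and `compute` callables are placeholders for whatever TestThreads does to read and to filter a frame.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Accumulated wall-clock time, in milliseconds, for the two phases of a worker.
struct PhaseTimes {
    double loadMs = 0.0;
    double computeMs = 0.0;
};

// Times a single callable and returns the elapsed milliseconds.
template <typename F>
double timeMs(F&& f)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    return std::chrono::duration<double, std::milli>(
        std::chrono::steady_clock::now() - start).count();
}

// Runs `frames` iterations, timing the load step and the compute step separately.
// `load` and `compute` are placeholders, e.g. reading a frame and filtering it.
template <typename Load, typename Compute>
PhaseTimes measurePhases(int frames, Load load, Compute compute)
{
    assert(frames >= 0);
    PhaseTimes total;
    for (int i = 0; i < frames; ++i) {
        total.loadMs += timeMs(load);
        total.computeMs += timeMs(compute);
    }
    return total;
}
```

If `loadMs` grows with the number of threads while `computeMs` stays flat, the file access is the likely bottleneck; if `computeMs` grows, the slowdown is in the processing itself.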

LBerger ( 2015-08-04 16:26:33 -0600 )

The main function executes TestThreads with 1 thread; when that finishes, it executes the function with two threads, then three, and so on up to 8, because my CPU has 4 physical cores (8 with hyperthreading). The only purpose is to see how the execution time of each thread increases as the number of threads increases. You are right that this is a possible cause, because I/O operations may block threads. But if, in the for loop inside TestThreads, With_OUT_Opencv (which makes no calls to OpenCV functions) is called instead of WithOpencv, the behaviour is the expected one. Even with only a single call to an OpenCV function left in (convert, blur, ...), the delay appears as the number of threads increases...

Dum ( 2015-08-04 16:53:59 -0600 )

I'm not at all sure that the OpenCV functions blur, sqrt and normalize do not use threads internally. For example, I think there is some threading in dilate, here. Hence I think your test may be biased.

LBerger ( 2015-08-05 01:19:11 -0600 )

Okay, this cannot be debugged unless you

  • Add your code here; support doesn't dig into other websites to find out where your code might be
  • Give us your complete CMake configuration output, because it is very important to know which packages are included in your build
  • Give us the complete configuration of the system you are running it on

Because on my desktop right here, your claim is simply nonsense. If I run a parallel for loop over the 24 cores I have, it simply speeds up the processing by a factor of 20...
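For readers unfamiliar with the pattern, a parallel for loop of the kind mentioned above splits an index range into chunks and hands each chunk to a worker. The comment does not say which mechanism was used, so the following is only a standard-library sketch of the idea; `parallelFor` and `squareAll` are hypothetical names, not OpenCV API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: splits [0, count) into contiguous chunks, one per worker,
// and calls body(begin, end) for each chunk on its own thread.
template <typename Body>
void parallelFor(std::size_t count, unsigned workers, Body body)
{
    assert(workers > 0);
    const std::size_t chunk = (count + workers - 1) / workers;  // ceiling division
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(count, w * chunk);
        const std::size_t end = std::min(count, begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        pool.emplace_back([=]() mutable { body(begin, end); });
    }
    for (auto& t : pool) t.join();
}

// Example body: squares every element in place.
// Chunks never overlap, so no locking is needed.
void squareAll(std::vector<long long>& data, unsigned workers)
{
    parallelFor(data.size(), workers, [&data](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) data[i] *= data[i];
    });
}
```

Note the difference from the test in the question: here one operation is split across workers, whereas the question launches several independent threads that each make their own library calls.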

StevenPuttemans ( 2015-08-05 02:56:13 -0600 )

LBerger, I think you are right. Those functions use parallel_for_ in their implementations. If the number of threads is not set with cv::setNumThreads, OpenCV uses the number of logical cores of the system by default, so running any of those functions creates 8 or 12 extra worker threads (on the 4-core and 6-core machines that I have used). If you call cv::setNumThreads(0), those worker threads are not created. I have run tests changing this number of threads with cv::setNumThreads, but with the same result.
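Taking the description above at face value, the number of threads competing for the CPU can be tallied with simple arithmetic. This sketch does not query OpenCV; `totalThreads` and `threadsPerCore` are hypothetical helpers, and the assumption that every application thread gets its own set of inner workers is a worst case, since the parallel backend OpenCV was built with may share workers across calls.

```cpp
#include <cassert>

// Illustrative tally only; it does not query OpenCV.
// appThreads:   threads the application launches itself (1 to 8 in the test).
// innerPerCall: extra workers assumed active inside each library call
//               (0 models the cv::setNumThreads(0) case described above).
int totalThreads(int appThreads, int innerPerCall)
{
    assert(appThreads >= 0 && innerPerCall >= 0);
    return appThreads + appThreads * innerPerCall;
}

// How many runnable threads compete for each logical core.
double threadsPerCore(int appThreads, int innerPerCall, int logicalCores)
{
    assert(logicalCores > 0);
    return static_cast<double>(totalThreads(appThreads, innerPerCall)) / logicalCores;
}
```

Under these assumptions, 8 application threads with 8 inner workers each give `totalThreads(8, 8) == 72` threads on 8 logical cores, while the `cv::setNumThreads(0)` case gives `totalThreads(8, 0) == 8`.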

StevenPuttemans, thanks. I hope that you can now reproduce the behaviour. As you can see, the threads are created first, and each thread then executes the function containing the OpenCV calls.

Dum ( 2015-08-05 05:08:37 -0600 )

Hi Steven, does the code work for you?

Dum ( 2015-08-05 08:29:16 -0600 )