
Just linking to opencv ruins thread-concurrency!?

asked 2019-11-02 06:06:47 -0600

bwvb

updated 2019-12-09 07:58:26 -0600

Akhil Patel

In a small program demonstrating C++ threads, I noticed that linking with OpenCV ruins the concurrency of the threads. NOTE: the demo program doesn't even use OpenCV! I stumbled on this problem when I observed that my own application, which does use OpenCV, did not show the expected concurrency. It turned out that merely linking with OpenCV was the problem.

In the sample code (file main.cpp given below) three threads are launched, each doing the same calculation. From the same source, I create two executables, called 'mttok' and 'mttno', as follows:

g++ -o mttok -O3 main.cpp -lpthread
g++ -o mttno -O3 main.cpp -lpthread -L/usr/local/opencv/lib64 -lopencv_core

When I run the first executable with the gnu 'time' command I get:

=> time ./mttok
All threads are running...
result1: 5e+19
result2: 5e+19
result3: 5e+19
27.005u 0.003s 0:09.00 300.0%   0+0k 0+0io 0pf+0w

The third field is the elapsed time (9 secs), whereas the fourth field (300%) is the CPU usage relative to the elapsed time, clearly showing the three threads running concurrently. This is also seen from an applet on my desktop visualising the CPU activity: three bars corresponding to 3 'CPUs' climb to ~100%.
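The same comparison of CPU time against elapsed time can be made from inside a program. The following is only a minimal sketch, not part of the original demo: the names `Timing` and `measure` are made up for illustration, and it assumes a platform such as Linux/glibc where `std::clock()` reports the CPU time of the whole process, summed over all threads.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>
#include <vector>

// Hypothetical helper type: CPU seconds, elapsed seconds, and a checksum
// that keeps the compiler from discarding the busy loops.
struct Timing
{
   double cpu_s;
   double wall_s;
   double checksum;
};

// Runs `nthreads` identical busy loops and measures both clocks.
// cpu_s / wall_s near nthreads suggests the threads ran in parallel;
// a value near 1 suggests they were serialized onto one CPU.
Timing measure(unsigned nthreads, unsigned long iterations)
{
   assert(nthreads > 0);
   std::vector<double> results(nthreads, 0.0);
   std::vector<std::thread> workers;

   const auto wall_start = std::chrono::steady_clock::now();
   const std::clock_t cpu_start = std::clock();   // process-wide CPU time (assumption: glibc)

   for (unsigned i = 0; i < nthreads; ++i)
   {
      workers.emplace_back([&results, i, iterations] {
         double s = 0;
         for (unsigned long u = 0; u < iterations; ++u) s += u;
         results[i] = s;
      });
   }
   for (auto& w : workers) w.join();

   const std::clock_t cpu_end = std::clock();
   const auto wall_end = std::chrono::steady_clock::now();

   double checksum = 0;
   for (double r : results) checksum += r;

   return { double(cpu_end - cpu_start) / CLOCKS_PER_SEC,
            std::chrono::duration<double>(wall_end - wall_start).count(),
            checksum };
}
```

Calling `measure(3, NMAX)` from `main` and printing `cpu_s / wall_s` should give roughly the same ratio as the percentage column of `time`, divided by 100.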

The other executable (linked with opencv) gives

=> time ./mttno
All threads are running...
result1: 5e+19
result2: 5e+19
result3: 5e+19
26.690u 0.203s 0:26.45 101.6%   0+0k 43648+0io 8pf+0w

Note the ~3x larger elapsed time (now 26 secs) and the CPU-percentage (101%). The CPU activity shows only one bar climbing to 100%.

I have tried this both with and without TBB, and both with and without OpenMP. The results are the same. The source code is a recent clone of the git repository (4.1.2-dev), but I saw the same phenomenon with the precompiled version of OpenCV that comes with openSUSE Leap 15.1, i.e. OpenCV 3.3. Who can explain this behaviour and suggest what can be done to keep proper concurrent behaviour?

I have asked this question on Stack Overflow, with more detail and the source code, but this did not lead to a solution. Perhaps somebody from the OpenCV community can help?

Here follows the code:
=== main.cpp ===

#include <functional>   // std::ref
#include <iostream>
#include <memory>       // std::unique_ptr
#include <thread>

const unsigned long NMAX=10000000000;

class MTTest
{
public:
   void foo( double& r )
   {
      double s = 0;
      for (unsigned long u=0; u<NMAX; u++)
      {
         s += u;
      }
      r = s;
   }
};

int main()
{
   double s1, s2, s3;

   std::unique_ptr<MTTest> ptr1( new MTTest );
   std::unique_ptr<MTTest> ptr2( new MTTest );
   std::unique_ptr<MTTest> ptr3( new MTTest );

   std::thread t1( &MTTest::foo, ptr1.get(), std::ref(s1) );
   std::thread t2( &MTTest::foo, ptr2.get(), std::ref(s2) );
   std::thread t3( &MTTest::foo, ptr3.get(), std::ref(s3) );

   std::cout << "All threads are running..." << std::endl;

   // synchronize threads:
   t1.join();
   t2.join();
   t3.join();

   std::cout << "result1: " << s1 << std::endl;
   std::cout << "result2: " << s2 << std::endl;
   std::cout << "result3: " << s3 << std ...

Comments

I have no idea, but if linking produces funny symptoms, I'd try changing the linking order, that is, putting -lpthread last in the command.

mvuori (2019-11-02 08:48:51 -0600)

@mvuori Good suggestion. I just put the pthread library at the end. It does not change the story.

bwvb (2019-11-02 10:08:35 -0600)

unfair comparison, since you include cache warmups, opencl precompilation and such things in your measurement

berak (2019-11-03 05:19:36 -0600)

@berak, why unfair? The program does not use OpenCV, so I would expect linking against OpenCV to have no impact at all. But it does.

Perhaps somebody would be so kind to repeat my steps above to see if the same effect is there. It takes less than 5 minutes....

bwvb (2019-11-03 14:52:20 -0600)

there is (a lot of !) opencv code running on startup of your program, even if you don't call any code explicitly

berak (2019-11-04 04:22:34 -0600)

@berak. Well, fair enough. But how does that prevent the concurrency of the 3 threads that are started in the demo program? This is not about losing some performance when OpenCV is linked in, but about losing thread concurrency when OpenCV is linked in. This is a major penalty, and I find it hard to believe that this is normal behaviour!

Regards Bw

bwvb (2019-11-04 12:03:01 -0600)

"prevent the concurrency of the 3 threads"

can you explain ?

berak (2019-11-04 12:17:06 -0600)

Have a look at the beginning of my posting. The code launches three different threads. Without linking with OpenCV, the program runs 3x faster (elapsed time) than when OpenCV has been linked in. CPU usage points to the same conclusion: 300% (no OpenCV) vs 100% (with OpenCV). Finally, a CPU monitor gives the same visual feedback: without OpenCV, three bars rise simultaneously to 100% usage, corresponding to three threads running on three CPUs. With OpenCV linked in, only one CPU meter rises (and the run takes three times as long to complete).

bwvb (2019-11-07 09:51:37 -0600)

2 answers


answered 2019-12-29 13:55:02 -0600

bwvb

Thanks to robin's answer, it became clear to me that the true culprit was OpenBLAS. Further googling rapidly gave the answer, which is even stated explicitly in the FAQ (https://github.com/xianyi/OpenBLAS/wiki/faq): "If your application is already multi-threaded, it will conflict with OpenBLAS multi-threading. Thus, you must set OpenBLAS to use single thread as following."

export OPENBLAS_NUM_THREADS=1 in the environment variables. Or
Call openblas_set_num_threads(1) in the application on runtime. Or
Build OpenBLAS single thread version, e.g. make USE_THREAD=0 USE_LOCKING=1 (see comment below)

Indeed, it solved my problem. I'm stunned that an external library can have this impact.
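One way to see whether a linked library has already started threads of its own is to count the threads of the process at the very start of `main`. The sketch below is an illustration only, assuming Linux with `/proc` mounted; the function names are made up here and nothing in it is specific to OpenBLAS or OpenCV.

```cpp
#include <cassert>
#include <cctype>
#include <dirent.h>   // opendir, readdir, closedir (POSIX)

// Counts the threads of the current process by listing /proc/self/task
// (Linux only). Returns -1 if the directory cannot be opened.
int current_thread_count()
{
   DIR* dir = opendir("/proc/self/task");
   if (dir == nullptr) return -1;

   int count = 0;
   while (const dirent* entry = readdir(dir))
   {
      // Each thread appears as a directory named by its numeric thread id;
      // the "." and ".." entries are skipped by the digit check.
      if (std::isdigit(static_cast<unsigned char>(entry->d_name[0]))) ++count;
   }
   closedir(dir);
   return count;
}

// Hypothetical guard for debug builds: call it first thing in main().
// It aborts if some library has already created extra threads.
void expect_single_threaded_start()
{
   assert(current_thread_count() <= 1);
}
```

Comparing the count for the two executables, once with and once without `OPENBLAS_NUM_THREADS=1` in the environment, would show whether the library's thread pool is involved.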


answered 2019-12-04 13:42:34 -0600

robin

updated 2019-12-05 12:04:02 -0600

I also ran into the same problem indirectly, and found what may be the beginning of an explanation. Whenever I run the openpose program, or use the openpose library in a program of my own, only a single CPU is used (a tool such as a system monitor shows this)!

So I suspected that the problem comes from CPU affinity. I can confirm that the openpose code does not force CPU affinity, so it may come from a dependency, and OpenCV is one of them.

I also saw that I could not combine some multi-threaded libraries with OpenCV. As in this thread, if we just link with OpenCV, the system's performance does not meet the real-time requirements of those libraries' internal threads, while without OpenCV (but with exactly the same code) everything works like a charm.

This would be explained by the affinity being forced to a single CPU for any process linked with OpenCV.

As a note, the behavior is the same with OpenCV 3.4.0 and 3.4.1, on both Ubuntu 16.04 and 18.04, and with either clang or gcc.

I have no precise idea why this is happening, and I wonder whether it comes from OpenCV or from one of its dependencies...

EDIT: the problem came from the OpenBLAS library used by OpenCV. The solution was to rebuild OpenBLAS with the NO_AFFINITY=1 CMake option (this deactivates the parts of the code that call sched_setaffinity). After the rebuild everything works as expected...
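Whether a process has been pinned can be checked directly from its affinity mask. The following is a minimal sketch for Linux/glibc; the function names are made up for illustration. The second function is a possible workaround that is not taken from this thread and is untested against OpenBLAS: it asks the kernel to allow the calling thread on every configured CPU again, and threads created afterwards inherit that mask.

```cpp
#include <cassert>
#include <sched.h>    // sched_getaffinity, sched_setaffinity, CPU_* macros (Linux/glibc)
#include <unistd.h>   // sysconf

// Number of CPUs the calling thread is allowed to run on, or -1 on error.
// A result of 1 on a multi-core machine suggests something pinned the process.
int allowed_cpu_count()
{
   cpu_set_t mask;
   CPU_ZERO(&mask);
   if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return -1;
   return CPU_COUNT(&mask);
}

// Assumed workaround, not from the thread: widen the affinity mask of the
// calling thread to all configured CPUs. Returns true on success.
bool reset_affinity_to_all_cpus()
{
   const long ncpu = sysconf(_SC_NPROCESSORS_CONF);
   if (ncpu <= 0) return false;

   cpu_set_t mask;
   CPU_ZERO(&mask);
   for (long cpu = 0; cpu < ncpu && cpu < CPU_SETSIZE; ++cpu) CPU_SET(cpu, &mask);
   assert(CPU_COUNT(&mask) > 0);   // sanity check: the mask must not be empty

   return sched_setaffinity(0, sizeof(mask), &mask) == 0;
}
```

Printing `allowed_cpu_count()` at the top of `main` in both executables would show whether linking the library changes the mask before any user code runs.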


Comments

Thanks for responding! As a matter of fact, I had given up on openCV, for the problems mentioned. I read your post only today.

I have rebuilt OpenBLAS according to your suggestion (I noticed that NO_AFFINITY=1 is the default, though). I also modified the OpenBLAS section of the OpenCV CMake configuration accordingly, using cmake-gui. But alas, to no avail. However, running ldd on libopencv_core.so revealed a dependency on "libopenblas_pthreads0.so", and I have no idea how to build this library. It does not come with OpenBLAS!

Further googling on libopenblas_pthreads0 does give a few hits related to threading problems, all on openSUSE Linux distributions. Indeed, that is also the platform I work on.

Can I ask you which platform you are working on?

Thanks, Bertwim

bwvb (2019-12-29 09:37:52 -0600)


Stats

Asked: 2019-11-02 06:06:47 -0600

Seen: 546 times

Last updated: Dec 29 '19