
Enable Multithreading with TBB during cascade training

asked 2012-06-30 11:44:43 -0600

rantanplan

updated 2012-07-04 08:57:31 -0600

Kirill Kornyakov

I have compiled OpenCV-2.4.1 with TBB support (-DWITH_TBB=ON) on Ubuntu, which currently ships with TBB 4.0. However, when I run opencv_traincascade to train a classifier, only one core is used. I've also explicitly checked whether libtbb2 (which really is version 4.0) is loaded, via /proc/[opencv_traincascade process number]/maps, and it is. I am using an AMD Athlon 64 X2 Dual Core 4200+ processor.

Is there a way to enforce the usage of 2 or more cores? Or is there a problem with TBB on AMD CPUs?
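
The /proc/[pid]/maps check described in the question can be sketched as a small C++ helper. This is only an illustration: the function names mapsMentionLibrary and mapsTextMentionsLibrary are made up for this example, and finding the library in the listing shows only that it is mapped into the process, not that any training code path actually runs in parallel.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Returns true if any line of a /proc/<pid>/maps-style listing mentions
// the given library name (for example "libtbb").
// Hypothetical helper for illustration; not part of OpenCV or TBB.
bool mapsMentionLibrary(std::istream& maps, const std::string& libraryName) {
    assert(!libraryName.empty());
    std::string line;
    while (std::getline(maps, line)) {
        // Each maps line ends with the path of the mapped file, if any.
        if (line.find(libraryName) != std::string::npos) {
            return true;
        }
    }
    return false;
}

// Convenience overload for text that is already in memory.
bool mapsTextMentionsLibrary(const std::string& mapsText,
                             const std::string& libraryName) {
    std::istringstream stream(mapsText);
    return mapsMentionLibrary(stream, libraryName);
}
```

To run it against a live process, one would open the file with std::ifstream (from <fstream>) using the real process number in the path and pass the stream to mapsMentionLibrary with "libtbb" as the name.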

Update: As I can't comment directly on your answers, I am updating my question.

  1. Thanks for your answers!
  2. @Daniil: I tried checking for HAVE_TBB in traincascade.cpp without success. However, checking my build, I found cvconfig.h, which actually defines HAVE_TBB. So I believe that the libraries really are built with TBB support. I also checked their symbol tables with nm, and the TBB symbols are there.
  3. @Maria: That's what I was afraid of. However, this discussion states that there should be an improvement using TBB?



Rantanplan, I'm an author of the C++ cascade classifier in OpenCV, including the traincascade application. The improvement posted in the discussion can be explained by the switch to LBP features instead of Haar. LBPs are binary features, in contrast with Haar, and don't use float arithmetic, so they are about 3 times faster in detection. An LBP cascade trains in about an hour, compared with about a day for a Haar cascade on the same data. traincascade was not deliberately optimized for TBB. It uses one optimization from MLL plus one minor optimization of feature precomputing, but these can give only a slight improvement. Try LBP!

Maria Dimashova ( 2012-07-04 08:55:01 -0600 )

@Maria: Thanks for the clarification.

rantanplan ( 2012-07-04 14:47:42 -0600 )

2 answers


answered 2012-07-04 06:33:30 -0600

Maria Dimashova

Even with TBB, you will not see the CPU cores kept busy by the OpenCV traincascade application. Almost all of the time, only one core works. This is because only a small part of the training code is parallelized with TBB: finding the best split of a tree node, and precomputing some of the feature values before training a new stage. But traincascade spends a significant share of its time looking for negative samples that were recognized as positive (face) samples by all trained stages (the trained part of the cascade), which it needs in order to train the next stage. This selection of samples is not parallelized.
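
To get a feel for why a mostly serial program gains so little from extra cores, here is a rough sketch based on Amdahl's law, which bounds the speedup by 1 / ((1 - p) + p / n) for a parallelized runtime fraction p and n cores. The function names and the fractions in the table are hypothetical; no measured fraction for traincascade is given in this thread.

```cpp
#include <cassert>
#include <cstdio>

// Amdahl's law: upper bound on the speedup when only a fraction of the
// single-core runtime can be spread across several cores.
// parallelFraction: share of the runtime that is parallelized (0..1).
// cores: number of cores available (>= 1).
double amdahlSpeedup(double parallelFraction, int cores) {
    assert(parallelFraction >= 0.0 && parallelFraction <= 1.0);
    assert(cores >= 1);
    return 1.0 / ((1.0 - parallelFraction) + parallelFraction / cores);
}

// Prints the bound for a few parallel fractions on the given core count.
void printSpeedupTable(int cores) {
    // Hypothetical fractions chosen for illustration, not measured values.
    const double fractions[] = {0.1, 0.2, 0.5};
    for (double p : fractions) {
        std::printf("parallel fraction %.1f on %d cores: at most %.2fx\n",
                    p, cores, amdahlSpeedup(p, cores));
    }
}
```

For example, if only 20% of the runtime were parallelized, a dual-core CPU such as the one in the question could not do better than roughly a 1.11x speedup, which would be hard to notice in a core-usage monitor.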



I am also having trouble with this :(((. Have you found a solution yet? Please tell me.

Robin Hood ( 2014-03-13 22:02:43 -0600 )

This is annoying as hell. Is there an OpenCV bug report/ticket that I can follow in order to get updates on this?

Silex777 ( 2014-03-18 10:30:29 -0600 )

Is there no way to run traincascade on the GPU yet?

muglikar ( 2014-09-07 08:22:22 -0600 )

I don't think a GPU is needed; parallelizing the negative sampling, at least with duplicates, would solve the main problem.

Loknar ( 2015-12-12 13:14:20 -0600 )

Hello @MariaDimashova, is it right that during the search for negative samples that are still classified as positive the bottleneck is HDD bandwidth, while during the actual training process the bottleneck is CPU and memory bandwidth? At the moment I'm training Haar (mode ALL) with about 45k positive samples, and the training part is very slow even in the first stages. The problem looks like CPU load for me, yet only about 30% of the CPU is used. Are there any plans to multi-thread the feature-selection part?

Micka ( 2016-11-28 03:17:28 -0600 )

answered 2012-07-02 06:07:44 -0600

Daniil Osokin

updated 2012-07-03 04:27:44 -0600

Hi, Rantanplan!

  • TBB is processor-independent; it can run on AMD (Intel FAQ).
  • I saw a problem with TBB version 2.0 (solution). Try updating TBB to the current version, if the exact version isn't important to you.
  • For completeness: did you set -DWITH_TBB=ON when running cmake? This turns on TBB support.
  • You can also add this section to traincascade.cpp:
#ifdef HAVE_TBB
    printf("TBB is used\n");  // compiled in only when HAVE_TBB is defined
#endif

If the printf executes, then TBB is OK.

