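opencv_traincascade is parallelized using TBB, but not completely. There is a fairly large sequential (single-threaded) part, notably in updateTrainingSet(). Say you are training the 20th stage and need 10,000 negative samples. Each candidate must first be misclassified as positive by all 19 stages already trained. With the default maxFalseAlarmRate of 0.5, that means searching through roughly 2^19 * 10,000 samples, which takes a lot of time.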
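To get a feel for how fast this grows, here is a back-of-the-envelope calculation (not OpenCV code). It assumes every earlier stage hits exactly maxFalseAlarmRate and that stages reject windows independently, which is an idealization. The function name `expected_negatives_scanned` is made up for illustration.

```python
def expected_negatives_scanned(stage: int, num_neg: int,
                               max_false_alarm_rate: float = 0.5) -> float:
    """Rough expected number of candidate windows scanned to collect
    num_neg negatives that every previous stage still accepts.

    Idealized model: each of the (stage - 1) already-trained stages
    passes exactly max_false_alarm_rate of negative windows, independently.
    """
    if stage < 1:
        raise ValueError("stage numbering starts at 1")
    previous_stages = stage - 1
    # Probability that a random negative window survives all previous stages.
    pass_rate = max_false_alarm_rate ** previous_stages
    return num_neg / pass_rate


for stage in (1, 10, 20):
    print(stage, f"{expected_negatives_scanned(stage, 10_000):,.0f}")
# 1 10,000
# 10 5,120,000
# 20 5,242,880,000
```

Each extra stage doubles the search (at a rate of 0.5), so a serial search through the negatives quickly dominates the total training time.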