opencv_traincascade is parallelized using TBB, but not completely. There is a fairly large serial part, notably in updateTrainingSet().
When training the 20th stage and looking for 10,000 negative samples (for example), you have to search through 19^2*10,000 samples, which takes a lot of time.
Does anyone know if there is any intention to parallelize this part? (If not, I might do it myself.)
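
To make the idea concrete, here is a rough sketch of the kind of parallel negative search I have in mind. It is not OpenCV code: collectNegatives and passesCascade are hypothetical names, the candidates are just indices, and it uses plain std::thread rather than TBB so that it stands alone. The assumption is that testing one candidate against the current stages is independent of testing any other, so workers can pull candidates from a shared counter and stop once enough have been accepted.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for "the stages trained so far still accept
// candidate i", i.e. it is a usable negative for the next stage.
using Predicate = std::function<bool(std::size_t)>;

// Scan candidates [0, numCandidates) with several threads and return up to
// `needed` indices for which passesCascade(i) is true.
std::vector<std::size_t> collectNegatives(std::size_t numCandidates,
                                          std::size_t needed,
                                          const Predicate& passesCascade,
                                          unsigned numThreads)
{
    std::vector<std::size_t> accepted;
    std::mutex acceptedMutex;            // guards `accepted`
    std::atomic<std::size_t> next{0};    // next unclaimed candidate index
    std::atomic<bool> done{false};       // set once enough negatives are found

    auto worker = [&]() {
        const std::size_t chunk = 64;    // claim work in blocks to limit contention
        while (!done.load()) {
            const std::size_t begin = next.fetch_add(chunk);
            if (begin >= numCandidates) break;   // candidates exhausted
            const std::size_t end = std::min(begin + chunk, numCandidates);
            for (std::size_t i = begin; i < end && !done.load(); ++i) {
                if (!passesCascade(i)) continue; // rejected by an earlier stage
                std::lock_guard<std::mutex> lock(acceptedMutex);
                if (accepted.size() < needed) accepted.push_back(i);
                if (accepted.size() >= needed) done.store(true);
            }
        }
    };

    std::vector<std::thread> pool;
    const unsigned n = std::max(1u, numThreads);
    for (unsigned t = 0; t < n; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    return accepted;
}
```

One caveat with this approach: the order (and, when there are more matches than needed, the exact set) of accepted negatives depends on thread timing, so training runs would no longer be reproducible unless the results are put back into a fixed order afterwards.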