Okay, let's wrap some things up. The biggest misunderstanding is that the traincascade algorithm would be fully parallelizable. Large parts of the boosting process are simply single-core sequential code; only gathering all the features and calculating the corresponding weak classifiers can be done multithreaded.
This is where the new concurrency API comes into play: it selects the parallel backend available on your system, either TBB or OpenMP, and both should produce roughly the same performance.
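As a minimal sketch of what that abstraction looks like (this snippet is mine, not from the original post): `cv::parallel_for_` dispatches a loop body to whichever backend OpenCV was built with (TBB, OpenMP, or its internal thread pool), so the same code runs multithreaded regardless of backend.

```cpp
// Sketch of OpenCV's concurrency API (illustrative, not traincascade code).
// cv::parallel_for_ splits the range over worker threads supplied by the
// backend OpenCV was compiled with (TBB, OpenMP, or the built-in pool).
#include <opencv2/core.hpp>
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> results(8, 0);
    cv::parallel_for_(cv::Range(0, 8), [&](const cv::Range& range) {
        // Each worker processes its own sub-range independently.
        for (int i = range.start; i < range.end; ++i)
            results[i] = i * i;
    });
    // Reports how many threads the selected backend is using.
    std::printf("threads in use: %d\n", cv::getNumThreads());
    return 0;
}
```

You never pick the backend in code; it is fixed at OpenCV build time, which is why performance should be comparable either way.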
If your training takes long, look at the following parameters:

- `-precalcValBufSize`
- `-precalcIdxBufSize`

With 8 GB of memory I would certainly set them to 2048 MB each!
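A hypothetical invocation with those buffers raised (paths, sample counts, and window size below are placeholders, not values from this post):

```shell
# Placeholder paths/counts; only the two buffer flags are the point here.
# Both buffer sizes are given in MB, so 2048 + 2048 leaves room in 8 GB of RAM
# for the rest of the training process.
opencv_traincascade -data cascade_dir \
                    -vec positives.vec -bg negatives.txt \
                    -numPos 1000 -numNeg 2000 -numStages 20 \
                    -w 24 -h 24 \
                    -precalcValBufSize 2048 \
                    -precalcIdxBufSize 2048
```

Larger buffers let more precalculated feature values and indices stay in memory, which cuts down on recomputation between stages.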