Okay, let's wrap some things up. The biggest misunderstanding is that the traincascade algorithm would be fully parallelizable. Large parts of the boosting process are simply single-core sequential code; only gathering all the features and calculating the corresponding weak classifiers can be done multithreaded.
This is where the new concurrency API comes into play: it selects the parallel backend available on your system, either TBB or OpenMP, and both should produce roughly the same performance.
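As a minimal sketch of what that abstraction looks like (this snippet is mine, not from the original post): `cv::parallel_for_` dispatches a loop body to whichever backend OpenCV was built with (TBB, OpenMP, or its internal thread pool), so the same code runs multithreaded regardless of backend.

```cpp
// Sketch of OpenCV's concurrency API (illustrative, not traincascade code).
// cv::parallel_for_ splits the range over worker threads supplied by the
// backend OpenCV was compiled with (TBB, OpenMP, or the built-in pool).
#include <opencv2/core.hpp>
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> results(8, 0);
    cv::parallel_for_(cv::Range(0, 8), [&](const cv::Range& range) {
        // Each worker processes its own sub-range independently.
        for (int i = range.start; i < range.end; ++i)
            results[i] = i * i;
    });
    // Reports how many threads the selected backend is using.
    std::printf("threads in use: %d\n", cv::getNumThreads());
    return 0;
}
```

You never pick the backend in code; it is fixed at OpenCV build time, which is why performance should be comparable either way.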
If your training takes long, look at the following parameters:

- `-precalcValBufSize`
- `-precalcIdxBufSize`

With 8 GB of memory I would certainly set them to 2048 MB each!
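A hypothetical invocation with those buffers raised (paths, sample counts, and window size below are placeholders, not values from this post):

```shell
# Placeholder paths/counts; only the two buffer flags are the point here.
# Both buffer sizes are given in MB, so 2048 + 2048 leaves room in 8 GB of RAM
# for the rest of the training process.
opencv_traincascade -data cascade_dir \
                    -vec positives.vec -bg negatives.txt \
                    -numPos 1000 -numNeg 2000 -numStages 20 \
                    -w 24 -h 24 \
                    -precalcValBufSize 2048 \
                    -precalcIdxBufSize 2048
```

Larger buffers let more precalculated feature values and indices stay in memory, which cuts down on recomputation between stages.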