Haar features are inherently slow: they make extensive use of floating-point operations, which are relatively slow on mobile devices.
A quick solution is to switch to LBP cascades; it only takes a few changed lines in your code. The performance gain is significant, and the loss in accuracy is minimal. Look for lbpcascades/lbpcascade_frontalface.xml.
If you want to dig deeper into optimizations, here is a generic list of optimization tips (cross-posted from SO). Please note that face detection, being one of the most requested features of OpenCV, is already quite optimized, so improving it further may require in-depth knowledge.
Advice for optimization
A. Profile your app. Do it first on your computer, since it is much easier. Use a profiler such as the Visual Studio profiler and see which functions take the most time. Optimize them. Never optimize something because you think it is slow; optimize it because you measured it. Start with the slowest function, optimize it as much as possible, then move on to the second slowest.
B. First, focus on algorithms. A faster algorithm can improve performance by orders of magnitude (100x). A C++ trick will give you maybe a 2x performance boost.
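If no profiler is at hand, a plain timer can give a rough first measurement. The following is a minimal sketch, assuming C++11 or later; `timeMs` and `busyWork` are hypothetical names used for illustration and are not part of OpenCV.

```cpp
#include <chrono>

// Hypothetical helper: runs fn() `runs` times and returns the average
// wall-clock time per run in milliseconds.
template <typename Fn>
double timeMs(Fn&& fn, int runs = 10) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int r = 0; r < runs; ++r) fn();
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / runs;
}

// Stand-in workload; replace it with the function you suspect is slow.
inline long busyWork(int n) {
    long sum = 0;
    for (int i = 0; i < n; ++i) sum += i % 7;
    return sum;
}
```

A call could look like `volatile long sink = 0; double ms = timeMs([&] { sink = busyWork(1000000); });`. Writing the result to a `volatile` variable keeps the compiler from optimizing the measured work away.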
Classical techniques:
Resize your video frames to be smaller. Often you can extract the information you need from a 200x300 px image instead of a 1024x768 one. The first has roughly 13 times fewer pixels.
Use simpler operations instead of complicated ones. Use integers instead of floats. And never use double in a matrix or a for loop that executes thousands of times.
Do as little calculation as possible. Can you track an object only in a specific area of the image, instead of processing it all for all the frames? Can you make a rough/approximate detection on a very small image and then refine it on a ROI in the full frame?
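As an illustration of the first two techniques, here is a minimal sketch that shrinks a grayscale frame using integer arithmetic only. `downscaleGray` is a hypothetical helper written for this example, and it assumes a tightly packed 8-bit row-major buffer; in OpenCV, `cv::resize` does the same job.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: shrinks an 8-bit grayscale image by an integer factor,
// averaging each factor x factor block with integer arithmetic only.
// Assumes src holds width * height pixels in row-major order; right and bottom
// pixels that do not fill a whole block are dropped.
std::vector<std::uint8_t> downscaleGray(const std::vector<std::uint8_t>& src,
                                        int width, int height, int factor) {
    const int outW = width / factor;
    const int outH = height / factor;
    const unsigned blockArea = static_cast<unsigned>(factor) * factor;
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(outW) * outH);
    for (int y = 0; y < outH; ++y) {
        for (int x = 0; x < outW; ++x) {
            unsigned sum = 0;
            for (int dy = 0; dy < factor; ++dy) {
                // Pointer to the first pixel of this block row.
                const std::uint8_t* row =
                    src.data() + static_cast<std::size_t>(y * factor + dy) * width + x * factor;
                for (int dx = 0; dx < factor; ++dx) sum += row[dx];
            }
            dst[static_cast<std::size_t>(y) * outW + x] =
                static_cast<std::uint8_t>(sum / blockArea);
        }
    }
    return dst;
}
```

A rough detection can then run on the small image, with each hit mapped back to the full frame by multiplying its coordinates by `factor` and refined inside that ROI only.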
C. In for loops, it may make sense to use C style instead of C++. A pointer to the matrix data or a float array is much faster than mat.at<T>(i, j) or std::vector<>. But change only if it's needed. Usually, most of the processing (90%) is done in some double for loop. Focus on it. It doesn't make sense to replace vector<> all over the place and make your code look like spaghetti.
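A minimal sketch of that C-style inner loop, assuming a contiguous 8-bit grayscale buffer. `thresholdInPlace` is a hypothetical function used only to show the access pattern: the raw pointer is fetched once, and each row is walked through a row pointer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical example: binarizes an 8-bit grayscale image in place.
// Assumes a contiguous row-major buffer of width * height pixels.
void thresholdInPlace(std::vector<std::uint8_t>& pixels, int width, int height,
                      std::uint8_t level) {
    std::uint8_t* data = pixels.data();  // fetch the raw pointer once
    for (int y = 0; y < height; ++y) {
        // One pointer per row instead of an index computation per pixel.
        std::uint8_t* row = data + static_cast<std::size_t>(y) * width;
        for (int x = 0; x < width; ++x) {
            row[x] = (row[x] > level) ? 255 : 0;
        }
    }
}
```

With a `cv::Mat`, the per-row pointer would come from `mat.ptr<uchar>(y)` rather than from manual arithmetic.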
D. Some OpenCV functions convert data to double, process it, then convert back to the input format. Beware of them; they kill performance on mobile devices. Examples: warping, scaling, type conversions. Color space conversions are also known to be slow. Prefer grayscale obtained directly from native YUV.
E. ARM processors have NEON. Learn and use it. It is powerful!
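To illustrate the last point: in NV21 and other YUV 4:2:0 formats that store the Y plane first, the luma plane already is a grayscale image, so no conversion is needed. The sketch below assumes a frame buffer without row padding; `GrayView`, `lumaFromYuv420` and `yuv420FrameSize` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical view type: a non-owning grayscale image.
struct GrayView {
    const std::uint8_t* data;
    int width;
    int height;
};

// Assumes `frame` is an NV21 buffer (or another YUV 4:2:0 layout with the
// Y plane first) with no row padding, so the first width * height bytes are
// the luma plane. No pixels are copied or converted.
GrayView lumaFromYuv420(const std::uint8_t* frame, int width, int height) {
    return GrayView{frame, width, height};
}

// Total size in bytes of such a frame: full-resolution Y plus
// quarter-resolution U and V (assumes even width and height).
std::size_t yuv420FrameSize(int width, int height) {
    return static_cast<std::size_t>(width) * height * 3 / 2;
}
```

With OpenCV, the same buffer could be wrapped without copying as `cv::Mat gray(height, width, CV_8UC1, const_cast<std::uint8_t*>(frame));`, provided the frame outlives the `cv::Mat`.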
A small example:
float *a, *b, *c;
// allocate a, b and c with 1000001 elements each and initialize a and b
for (int i = 0; i < 1000001; i++)
    c[i] = a[i] * b[i];
can be rewritten as follows. It's more verbose, but trust me it's faster.
float* a, *b, *c;
// init a and b to 1000001 elements
float32x4_t _a, _b, _c;
int i;
for(i=0 ...
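The listing above is cut off in the source. For reference, here is an independent sketch of the same idea. It is not the author's original code, and `multiplyArrays` is a hypothetical name. It assumes the `arm_neon.h` intrinsics are available when `__ARM_NEON` is defined and falls back to the plain loop elsewhere.

```cpp
#include <cstddef>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper: c[i] = a[i] * b[i] for n floats.
void multiplyArrays(const float* a, const float* b, float* c, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON)
    // Process four floats per iteration with NEON intrinsics.
    for (; i + 4 <= n; i += 4) {
        const float32x4_t va = vld1q_f32(a + i);
        const float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(c + i, vmulq_f32(va, vb));
    }
#endif
    // Scalar loop: handles the leftover elements on ARM (1000001 is not a
    // multiple of 4) and the whole array on targets without NEON.
    for (; i < n; ++i) c[i] = a[i] * b[i];
}
```

The gain depends on the compiler and the device, so measure both versions before keeping the longer one.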