Object detection slow

asked 2012-07-25 04:41:41 -0600

Gábor Bernát

updated 2020-11-28 06:45:21 -0600


Any ideas why object detection may be slow? I am trying out face detection on an 800x600 image and it takes about 1426 ms, which is really slow in my book. Is this right?

My test environment is an ASUS K70IC with the following traits. I have built the latest svn trunk version with TBB, CUDA and Eigen (CMAKE File), and I am using the detectMultiScale function with the haarcascades\haarcascade_frontalface_alt.xml file. I've also tried 2.4.2 as downloaded from the home site; however, the numbers turn out the same.

I'm currently running the performance tests for the objdetect module and will post the results as soon as they are ready. In the meantime, do you have any ideas?

PS. The results are in, as HTML or TXT. However, in the tests I see a peak of only a little over 100 for the 640x480 images...


answered 2012-07-25 07:25:58 -0600

sammy

updated 2012-07-26 02:21:15 -0600

Kirill Kornyakov


Haar features are inherently slow - they make extensive use of floating point operations, which are a bit slow on mobile devices.

A quick solution would be to turn to LBP cascades - all you need is a few lines changed in your code. The performance gain is significant, and the loss in accuracy is minimal. Look for lbpcascades/lbpcascade_frontalface.xml.

If you want to dig deeper into optimizations, here is a generic optimization tip list (cross-posted from SO). Please note that face detection, being one of the most requested features of OpenCV, is already quite optimized, so speeding it up further may require deep knowledge.

Advice for optimization

A. Profile your app. Do it first on your computer, since it is much easier. Use the Visual Studio profiler and see which functions take the most time. Optimize them. Never optimize because you think something is slow, but because you measured it. Start with the slowest function, optimize it as much as possible, then move on to the second slowest.
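If no profiler is at hand, a rough first measurement can be taken with a small timing helper. This is only a sketch: `mean_ms` and `sum_of_squares` are hypothetical names made up for illustration, and `sum_of_squares` is just a stand-in for whatever stage of your pipeline you want to measure.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper: calls fn() `repeats` times and returns the mean
// wall-clock time per call in milliseconds.
template <typename Fn>
double mean_ms(Fn&& fn, int repeats = 10) {
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int r = 0; r < repeats; ++r) {
        fn();
    }
    const std::chrono::duration<double, std::milli> total = clock::now() - start;
    return total.count() / repeats;
}

// Stand-in for a processing stage you want to time.
double sum_of_squares(const std::vector<float>& v) {
    double acc = 0.0;
    for (float x : v) {
        acc += static_cast<double>(x) * x;
    }
    return acc;
}
```

Wrapping each stage in a lambda, for example `mean_ms([&] { result = sum_of_squares(data); })`, and comparing the numbers shows where the time actually goes. A real profiler gives a per-function breakdown; this only times what you wrap by hand.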

B. First, focus on algorithms. A faster algorithm can improve performance by orders of magnitude (100x). A C++ trick will give you maybe a 2x performance boost.

Classical techniques:

Resize your video frames to be smaller. Many times, you can extract the information from a 200x300px image instead of a 1024x768 one. The area of the first one is roughly 13 times smaller.
Use simpler operations instead of complicated ones. Use integers instead of floats. And never use double in a matrix or a for loop that executes thousands of times.
Do as little calculation as possible. Can you track an object only in a specific area of the image, instead of processing the whole image for every frame? Can you make a rough/approximate detection on a very small image and then refine it on a ROI in the full frame?
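To make the first two points concrete, here is a minimal sketch in plain C++ with no OpenCV dependency. `area_ratio` and `half_size` are hypothetical helpers written for this illustration; in an OpenCV program you would normally call `cv::resize` instead. The image is assumed to be 8-bit grayscale, row-major, without row padding, and with even width and height.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// How many times more pixels the big frame has than the small one.
double area_ratio(int w_big, int h_big, int w_small, int h_small) {
    return (static_cast<double>(w_big) * h_big) /
           (static_cast<double>(w_small) * h_small);
}

// Halve an image in both directions by averaging each 2x2 block.
// Integer arithmetic only; the "+ 2" rounds to nearest.
std::vector<std::uint8_t> half_size(const std::vector<std::uint8_t>& src,
                                    int width, int height) {
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    const int w2 = width / 2;
    const int h2 = height / 2;
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(w2) * h2);
    for (int y = 0; y < h2; ++y) {
        const std::uint8_t* r0 = src.data() + static_cast<std::size_t>(2 * y) * width;
        const std::uint8_t* r1 = r0 + width;
        for (int x = 0; x < w2; ++x) {
            const int sum = r0[2 * x] + r0[2 * x + 1] + r1[2 * x] + r1[2 * x + 1];
            dst[static_cast<std::size_t>(y) * w2 + x] =
                static_cast<std::uint8_t>((sum + 2) / 4);
        }
    }
    return dst;
}
```

`area_ratio(1024, 768, 200, 300)` comes out at about 13.1, and each call to `half_size` cuts the number of pixels to process by a factor of four.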

C. In for loops, it may make sense to use C style instead of C++. A pointer to the matrix data or a float array is much faster than mat.at<type>(i, j) or std::vector<>. But change it only if needed. Usually, most of the processing (90%) is done in some double for loop. Focus on it. It doesn't make sense to replace vector<> all over the place and make your code look like spaghetti.
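The difference between the two access styles looks roughly like this. `GrayImage` is a hypothetical stand-in for a single-channel float matrix, made up so the sketch compiles without OpenCV. With a real `cv::Mat` the row pointer would come from `mat.ptr<float>(i)`. How much faster the pointer version is depends on the compiler and build settings, so measure before and after.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal image type: row-major float data, rows * cols elements.
struct GrayImage {
    int rows;
    int cols;
    std::vector<float> data;

    GrayImage(int r, int c)
        : rows(r), cols(c), data(static_cast<std::size_t>(r) * c, 0.0f) {}

    // Per-element access: index arithmetic plus a bounds check on every call.
    float& at(int i, int j) {
        return data.at(static_cast<std::size_t>(i) * cols + j);
    }

    // Pointer to the first element of row i.
    float* row_ptr(int i) {
        assert(i >= 0 && i < rows);
        return data.data() + static_cast<std::size_t>(i) * cols;
    }
};

// Element-accessor style: convenient, but pays the access cost per pixel.
void scale_with_at(GrayImage& img, float k) {
    for (int i = 0; i < img.rows; ++i)
        for (int j = 0; j < img.cols; ++j)
            img.at(i, j) *= k;
}

// C style: fetch the row pointer once, then walk the row directly.
void scale_with_ptr(GrayImage& img, float k) {
    for (int i = 0; i < img.rows; ++i) {
        float* row = img.row_ptr(i);
        for (int j = 0; j < img.cols; ++j)
            row[j] *= k;
    }
}
```

Both functions produce the same result. Only the hot inner loop needs the pointer form; the rest of the code can keep the readable accessors.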

D. Some OpenCV functions convert data to double, process it, then convert back to the input format. Beware of them; they kill performance on mobile devices. Examples: warping, scaling, type conversions. Color space conversions are also known to be slow. Prefer grayscale obtained directly from native YUV.

E. ARM processors have NEON. Learn and use it. It is powerful!

A small example:

float *a, *b, *c;
// allocate a, b and c with 1000001 elements each and fill a and b
for (int i = 0; i < 1000001; i++)
    c[i] = a[i] * b[i];

can be rewritten as follows. It's more verbose, but trust me, it's faster.

float* a, *b, *c;
// init a and b to 1000001 elements
float32x4_t _a, _b, _c;
int i;
for(i=0 ...



Comments

Wow, really good and exhaustive post. The only thing I'm missing is why the test images run faster than my current input at almost the same size.

Gábor Bernát ( 2012-07-25 08:40:13 -0600 )edit

I am not an expert in object detection, but there may be different causes. Different settings can widely affect speed, and so can different images: if a picture contains many of what the classifier considers to be "almost faces", it will be slower to process than an image where most of the evaluated positions can be easily discarded.

sammy ( 2012-07-25 08:49:08 -0600 )edit

And also, check the input size. Processing time is proportional to area, not width or height. And a 10% increase in both width and height may mean a lot in terms of area.

sammy ( 2012-07-25 08:50:09 -0600 )edit
