# Which object recognition algorithm should I use?

I am pretty new to CV, so forgive my stupid questions...

What I want to do: recognize an RC plane in live video (for now it's only a recorded video).

What I have done so far:

• Compute the difference between consecutive frames
• Convert it to grayscale
• GaussianBlur
• Threshold
• findContours

Here are some example frames: frame1.jpg, frame2.jpg, frame3.jpg, frame4.jpg

But some frames also contain noise or other objects (birds, etc.), so there can be several objects in a frame.

I thought I could do something like this: run an object recognition algorithm on every contour that has been found, computing the feature vector only for each of these bounding rectangles. Is it possible to compute SURF/SIFT/... only for a specific patch (a smaller part) of the image? Since the algorithm has to process video in real time, I think this will only be feasible if I don't look at the whole image all the time. Or I could decide, for example, to check the whole image instead of every rectangle if there are more than 10 bounding rectangles.

Then I will look at the next frame and try to match my feature vectors with those from the previous frame. That way I will be able to trace my objects. Once these objects cross the red line in the middle of the picture, another event will be triggered. But that's not important here.

I need to make sure that not every object that crosses or is behind the red line triggers the event. The object has to appear in at least 2 or 3 consecutive frames, and the event should be triggered if and only if it then crosses the line.
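
The trigger rule described here could be sketched as a small per-object state machine. This is only an illustration: the class name `Track`, the assumption that the red line is vertical at `line_x`, and the use of the centroid's x coordinate are all hypothetical choices, not part of the original setup.

```python
class Track:
    """Hypothetical per-object trace: counts consecutive sightings and detects a line crossing."""

    def __init__(self, line_x, min_frames=3):
        self.line_x = line_x          # x position of the trigger line (assumed vertical)
        self.min_frames = min_frames  # consecutive sightings required before a crossing counts
        self.seen = 0                 # consecutive frames in which the object was matched
        self.last_side = None         # side of the line in the previous frame

    def update(self, centroid_x):
        """Feed the object's centroid x for this frame, or None if it was not found.

        Returns True only when an object with a long enough trace crosses the line.
        """
        if centroid_x is None:
            # Trace is broken, so start over
            self.seen, self.last_side = 0, None
            return False
        side = centroid_x >= self.line_x
        crossed = self.last_side is not None and side != self.last_side
        self.seen += 1
        self.last_side = side
        return crossed and self.seen >= self.min_frames
```

For example, with `Track(line_x=100, min_frames=3)` the centroid sequence 80, 90, 105 triggers on the third frame, while 95, 105 does not, because the trace is too short.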

There are so many variations of object recognition algorithms that I am a bit overwhelmed. SIFT/SURF/ORB/... you get what I am saying.

Can anyone give me a hint which one I should choose, or whether what I am doing even makes sense?


I am not quite sure what an RC plane is, or which objects you want to recognize. Anyway, some hints:

• It is no problem to pass just a smaller region of interest (ROI), in which you want to find correspondences, to any detector/descriptor method.
• SIFT and SURF are both patented; furthermore, SIFT is one of the slowest keypoint-based methods.
• I recommend using a binary descriptor for fast matching (you can then use the Hamming distance) and (typically) low computational cost, e.g. the detector/descriptor pairs ORB/ORB or BRISK/FREAK. Then you also don't need to compute an ROI, since these methods are typically fast enough to process the whole image.

However, note that all these keypoint-based methods only work for images or ROIs that have some structure, so plain (textureless) objects with no edges won't be detectable, since no keypoints will be generated there. But I guess birds are no problem.


Those are RC planes: http://www.hobbytron.com/RCAirplanes.html But in my case it's a bit more "professional" :D! Thanks for the answer! If SURF is patented, how come it is implemented in an open source library? I am not planning on commercial use of the software.

(2014-01-02 20:49:56 -0500)

Well, with patents it's always somewhat complicated, and I just wanted to make sure that you are aware of it. Typically it is no problem to use them for research purposes, which is why SIFT and SURF are part of OpenCV, although they are placed in the non-free module.

Btw.: Also have a look at the cascade classifier of OpenCV; it could also work for your problem. Cascade classifiers are typically used for pedestrian detection. Whether you go for a keypoint-based method (e.g. to track a certain object) or for a recognition algorithm that includes a classifier (and its a priori training) depends on whether you have a tracking or a recognition problem (the two can of course be combined) and on the type of objects you have (multiple objects of one class versus one object, etc.).

(2014-01-03 07:09:40 -0500)

Thanks for your answer. After a short break I am back at this problem. I think that a classifier does not suit my problem. I have video footage so you can see what I poorly tried to describe: https://www.dropbox.com/s/gmjlqcnwq3tezos/sample.mp4 Since the plane will be seen from many different angles, it will be a huge task to train a classifier. What do you think now that you have seen the video? Thanks again for your help!

(2014-01-07 15:29:12 -0500)

After watching the video I now understand your task. You basically have two problems here: 1. after finding the moving object, you want to classify whether it is an RC plane or not; 2. after that, you want to track it. Your question mainly concerned task 1. This looks to me like a standard classification problem, which you can solve by training a classifier with suitable feature vectors. If you are time-constrained and want a fast decision, I'd go for HOG features and either a linear SVM or a boosting method as the classifier. If you want to analyze the video later, so that time is available, you can also try other feature descriptors, e.g. an ensemble of local features (bag of (visual) words), or try out combinations of shape and color features.

(2014-01-08 05:01:30 -0500)

Thanks so much! So, do you suggest that I compare the HOG features from frame to frame, or that I train a classifier? I think the latter will be rather difficult, as the size and direction of the plane constantly change. Then I still have my tracking problem. I want to build something like a trace, and the trace needs to reach a certain length before another event is triggered (not CV related). Thanks again!

(2014-01-09 18:20:44 -0500)

Well, you could train a cascade classifier and then run it on every frame to see if it finds something. Alternatively, as soon as you have detected a moving object, you compute features from just that object and classify it. Of course you need to train the classifier beforehand with several positives (take many variations) and negatives (you would have to do this with the cascade classifier framework of OpenCV as well).

(2014-01-10 03:48:11 -0500)

Hey man! A little heads up: https://www.youtube.com/watch?v=8Z2Ba4p83h8 I am not done yet, but that's the direction I am going. I am a little disappointed with the performance. This will never be usable for real-time applications :(. What I am trying to do here: 1) Extract features for every bounding rectangle using ORB (keypoints & descriptors). 2) Try to match every descriptor with the descriptors from the previous frame. 3) Once more than 2 consecutive matches have been found, I use a cascade classifier (Haar features) for object classification (plane yes/no?).

So, what do you think? Does that make sense?

(2014-01-20 18:49:55 -0500)

Hm, step 3 could be replaced by a single classifier call, i.e. training a complete cascade is not necessary. A cascade should be trained if you want to reject many candidate windows quickly, but since you have already found the object in question beforehand, this is imho not necessary. Thus, you can either classify using the contour (see e.g. Shotton et al., "Multiscale categorical object recognition using contour fragments"), or compute features from the object and match them. For example, you could keep using the ORB descriptors, encode them in a bag-of-words (BoW) manner, and decide based on the BoW descriptor whether it is the object or not (or compute a HOG descriptor, or any other descriptor; many roads lead to Rome ;) ).
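
The bag-of-words encoding of binary descriptors mentioned above might look roughly like this with plain NumPy. It assumes a `vocabulary` of codewords already exists (for example, obtained by clustering ORB descriptors from training patches); both that vocabulary and the function names are hypothetical.

```python
import numpy as np

def hamming_distances(descriptors, vocabulary):
    """Pairwise Hamming distances between uint8-packed binary descriptors.

    descriptors: (N, B) uint8 array, vocabulary: (K, B) uint8 array.
    """
    # XOR every descriptor with every codeword, then count the differing bits
    xor = descriptors[:, None, :] ^ vocabulary[None, :, :]
    return np.unpackbits(xor, axis=2).sum(axis=2)

def bow_histogram(descriptors, vocabulary):
    """Assign each descriptor to its nearest codeword and return a normalized histogram."""
    nearest = hamming_distances(descriptors, vocabulary).argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(vocabulary)).astype(np.float32)
    # Avoid division by zero when there are no descriptors at all
    return hist / max(hist.sum(), 1.0)
```

The resulting fixed-length histogram can then be fed to a single classifier, regardless of how many keypoints were found in the patch.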

(2014-01-22 02:44:42 -0500)

Hey! Me again... So I tried a lot in the past week and found out that the keypoints extracted from these small patches cannot be matched reliably. I would get 6 keypoints from one image and 5 keypoints from the same section only one frame later, so there is only a minor change. But I would only get 2 or 3 good matches. On the other hand, comparing two completely different patches might give me 6/6 good matches. So I think I have to discard this approach. I will now try a BoW, shape detection or template matching approach. What do you think?

(2014-02-01 03:03:59 -0500)

BoW is only useful if you have more than just 6 features; maybe you could densely sample features to get more. Template matching is easy to apply but difficult to make rotation and scale invariant. Shape matching could be worth trying.

(2014-02-01 07:23:21 -0500)
