Minimal number of keypoint-descriptors

asked 2015-01-27 13:18:46 -0500

Doombot

updated 2015-01-29 10:11:06 -0500

I have been debating with some people about the minimal number of keypoints necessary to get a reliable detection, in this case with the BRISK keypoint-descriptors of OpenCV 3.

As I understand the theory behind binary keypoint descriptors, the probability of reliably detecting and matching any single keypoint in both a model image and a scene image is quite low, given possible changes in illumination, slight perspective effects, etc. Only when you can match a "sizable" number of keypoint-descriptor pairs can you assess the likelihood that the model is present in the scene: a robust estimator such as RANSAC (available through the "findHomography" function of OpenCV) separates the matches into inliers and outliers, and you can then check whether the keypoints of the model "fit" the keypoints of the scene. This is essentially what Lowe said in 2004 (Section 7: http://www.cs.ubc.ca/~lowe/papers/ijc...)
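To put a rough number on the inlier/outlier argument, here is a minimal sketch of the standard RANSAC sample-count formula, N = log(1 - confidence) / log(1 - w^s), where w is the inlier ratio among the matches and s is the minimal sample size (4 point correspondences for a homography). The function name and the inlier ratios tried below are my own illustrative assumptions, not measurements from BRISK, and I am not claiming this is exactly how "findHomography" picks its iteration count.

```python
import math


def ransac_iterations(inlier_ratio, sample_size=4, confidence=0.99):
    """Samples needed so that, with probability `confidence`, at least one
    random minimal sample contains only inliers (textbook RANSAC formula)."""
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    # Probability that one random minimal sample is made of inliers only.
    p_good_sample = inlier_ratio ** sample_size
    if p_good_sample >= 1.0:
        return 1  # every sample is outlier-free
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good_sample))


# Hypothetical inlier ratios, chosen only for illustration.
for w in (0.8, 0.5, 0.2):
    print(f"inlier ratio {w:.1f}: {ransac_iterations(w)} samples")
```

The only point of the sketch is that the required number of samples grows quickly as the inlier ratio drops, which is why the proportion of good matches matters as much as their absolute number.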

Now, my colleagues (who are not specialists in binary keypoint-descriptors, but hey, neither am I...) and I are debating how many keypoints from the model image should be used for the matching. Their opinion is that if you select approximately 10 (or "n", "n" being a relatively low number of) keypoint-descriptors from a model image, you might greatly improve the detection rate, since the total number of individual matches to compute is lower, which probably reduces the number of mismatches. This, of course, implies a "supervised" selection of the keypoints on the model. I, on the other hand, think that it is simply not possible to know beforehand which keypoints detected in the model image will be present in the scene image, so it would be wiser to use a higher number of them and then rely on RANSAC to sort them out. With this approach, one could either do a "supervised" selection of the keypoints or simply use all the keypoints detected on the model image, even if some "poor" keypoints are kept. Finally, in all cases, the execution time is considered fast enough for the application.
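To frame the two positions, here is a small back-of-the-envelope sketch. It assumes, purely for illustration, that each model keypoint is correctly matched in the scene independently with the same probability p, and computes the binomial probability of getting at least k correct matches out of n model keypoints (k = 4 being the bare minimum for a homography). The function name, the values of p, and the independence assumption are all mine; real keypoints are certainly not independent, and the sketch ignores false matches, which is precisely my colleagues' concern.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(at least k successes in n independent trials with success prob p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical scenarios: a few hand-picked keypoints that are assumed to be
# more repeatable, versus many automatically detected ones assumed less so.
scenarios = [
    ("10 selected keypoints, p = 0.6", 10, 0.6),
    ("10 selected keypoints, p = 0.3", 10, 0.3),
    ("200 detected keypoints, p = 0.3", 200, 0.3),
]
for label, n, p in scenarios:
    print(f"{label}: P(>= 4 correct matches) = {prob_at_least(4, n, p):.4f}")
```

Under these assumptions, whether a small supervised set is enough depends entirely on how much the selection raises p, which is exactly the quantity we cannot know beforehand.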

So, I would like to know whether our reasoning is sound. We may both be wrong, too. If possible, I would appreciate answers on the academic side, since we must justify the "why" of our methods. In any case, we will end up testing both approaches, but any theoretical insights will be appreciated.
