
C++ - Using Bag of Words for matching pictures together?

asked 2014-07-09 05:15:58 -0500

lilouch

updated 2014-07-09 05:18:12 -0500

berak

I would like to compare a picture (via its descriptors) against thousands of pictures in a database in order to find a match. Two pictures count as the same if they show the same thing, even if one is rotated, slightly blurred, at a different scale, etc.

For example, in this case there is a match (I used SIFT with a robust matcher):

[image]

I read online that computing descriptors for each picture and comparing them one by one is a very long process. I did some research and saw that I could use an algorithm based on Bag of Words.

I don't know exactly how it works yet, but it seems promising. However, I think (and I may be mistaken) that it is only meant for detecting what kind of object is in the picture, isn't it?

In your opinion, is it a good solution for comparing one picture against thousands of pictures using descriptors like SIFT or SURF?

If yes, do you have any suggestions on how I can do that?



1 answer


answered 2014-07-09 06:57:16 -0500

The difference between the keypoint approach and, for example, the BOW approach is the reduction of the descriptor itself. In keypoint approaches, every keypoint found in one image must be compared against every keypoint found in the second image to look for correspondences. BOW instead builds a short codebook-style description of each database item and then tries to find those descriptions in the new image (a very short explanation of BOW, I know). This means fewer comparisons are needed, so it should run faster. However, BOW takes time to train and is not that rotation invariant. Take a look at this post.
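To make the "short description" idea a bit more concrete, here is a minimal sketch of how a BoW descriptor can be computed once a vocabulary exists. It is plain C++ with no OpenCV dependency, and the names `Descriptor`, `nearestWord` and `bowHistogram` are made up for illustration. It assumes float descriptors (as SIFT and SURF produce) and a vocabulary that was already trained, typically with k-means. In OpenCV, `cv::BOWKMeansTrainer` and `cv::BOWImgDescriptorExtractor` cover roughly these two steps.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical type for illustration: one float descriptor (e.g. 128 values for SIFT).
using Descriptor = std::vector<float>;

// Squared Euclidean distance between two descriptors of equal length.
float squaredDistance(const Descriptor& a, const Descriptor& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Index of the visual word (vocabulary entry) closest to descriptor d.
std::size_t nearestWord(const Descriptor& d,
                        const std::vector<Descriptor>& vocabulary) {
    assert(!vocabulary.empty());
    std::size_t best = 0;
    float bestDist = std::numeric_limits<float>::max();
    for (std::size_t w = 0; w < vocabulary.size(); ++w) {
        const float dist = squaredDistance(d, vocabulary[w]);
        if (dist < bestDist) {
            bestDist = dist;
            best = w;
        }
    }
    return best;
}

// Normalized histogram of visual-word occurrences for one image.
// Its length equals the vocabulary size, whatever the number of keypoints.
std::vector<float> bowHistogram(const std::vector<Descriptor>& descriptors,
                                const std::vector<Descriptor>& vocabulary) {
    std::vector<float> hist(vocabulary.size(), 0.0f);
    for (const Descriptor& d : descriptors) {
        hist[nearestWord(d, vocabulary)] += 1.0f;
    }
    if (!descriptors.empty()) {
        for (float& h : hist) {
            h /= static_cast<float>(descriptors.size());
        }
    }
    return hist;
}
```

Each image ends up as a single vector of vocabulary size, so comparing two images becomes one vector comparison instead of an all-against-all keypoint match.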



Thank you for your answer! I've already read this document. So you advise me to implement BoW? Is it a good approach?

lilouch (2014-07-09 07:12:08 -0500)

It depends on what you want. For example, if you are walking through a store with a facecam and recognizing items, BoW will be a good approach, since your objects will more or less always keep the same orientation. SURF is indeed robust to rotation, but that comes at the price of comparing all those keypoints.

StevenPuttemans (2014-07-09 07:15:58 -0500)

Okay, but there is something I don't understand. You told me that BoW doesn't work well with rotated objects, yet this algorithm extracts local features with descriptors. So in the picture above, let's assume the eye is a feature extracted by BoW: whether it's rotated or not shouldn't change anything, should it? And if it really doesn't work well with rotation, are there other algorithms that would work well in my case? I would like something light to compute (as you told me, comparing keypoints against thousands of images would be a very heavy and long process...).

lilouch (2014-07-09 07:31:16 -0500)

i would not worry so much about the time spent on comparing descriptors, as the keypoint extraction will consume the most time.

maybe look at other, faster keypoint/descriptor options, like FAST, ORB, BRIEF, MSER (which again might not work nicely with BOW, because the descriptors are binary).

you will definitely need some experiments here

imho, the best use case for BOW is ml-classification, where you need a fixed number of descriptors for each image to feed into e.g. an svm or ann

berak (2014-07-09 10:24:13 -0500)
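A note on why binary descriptors and BoW can clash: ORB and BRIEF descriptors are bit strings that are normally compared with the Hamming distance, whereas a k-means vocabulary averages descriptors and compares them with a Euclidean distance (OpenCV's `cv::kmeans`, for instance, expects floating-point data). The sketch below only illustrates the Hamming distance itself; `BinaryDescriptor` and `hammingDistance` are made-up names, and the bytes stand in for one row of a binary descriptor matrix.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration: a binary descriptor stored as bytes
// (an ORB descriptor, for example, is 32 bytes = 256 bits).
using BinaryDescriptor = std::vector<std::uint8_t>;

// Number of bit positions in which the two descriptors differ.
std::size_t hammingDistance(const BinaryDescriptor& a, const BinaryDescriptor& b) {
    assert(a.size() == b.size());
    std::size_t bits = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // XOR leaves a 1 wherever the two bytes disagree; count those bits.
        bits += std::bitset<8>(static_cast<unsigned>(a[i] ^ b[i])).count();
    }
    return bits;
}
```

Averaging such bit strings as if they were float vectors does not give a meaningful "mean descriptor", which is one plausible reason a standard k-means vocabulary behaves worse with ORB than with SIFT or SURF.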

Thanks for your answer. The problem is that I can't use SIFT or SURF because they are patented. So I used ORB with BoW, but indeed it's not as good as SIFT or SURF. I'm getting a bit confused, because some people told me that BoW is for classification (I agree), while others told me that I can use it for matching. I don't see how, because I have more than 2000 different pictures. Does that mean I need 2000 classes?

lilouch (2014-08-08 02:29:47 -0500)
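On the "2000 classes" question: one common way to use BoW for matching, as opposed to classification, is to treat it as retrieval. Every database picture gets its own BoW histogram, and the query histogram is compared against all of them, so no classes are involved. The sketch below is only an illustration of that idea and not what was done in this thread; `Histogram`, `l1Distance` and `bestMatch` are made-up names, and the L1 distance is an arbitrary choice among several possible histogram distances.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical type for illustration: one normalized BoW histogram per image.
using Histogram = std::vector<float>;

// Sum of absolute differences between two histograms of equal length.
float l1Distance(const Histogram& a, const Histogram& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += std::fabs(a[i] - b[i]);
    }
    return sum;
}

// Index of the database histogram closest to the query.
// Returns database.size() if the database is empty.
std::size_t bestMatch(const Histogram& query,
                      const std::vector<Histogram>& database) {
    std::size_t best = database.size();
    float bestDist = std::numeric_limits<float>::max();
    for (std::size_t i = 0; i < database.size(); ++i) {
        const float dist = l1Distance(query, database[i]);
        if (dist < bestDist) {
            bestDist = dist;
            best = i;
        }
    }
    return best;
}
```

With 2000 pictures this is 2000 short vector comparisons per query. A nearest-neighbour index such as FLANN could replace the linear scan for larger databases, and the top few candidates can still be verified with regular keypoint matching.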

@lilouch, so what did you do for your app? Did you use BoW or did you use FLANN index?

bad_keypoints (2014-10-17 06:33:19 -0500)

I used BoW with FLANN

lilouch (2014-12-02 09:29:55 -0500)

I finally used BoW with FLANN

lilouch (2014-12-02 09:30:54 -0500)

Steven, can you guide me with my application for building and shop recognition? Here's a link to my question.

Abu Gaseem (2015-03-17 11:22:51 -0500)
