Question about Bag of Words, detectors, and such

answered 2014-03-01 08:14:06 -0600

Guanta

Here are some answers to the rumors you 'heard':

Yes, you need to be careful when using SIFT and SURF commercially.
About detectors, descriptors and matchers:

There exist keypoint detectors, which detect locations in the image that contain high entropy or satisfy a certain criterion (such as corners or blobs). Often they also encode scale and rotation information, which can make a feature invariant to those transformations (if the descriptor can make use of it at all). A list of possible keypoint detectors can be found at http://docs.opencv.org/modules/features2d/doc/common_interfaces_of_feature_detectors.html#featuredetector-create
Then there exist feature (or descriptor) extractors, which typically use the keypoint locations to build a descriptor. There are two categories of features: binary ones (FREAK, BRISK, ORB, BRIEF) and non-binary ones (SIFT, SURF). Binary features are typically faster to compute and offer a more compact representation; they are often used in robotics for fast matching. For image retrieval the most common choice is SIFT. A list of supported features can be found here: http://docs.opencv.org/modules/features2d/doc/common_interfaces_of_descriptor_extractors.html#descriptorextractor-create Typically you can combine any keypoint detector and feature extractor of OpenCV (apart from the SIFT detector). However, not every combination makes sense or works as expected. Natural combinations, or combinations suggested by the authors of the descriptors, are afaik SIFT-SIFT, ORB-ORB, SURF-SURF, BRISK-BRISK, BRISK-FREAK and FAST-BRIEF. (MSER is a region detector, not a descriptor extractor, so it belongs with the detectors above.)
For the matchers you can choose between the brute-force matcher and the FLANN-based matcher. The latter is preferable if you have a very large database of features: it creates an index of all features, which is then queried. The brute-force matcher, in contrast, just goes over all features in question and picks the best match. Which one you should use depends on your dataset and task (typically BFMatcher is fine). Furthermore, if you use binary features, take care to compare them with the Hamming norm, not with the L2 norm.
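
As a rough illustration of the last point, here is a minimal pure-Python sketch of brute-force matching under the Hamming norm. It is not OpenCV code: the descriptors are modelled as plain integers (bit strings), and the names `hamming` and `brute_force_match` are made up for this example. In OpenCV itself you would pass `NORM_HAMMING` when constructing the `BFMatcher`.

```python
# Sketch only: each binary descriptor is modelled as a Python int (a bit string).

def hamming(a: int, b: int) -> int:
    """Number of bits in which two binary descriptors differ."""
    return bin(a ^ b).count("1")


def brute_force_match(query, train):
    """For every query descriptor, scan all train descriptors and keep the
    closest one. Returns a list of (query_idx, train_idx, distance)."""
    matches = []
    for qi, q in enumerate(query):
        best = min(range(len(train)), key=lambda ti: hamming(q, train[ti]))
        matches.append((qi, best, hamming(q, train[best])))
    return matches


# Hypothetical 8-bit descriptors, just to have something to match.
query = [0b10110010, 0b00001111]
train = [0b00001110, 0b10110011, 0b11111111]
matches = brute_force_match(query, train)
print(matches)
```

The brute-force scan costs one distance computation per query/train pair, which is why an index such as FLANN becomes attractive for very large databases.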

It's not about keypoints but rather about descriptors. You can find more information about bag of words here: http://answers.opencv.org/question/8677/image-comparison-with-a-database/#8686
Classifiers are used to distinguish between classes, e.g. dogs and cats. The information about a class has to be learned, i.e. a classifier needs to be trained for that.

Good luck with your project!

Comments

Thank you for this, this cleared up a lot of uncertainty.

So it seems that most of the feature descriptors have a keypoint detector as well.

I have seen that thread in the past. Since you are the poster, I guess I can ask about what confused me in that post: when I mentioned a classifier it had to do with #4, but it seems the first method is the preferred one.

I figured that if you trained on certain objects it would also make detecting them easier (in my case most objects will be common, symmetrical shapes). I also figure that if I wanted to draw a border around an object to show a detection, I would need them as well.

Now it seems you recommend the GIST-Descriptor? I will look into it, but I'm curious your take on it, so it just gets the ...(more)

KonradZuse (2014-03-02 00:12:35 -0600)

I ran out of space, but I also wanted to know about the BoW vocab size. It seemed like 2000 vocabulary items was where accuracy started to drop (according to some paper I saw; I will try to find it), so I'm curious how badly it's affected as the vocabulary grows.

If we have a histogram compare we should be able to at least give a few options, and I would assume that even if we have 100,000 items we would still get the "correct" answer; we might just have a bunch of others that also fit within the spectrum, right?

I'm not sure how many vocab items I will have, but eventually it could be 100,000. I might be able to categorize and such, but my total will definitely be a huge number.

KonradZuse (2014-03-02 00:36:35 -0600)

BoW is used to form a global image descriptor from local ones, or in other words: local descriptors are encoded to a global one. Since GIST is already a global one you don't need this.

There are different opinions on the vocabulary size: some papers state that accuracy increases with growing size, in others it drops at some point.

I don't know what exactly you meant by 'spectrum'. In an image retrieval case, you compare the BoW descriptor of one image with all others and take the best one. If it's a recognition task, you classify the BoW descriptor and get the class of the image.
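
To illustrate the retrieval case, a minimal sketch in plain Python: the query histogram is compared against every database histogram and the results are ranked, so you can take the best one or the top few. The L1 distance, the function names and the toy histograms (labelled coke/sprite/fanta) are all assumptions for the example, not something prescribed by OpenCV.

```python
def l1_distance(h1, h2):
    """Sum of absolute bin differences between two BoW histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def retrieve(query_hist, database):
    """Rank database entries by distance to the query, closest first.
    `database` maps an image label to its BoW histogram."""
    return sorted(database, key=lambda name: l1_distance(query_hist, database[name]))


# Hypothetical 3-bin BoW descriptors (a real vocabulary is much larger).
database = {
    "coke": [0.5, 0.25, 0.25],
    "sprite": [0.1, 0.8, 0.1],
    "fanta": [0.3, 0.3, 0.4],
}
query_hist = [0.45, 0.3, 0.25]

ranking = retrieve(query_hist, database)
print(ranking)  # closest first; ranking[0] is the best match
```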

Guanta (2014-03-02 07:13:07 -0600)

Your last sentence suggests that you haven't fully understood the concept (or I'm misinterpreting it). You don't build a vocabulary for each individual image. Instead, you build one single vocabulary from a set of training images (i.e. by clustering, let's say, 100k local features from several images), which is used afterwards to encode the local descriptors of each individual test image --> frequency histogram of nearest cluster centers = BoW descriptor.
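
The encoding step can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the vocabulary is taken as given (in practice it would come from clustering the training features), the descriptors are toy 2-D points instead of e.g. 128-D SIFT vectors, and `bow_encode` is a made-up name.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bow_encode(descriptors, vocabulary):
    """Assign each local descriptor to its nearest cluster center and
    return the normalized frequency histogram (the BoW descriptor)."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        nearest = min(range(len(vocabulary)),
                      key=lambda k: squared_l2(d, vocabulary[k]))
        hist[nearest] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# One shared vocabulary (cluster centers), assumed to be learned beforehand.
vocabulary = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
# Local descriptors extracted from a single test image (toy values).
image_descriptors = [(1.0, 0.5), (9.0, 11.0), (0.5, 0.0), (1.0, 9.0)]

bow = bow_encode(image_descriptors, vocabulary)
print(bow)  # [0.5, 0.25, 0.25]
```

However many local descriptors an image has, its BoW descriptor always has one bin per vocabulary word, which is what makes images comparable.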

Guanta (2014-03-02 07:17:59 -0600)

Huh, a global descriptor, crazy... So it's a giant descriptor that contains all of the information about its parts (the individual descriptors of each)? Is this what the "Cluster" function would do?

It seems GIST is what I would be looking at then, you seem to recommend it as well.

Glad to hear that there are varying opinions on size.

I meant spectrum in terms of histograms. I thought that when we get the descriptor we find the images with the closest resemblance, then do a histogram compare and see which ones are the closest resemblance (since there could be more than one that it closely resem

I am going to need to recognize items, but then also find out what each one is. As I mentioned with a bottle: there could be multiple brands (Coke, Sprite, Fanta), and I need to identify which bottle it is.

KonradZuse (2014-03-02 20:06:27 -0600)

Sorry, the last sentence was something I just made up in case there was an issue with a giant vocab.

I understand that we use multiple images to create this vocabulary. Can we also keep updating this vocabulary? I guess it shouldn't matter when we create a new cluster with new features, since the old ones will still be there.

I also find it interesting that there aren't many matchers, just a few brute-force methods I saw and FLANN. I'm assuming I will still be using a matcher with the GIST-Descriptor?

Thanks again for the help.

EDIT: It's a bit difficult to find GIST-Descriptors. I found a link here to a C++ implementation http://www.cs.cornell.edu

http://stackoverflow.com/questions/10051221/lear-gist-descriptor-c-code-used-with-c

KonradZuse (2014-03-02 20:11:04 -0600)

GIST is a global image descriptor; I don't know how high-dimensional it is.

No, this is not what the cluster function would do. It just clusters the training features, and the resulting means are used to encode the test features.

Updating the vocabulary is typically not that easy! You can't just add new clusters: for k-means the number of clusters has to be given in advance. But you don't need to do this unless your training features are very different from your test features.

Well, you need some form of matcher to get the nearest cluster centers to encode your feature vectors.
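
To make the clustering step concrete, here is a minimal sketch of plain k-means (Lloyd's algorithm) in pure Python. It is a toy under stated assumptions, not what OpenCV does internally: the centers are initialized with the first k points to keep the example deterministic, and the 2-D training points are invented. Note that `k` is a required argument, which is exactly why you can't simply add clusters later without re-running the training.

```python
def kmeans(points, k, iterations=10):
    """Plain Lloyd's k-means. The number of clusters k is fixed up front.
    Initialization with the first k points is a simplification."""
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: put every point into the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        centers = [
            tuple(sum(dim) / len(members) for dim in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers


# Toy "training features": two obvious groups of 2-D points.
training = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
            (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
vocabulary = kmeans(training, k=2)
print(vocabulary)  # two cluster means, one per visual word
```

The returned means are the visual words; encoding a test image afterwards only needs a nearest-center lookup against them.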

Guanta (2014-03-06 05:10:51 -0600)

@Guanta, I'd like to search for a query image in a database of unique images. For those unique images, I'd construct a BoW vocabulary with N clusters, where N would either 1) be as large as the final number of images I will have in the database, or 2) change dynamically to match the number of images in my database. Does it make sense to create a codebook from BoW visual words and use that to match query images with only one unique image in the database? Also, if you can, please answer my question here: http://answers.opencv.org/question/44670/bow-number-of-words-in-vocabulary-for-unique/

bad_keypoints (2014-10-17 09:26:27 -0600)

1) This is not necessary, see my answer to your other question. 2) Every change in the number of clusters would require retraining -> costs time -> typically not needed if the training set is already diverse enough.

Yes you can train another codebook to use it for matching, however typically you wouldn't use k-means but just a kd-tree (see e.g. the good flann library). However, if the number of classes grows you'd need to retrain that as well (the same would happen with other real classifiers).

Guanta (2014-10-17 10:41:35 -0600)
