Revision history [back]

Can we use Bag of Visual Words to compute similarity between images directly?

I'm implementing a Content Based Image Retrieval application (CBIR).

I've read about the Bag of Features model and it's considered an intermediate-step algorithm in some application. For example, the histograms generated could be used for SVM classification.

Since the produced vectors are affected by the curse of dimensionality, it can become expensive to compute some kind if distance between a query histogram and all the dataset histograms. For this reason, techniques like LSH have been implemented for datasets with hundreds of thousands (or millions) of images.

However, my system is based on (let's say) 50k images, so compute directly the distance should not be so prohibitive.

I have two questions:

Is this approach reasonable? Recap of it: classic BoF approach and then compute the distance between each dataset histogram and the query histogram. The smaller one is returned as the most similar image.
Which distance should I use? I've heard that chi-squared distance is a popular choice. What about euclidean, cosine similarity etc? Which one is the best in such a context?