Bag of Features: why is the distance between two histograms of the same image different from 0?

I'm trying to implement a Content Based Image Retrieval application for small image datasets. I'm testing it with just one thousand images from Caltech101.

The approach I'm using is the classic Bag of Features model in OpenCV, with k-means for the vocabulary, cv::SIFT as detector/descriptor, and HISTCMP_CHISQR_ALT as the distance metric. For testing purposes, I used an image from the dataset itself as the query, so the distance should be 0. This is the relevant code:

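// featureDetectorDescriptor is the cv::SIFT detector/descriptor, created elsewhere.
// datasetImages and queryIndex are placeholder names for the loaded dataset
// (a std::vector<cv::Mat>) and the position of the query image in it.
cv::Mat vocabulary; // filled through k-means
cv::Mat histograms;   // one row per dataset image
cv::Mat word;         // histogram of the image currently being processed
cv::Mat imgHistogram; // histogram of the query image
cv::Ptr<cv::DescriptorMatcher> matcher(new cv::BFMatcher);
std::unique_ptr<cv::BOWImgDescriptorExtractor> bowDE =
     std::make_unique<cv::BOWImgDescriptorExtractor>(featureDetectorDescriptor, matcher);
std::vector<cv::KeyPoint> keyPoints;
// Compute histograms for all dataset images
bowDE->setVocabulary(vocabulary);
for (const cv::Mat& datasetImg : datasetImages) {
    featureDetectorDescriptor->detect(datasetImg, keyPoints);
    bowDE->compute(datasetImg, keyPoints, word);
    histograms.push_back(word);
}
// Compute the word of one image img FROM THE DATASET ITSELF!
const cv::Mat& img = datasetImages[queryIndex];
featureDetectorDescriptor->detect(img, keyPoints);
bowDE->compute(img, keyPoints, imgHistogram);
double minDist = DBL_MAX;
for (int i = 0; i < histograms.rows; i++) {
    double dist = cv::compareHist(imgHistogram, histograms.row(i), cv::HISTCMP_CHISQR_ALT);
    if (dist < minDist)
        minDist = dist;
}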

Since the query image img is contained in the database itself, the most similar image returned is img itself, as expected, but minDist is different from 0! For example, for one image the distance from itself is 0.0119228.
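For reference, the OpenCV documentation defines the alternative chi-square distance as 2 * sum((H1(i) - H2(i))^2 / (H1(i) + H2(i))), which is 0 only when every pair of bins matches. The following is a minimal sketch of that formula in plain C++, not OpenCV's own implementation; chiSquareAlt is a hypothetical helper name, and skipping bins whose sum is numerically zero is an assumption made here to avoid dividing by zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Alternative chi-square distance: 2 * sum((a_i - b_i)^2 / (a_i + b_i)).
// Bins where a_i + b_i is numerically zero are skipped (assumption of this sketch).
double chiSquareAlt(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size()); // both histograms must have the same number of bins
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = a[i] - b[i];
        const double total = a[i] + b[i];
        if (std::fabs(total) > std::numeric_limits<double>::epsilon()) {
            sum += diff * diff / total;
        }
    }
    return 2.0 * sum;
}
```

Calling chiSquareAlt with the same vector for both arguments returns exactly 0, so any non-zero result for "the same image" means the two histograms themselves differ in at least one bin.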

These are the corresponding histograms:

img in histograms:

[0.0059701493, 0.0089552235, 0.052238807, 0.0014925373, 0.0029850747, 0.016417911, 0.0014925373, 0.0014925373, 0.0014925373, 0.0014925373, 0.011940299, 0.0074626869, 0.0014925373, 0.0044776117, 0.0029850747, 0.0044776117, 0.011940299, 0.014925374, 0.12985075, 0.0044776117, 0.0014925373, 0.0074626869, 0, 0.0029850747, 0.0014925373, 0.013432836, 0.0074626869, 0.011940299, 0.0029850747, 0.0014925373, 0.061194029, 0.013432836, 0.0059701493, 0.0029850747, 0, 0.0014925373, 0.0074626869, 0.0014925373, 0.0059701493, 0.0029850747, 0.0029850747, 0.0029850747, 0.0059701493, 0, 0.025373135, 0.017910447, 0.0044776117, 0.0044776117, 0.0044776117, 0.0014925373, 0.0014925373, 0.0074626869, 0.010447761, 0.014925374, 0.0044776117, 0.0029850747, 0.0059701493, 0.055223882, 0.0029850747, 0, 0.0089552235, 0.0029850747, 0.0074626869, 0.0044776117, 0.0074626869, 0.0089552235, 0.0059701493, 0.0014925373, 0.020895522, 0, 0, 0.014925374, 0.019402985, 0.0029850747, 0.0044776117, 0, 0.0089552235, 0.0029850747, 0.0029850747, 0.0029850747, 0, 0.0014925373, 0.016417911, 0, 0.010447761, 0.017910447, 0.013432836, 0.0089552235, 0.0014925373, 0.011940299, 0.019402985, 0.010447761, 0.0059701493, 0.059701495, 0.031343285, 0.016417911, 0.011940299, 0.0014925373, 0.010447761, 0.0044776117]

img as query:

[0.0061349692, 0.0092024542, 0.053680979, 0.0015337423, 0.0030674846, 0.016871165, 0.0015337423, 0.0015337423, 0.0015337423, 0.0015337423, 0.010736196, 0.0076687112, 0.0015337423, 0.0046012271, 0.0030674846, 0.0046012271, 0.012269938, 0.016871165, 0.12423313, 0.0046012271, 0.0015337423, 0.0076687112, 0, 0.0030674846, 0.0015337423, 0.01380368, 0.0076687112, 0.012269938, 0.0030674846, 0, 0.06134969, 0.01380368, 0.0061349692, 0.0030674846, 0, 0.0015337423, 0.0076687112, 0, 0.0061349692, 0.0030674846, 0.0015337423, 0.0030674846, 0.0061349692, 0, 0.02607362, 0.010736196, 0.0046012271, 0.0046012271, 0.0046012271, 0.0015337423, 0.0015337423, 0.0076687112, 0.010736196, 0.015337422, 0.0046012271, 0.0015337423, 0.0061349692, 0.053680979, 0.0030674846, 0, 0.0092024542, 0.0030674846, 0.0076687112, 0.0046012271, 0.0076687112, 0.0092024542, 0.0076687112, 0.0015337423, 0.021472393, 0, 0, 0.015337422, 0.023006134, 0.0030674846, 0.0030674846, 0, 0.010736196, 0.0030674846, 0.0030674846, 0.0030674846, 0, 0.0015337423, 0.016871165, 0, 0.010736196, 0.018404908, 0.012269938, 0.0092024542, 0.0015337423, 0.012269938, 0.021472393, 0.010736196, 0.0061349692, 0.059815951, 0.030674845, 0.015337422, 0.012269938, 0.0015337423, 0.010736196, 0.0046012271]

This means that the histogram first computed for img and saved in histograms is different from imgHistogram!

How is this possible? The two histograms should be the same!

Note that this also happens with other distances such as HISTCMP_CHISQR or HISTCMP_HELLINGER.
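One way to narrow this down is to recover how many descriptors each histogram was built from. The sketch below assumes that BOWImgDescriptorExtractor normalizes each bin by dividing its count by the total number of descriptors (which is my understanding of the OpenCV implementation) and that at least one bin holds exactly one descriptor, so that the smallest non-zero bin equals 1/N. estimateDescriptorCount is a hypothetical helper name.

```cpp
#include <cmath>
#include <vector>

// Estimates the number of descriptors N behind a normalized BoW histogram,
// assuming every bin equals count / N and the smallest non-zero bin is 1 / N.
// Returns 0 if the histogram has no positive bin.
long estimateDescriptorCount(const std::vector<double>& histogram) {
    double smallest = 0.0;
    for (double bin : histogram) {
        if (bin > 0.0 && (smallest == 0.0 || bin < smallest)) {
            smallest = bin;
        }
    }
    return smallest > 0.0 ? std::lround(1.0 / smallest) : 0;
}
```

Under those assumptions, the smallest non-zero bin of the stored histogram (0.0014925373) gives about 670 descriptors, while that of the query histogram (0.0015337423) gives about 652. The largest bins are consistent with this: 0.12985075 is close to 87/670 and 0.12423313 is close to 81/652. That would suggest the two runs produced different keypoint or descriptor sets for the same image, so the discrepancy likely arises before the histogram comparison rather than in the distance metric.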