# Image Retrieval through Fisher vectors: why does my implementation work SO BADLY?

I'm trying to implement a Content-Based Image Retrieval system for a small image dataset. For now, I'm using 1k images (40 categories) from Caltech101.

This is the system workflow:

1. For each image `img`, compute the SIFT descriptors (using `cv::SIFT`, default parameters) and save the descriptor matrix in `std::vector<cv::Mat> descriptors` (see the sketch after this list).
2. Compute a Gaussian Mixture Model from the previously computed descriptors using the VLFeat implementation `VlGMM`, with k-means as the initialization algorithm (again, the VLFeat implementation).
3. For each `img`, compute the corresponding Fisher vector using the GMM obtained before, one per dataset image.
4. Given the query `q`, compute its SIFT descriptors and Fisher vector (using the same GMM as before).
5. Compute the Euclidean distance between `q`'s Fisher vector and each `img` Fisher vector from the dataset.
6. Return the top `k` images, according to the distances obtained in step 5.
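
Step 1 is roughly the following (a minimal sketch, not my exact code; it assumes an OpenCV version where `cv::SIFT` is available in the main module, e.g. OpenCV >= 4.4, and `extractDescriptors`/`imagePaths` are just illustrative names):

```
// Sketch of step 1: extract SIFT descriptors (default parameters) for every
// dataset image and collect one descriptor matrix per image.
#include <opencv2/features2d.hpp>
#include <opencv2/imgcodecs.hpp>
#include <string>
#include <vector>

std::vector<cv::Mat> extractDescriptors(const std::vector<std::string> &imagePaths)
{
    cv::Ptr<cv::SIFT> sift = cv::SIFT::create(); // default parameters
    std::vector<cv::Mat> descriptors;
    for (const std::string &path : imagePaths) {
        cv::Mat img = cv::imread(path, cv::IMREAD_GRAYSCALE);
        std::vector<cv::KeyPoint> keypoints;
        cv::Mat desc; // one row per keypoint, 128 float columns (CV_32F)
        sift->detectAndCompute(img, cv::noArray(), keypoints, desc);
        descriptors.push_back(desc);
    }
    return descriptors;
}
```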

This is the code for points 2, 3 and 5, which are the most important ones:

```
vl_size totalElements = totalKeypoints * dimension;
float *data = new float[totalElements];
size_t counter = 0;
// copy all descriptor matrices into a single contiguous buffer (tested, it works)
for (size_t i = 0; i < descriptors.size(); i++) {
    std::memcpy(data + counter, descriptors[i].data, descriptors[i].total() * sizeof(float));
    counter += descriptors[i].total();
}

// k-means object used to initialize the GMM
VlKMeans *kmeans = vl_kmeans_new(VL_TYPE_FLOAT, VlDistanceL2);
vl_kmeans_set_algorithm(kmeans, VlKMeansElkan);
vl_kmeans_set_initialization(kmeans, VlKMeansPlusPlus);
vl_kmeans_set_min_energy_variation(kmeans, 0.0001);
vl_kmeans_set_num_repetitions(kmeans, 3);

// fit the GMM on all descriptors (k = 256, dimension = 128)
VlGMM *gmm = vl_gmm_new(VL_TYPE_FLOAT, dimension, k);
vl_gmm_set_initialization(gmm, VlGMMKMeans);
vl_gmm_set_kmeans_init_object(gmm, kmeans);
vl_gmm_cluster(gmm, data, totalKeypoints);
delete[] data;

// one Fisher vector per dataset image
int encodingSize = 2 * dimension * k;
std::vector<cv::Mat> codes(n); // n is the dataset size
for (size_t i = 0; i < descriptors.size(); i++) {
    float *enc = (float *)vl_malloc(sizeof(float) * encodingSize);
    vl_fisher_encode(enc, VL_TYPE_FLOAT,
                     vl_gmm_get_means(gmm), dimension, k,
                     vl_gmm_get_covariances(gmm),
                     vl_gmm_get_priors(gmm),
                     descriptors[i].data, 1,
                     VL_FISHER_FLAG_IMPROVED);
    codes[i] = cv::Mat(1, encodingSize, CV_32FC1, enc);
}
...
// here we compute the query's Fisher code in the same way
...
// Euclidean distance between the query code and every dataset code
for (int i = 0; i < n; i++)
    distances[i] = norm(queryCode, codes[i], cv::NORM_L2); // from OpenCV
```
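
Step 6 is then just an index sort over `distances`; roughly like this (a minimal sketch, assuming `distances` is a `std::vector<double>`; `topK` is only an illustrative name):

```
// Sketch of step 6: return the indices of the k dataset images closest to the
// query, i.e. the k smallest entries of `distances`.
#include <algorithm>
#include <numeric>
#include <vector>

std::vector<int> topK(const std::vector<double> &distances, int k)
{
    std::vector<int> indices(distances.size());
    std::iota(indices.begin(), indices.end(), 0); // 0, 1, ..., n-1
    std::partial_sort(indices.begin(), indices.begin() + k, indices.end(),
                      [&](int a, int b) { return distances[a] < distances[b]; });
    indices.resize(k); // keep only the best k
    return indices;
}
```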

I hope this can be considered an MCVE. *If you want to see some other section, please let me know.*

**The performance is simply HORRIBLE**: even if we use an image from the dataset itself as the query, the most similar image is the image itself (but even in that case the distance is `0.0201524`!), while **all the other images are totally uncorrelated!**

I don't know if this is normal, but it takes `263s` for the GMM clustering and `0.91s` for creating 1k Fisher vectors.

I'm a bit frustrated; I don't know how I could improve this. The Fisher vector is already an advanced solution, so **why does it work SO badly?**

I also tried the cosine distance with the following code:

```
// negative cosine similarity, so that smaller values still mean "more similar"
static double cosdistance(const cv::Mat &testFeature, const cv::Mat &trainFeature)
{
    double a = trainFeature.dot(testFeature);
    double b = trainFeature.dot(trainFeature);
    double c = testFeature.dot(testFeature);
    return -a / sqrt(b * c);
}
...
for (int i = 0; i < n; i++)
    distances[i] = cosdistance(queryCode, codes[i]);
```

It didn't give better results either...