I'm trying to use dense SURF and SIFT descriptors to improve the precision of my VLAD code. I'm testing my approach on the Oxford Buildings dataset.
It has been shown that dense descriptors can improve this kind of application.
This is my code for extracting dense SURF descriptors:
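void DSURFOpenCV::ComputeDescriptors(const cv::Mat &img, cv::Mat1f &descriptors) {
    descriptors.release();
    // Smallest keypoint size: step, clamped to at least 8 px.
    int startSize = step < 8 ? 8 : step;
    std::vector<cv::KeyPoint> kps;
    // Five scales: startSize, 2*startSize, ..., 5*startSize.
    for (int z = startSize; z <= startSize * 5; z += startSize)
        // Regular grid with a stride of step pixels, skipping the border.
        for (int i = step; i < img.rows - step; i += step)
            for (int j = step; j < img.cols - step; j += step)
                kps.push_back(cv::KeyPoint(float(j), float(i), float(z)));
    surf->compute(img, kps, descriptors);
}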
Here step is the number of pixels between neighboring keypoints, and startSize is step clamped to a minimum of 8 px. I extract keypoints at 5 different scales, from startSize up to 5 times startSize. With step=8 the keypoint sizes are 8, 16, 24, 32 and 40 px, which covers the values used in the linked paper above, except that I use 5 scales instead of 4 (Section 5, "Implementation details"):
We extract SIFT [29] descriptors at 4 scales corresponding to region widths of 16, 24, 32 and 40 pixels. The descriptors are extracted on a regular densely sampled grid with a stride of 2 pixels.
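The difference in sampling density between the two setups can be sketched as follows. This is a standalone illustration with hypothetical names (GridPoint, makeDenseGrid), not code from OpenCV or VLFeat. It only mirrors the grid loops of ComputeDescriptors above and assumes the quoted setup skips the image border in the same way, which the quote does not state.

```cpp
#include <vector>

// Hypothetical helper type: a grid sample point with a region size,
// mirroring the arguments of cv::KeyPoint(x, y, size).
struct GridPoint {
    float x;
    float y;
    float size;
};

// Enumerate a regular grid with the given stride, repeated once per size.
// Points closer than `stride` pixels to the image border are skipped,
// as in the loops of ComputeDescriptors above.
std::vector<GridPoint> makeDenseGrid(int rows, int cols, int stride,
                                     const std::vector<int> &sizes) {
    std::vector<GridPoint> points;
    for (int size : sizes)
        for (int y = stride; y < rows - stride; y += stride)
            for (int x = stride; x < cols - stride; x += stride)
                points.push_back({float(x), float(y), float(size)});
    return points;
}
```

For a 64x64 image, makeDenseGrid(64, 64, 8, {8, 16, 24, 32, 40}) yields 180 points, while makeDenseGrid(64, 64, 2, {16, 24, 32, 40}) yields 3600, so under these assumptions the quoted configuration samples the image far more densely than step=8 does.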
I do something similar with dense SIFT by VLFeat:
void DSIFTVLFeat::ComputeDescriptors(const cv::Mat &img, cv::Mat1f &descriptors) {
    descriptors.release();
    // Convert the cv::Mat image to a contiguous float buffer.
    cv::Mat imgFloat;
    img.convertTo(imgFloat, CV_32F, 1.0 / 255.0);
    if (!imgFloat.isContinuous())
        throw std::runtime_error("imgFloat is not continuous");
    // One dense SIFT pass per bin size, from binSize to maxBinSize in steps of 2.
    for (int i = binSize; i <= maxBinSize; i += 2) {
        // VLFeat takes the image width first, then the height (cols, then rows).
        VlDsiftFilter *dsift = vl_dsift_new_basic(img.cols, img.rows, step, i);
        vl_dsift_process(dsift, imgFloat.ptr<float>());
        // Wrap VLFeat's descriptor buffer without copying; push_back copies the rows.
        cv::Mat scaleDescs(vl_dsift_get_keypoint_num(dsift), 128, CV_32F,
                           (void *) vl_dsift_get_descriptors(dsift));
        descriptors.push_back(scaleDescs);
        vl_dsift_delete(dsift);
    }
}
However, both of these methods decreased the Mean Average Precision (the performance metric for this dataset) on Oxford. Why does this happen, and how can I improve it? Any C++ implementation would be useful.