Dense descriptor at multiple scales
I've implemented this version of dense SURF for OpenCV3:
    void DSURFOpenCV::ComputeDescriptors(cv::Mat &img, cv::Mat1f &descriptors){
        std::vector<cv::KeyPoint> kps;
        int startSize = step < 8 ? 8 : step; // step = 8 by default
        for (int i = step; i < img.rows - step; i += step)      // i: row (y)
        {
            for (int j = step; j < img.cols - step; j += step)  // j: column (x)
            {
                // Five keypoint sizes per grid position
                for (int z = startSize; z <= startSize*5; z += startSize){
                    // cv::KeyPoint takes (x, y, size), so the column comes first
                    kps.push_back(cv::KeyPoint(float(j), float(i), float(z)));
                }
            }
        }
        surf->compute(img, kps, descriptors);
    }
As you can see, I compute a descriptor every 8 px, with keypoint sizes of 8, 16, 24, 32 and 40 px at each grid position. I took inspiration for these values from this paper. Combined with VLFeat VLAD encoding, this approach substantially improved Mean Average Precision (mAP) on the Holidays dataset, but decreased it on the Oxford dataset.
I really wonder why this happens. My only explanation is that the Oxford dataset is more challenging in terms of viewpoint changes, which are known to be the weak point of dense descriptors.
However, it is also possible that my code above is suboptimal and could be improved.
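To make the sampling pattern concrete, here is a minimal sketch that enumerates the same grid without OpenCV, so the number of keypoints can be checked for a given image size. `GridPoint` and `denseGrid` are hypothetical names introduced only for this illustration, and the points are stored in the (x, y) = (column, row) order that `cv::KeyPoint` expects.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cv::KeyPoint so the sketch compiles without OpenCV.
struct GridPoint {
    float x;     // column coordinate
    float y;     // row coordinate
    float size;  // keypoint size in pixels
};

// Enumerate one position every `step` pixels (leaving a `step`-wide border),
// with sizes startSize, 2*startSize, ..., 5*startSize at each position.
std::vector<GridPoint> denseGrid(int rows, int cols, int step) {
    std::vector<GridPoint> pts;
    if (step <= 0) return pts;  // guard against a non-terminating loop
    const int startSize = step < 8 ? 8 : step;
    for (int y = step; y < rows - step; y += step) {
        for (int x = step; x < cols - step; x += step) {
            for (int s = startSize; s <= startSize * 5; s += startSize) {
                pts.push_back({float(x), float(y), float(s)});
            }
        }
    }
    return pts;
}
```

For a 640x480 image with a step of 8, this gives 78 columns x 58 rows x 5 sizes = 22,620 keypoints, which is a useful sanity check on the size of the descriptor matrix passed to the encoder.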
For example, VLFeat DSIFT applies a Gaussian kernel, similarly to classic descriptors.
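As an illustration of what such Gaussian weighting means, here is a minimal sketch that computes a Gaussian window over the spatial bins of a descriptor along one axis. This is not VLFeat's implementation: the function name `gaussianBinWeights` and its parameters are hypothetical, and the only assumption is that each bin is weighted by a Gaussian of its distance from the descriptor center.

```cpp
#include <cmath>
#include <vector>

// Weights for `numBins` spatial bins along one axis, centered on the descriptor.
// `sigma` is the standard deviation of the window, expressed in bin units.
std::vector<double> gaussianBinWeights(int numBins, double sigma) {
    std::vector<double> w(numBins > 0 ? numBins : 0);
    const double center = (numBins - 1) / 2.0;  // midpoint between the outer bins
    for (int b = 0; b < numBins; ++b) {
        const double d = (b - center) / sigma;  // normalized distance from center
        w[b] = std::exp(-0.5 * d * d);
    }
    return w;
}
```

For a 2-D grid of bins, the weight of bin (bx, by) would be the product `w[bx] * w[by]`, so gradients near the keypoint center contribute more than those near the border of the support region.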
Another point of interest is the keypoint size: in the example above, all keypoints have a size greater than or equal to the step (as suggested in the linked paper), but here the keypoint scale is the square root of the step. Why the difference?
Finally, are there any other considerations, suggestions or tips for improving mAP (without decreasing it) on the Oxford dataset too?