I'm reading this paper where Dense SIFT is used, in particular (quoting the paper):
We extract SIFT [29] descriptors at 4 scales corresponding to region widths of 16, 24, 32 and 40 pix- els. The descriptors are extracted on a regular densely sam- pled grid with a stride of 2 pixels.
So far so good. However, now I'm trying to understand DSIFT from VLFeat and C API in order to reproduce the strategy above.
From my understanding from this question and this picture (taken from the link above):
Each SIFT descriptor is computed using 4x4 bins. Now, each bin can have different size. So supposing that we have the image img
(read using OpenCV), we could do this:
Mat img = imread("img.jpg",CV_LOAD_IMAGE_GRAYSCALE);
// transform image in cv::Mat to float vector
std::vector<float> imgvec;
for (int i = 0; i < img.rows; ++i){
for (int j = 0; j < img.cols; ++j){
imgvec.push_back(img.at<unsigned char>(i,j) / 255.0f);
}
}
cv::Mat1f descriptors;
for(int i=4; i<10; i+=2){
VlDsiftFilter *dsift = vl_dsift_new_basic (img.rows, img.cols, 2, i);
vl_dsift_process (dsift, imgvec.data());
cv::Mat1f scaleDescs(vl_dsift_get_keypoint_num(dsift), 128, vl_dsift_get_descriptors(dsift));
descriptors.push_back(scaleDescs);
}
Now, I know that I could just "try" this, but understanding if I'm doing something wrong could be very complicate to find the error (not because of the language but because of the logic and the correct API usage). Besides, here we have also several operations from VLFeat to OpenCV.
What do you think about this solution?