Dense SIFT in VLFeat and OpenCV integration
I'm reading this paper where Dense SIFT is used, in particular (quoting the paper):
We extract SIFT [29] descriptors at 4 scales corresponding to region widths of 16, 24, 32 and 40 pixels. The descriptors are extracted on a regular densely sampled grid with a stride of 2 pixels.
So far so good. However, I'm now trying to understand VLFeat's DSIFT C API in order to reproduce the strategy above.
From my understanding of this question and this picture (taken from the link above):
Each SIFT descriptor is computed using 4x4 bins, and each bin can have a different size. Since the descriptor is 4 bins wide, the paper's region widths of 16, 24, 32 and 40 pixels correspond to bin sizes of 4, 6, 8 and 10 pixels. So, supposing that we have the image img (read using OpenCV), we could do this:
#include <opencv2/opencv.hpp>
extern "C" {
#include <vl/dsift.h>
}

cv::Mat img = cv::imread("img.jpg", cv::IMREAD_GRAYSCALE);
// transform the cv::Mat image into a row-major float vector
std::vector<float> imgvec;
imgvec.reserve(img.rows * img.cols);
for (int i = 0; i < img.rows; ++i){
    for (int j = 0; j < img.cols; ++j){
        imgvec.push_back(img.at<unsigned char>(i,j) / 255.0f);
    }
}
cv::Mat1f descriptors;
// bin sizes 4, 6, 8, 10 give region widths 16, 24, 32, 40 (4 bins per side)
for (int binSize = 4; binSize <= 10; binSize += 2){
    // vl_dsift_new_basic takes (width, height, step, binSize)
    VlDsiftFilter *dsift = vl_dsift_new_basic(img.cols, img.rows, 2, binSize);
    vl_dsift_process(dsift, imgvec.data());
    // wrap VLFeat's buffer (numKeypoints x 128); push_back copies the rows
    cv::Mat1f scaleDescs(vl_dsift_get_keypoint_num(dsift), 128,
                         const_cast<float*>(vl_dsift_get_descriptors(dsift)));
    descriptors.push_back(scaleDescs);
    vl_dsift_delete(dsift); // not free(): use VLFeat's own destructor
}
Now, I know that I could just "try" this, but if I'm doing something wrong, finding the error could be very complicated (not because of the language, but because of the logic and correct API usage). Besides, there are also several conversions between VLFeat and OpenCV involved here.
What do you think about this solution?
Let's suppose I have a grey-scale image read with OpenCV:
cv::Mat img = cv::imread("img.jpg",cv::IMREAD_GRAYSCALE);
Now let's suppose that I want to use it with VLFeat SIFT or Dense SIFT. It's not clear how to convert a cv::Mat into the float* that this library expects as input.
In this question, this answer proposes simply:
if(img.type() == CV_32F)
float* matData = (float*)img.data;
In this other question:
Mat imgFloat;
img.convertTo(imgFloat, CV_32F, 1.0/255.0);
float* matData = imgFloat.ptr<float>();
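For completeness, here is a sketch (my own, with placeholder parameters) of how that pointer would then be handed to VLFeat's dense SIFT:

// feed the converted buffer to VLFeat DSIFT
// (step = 2 and binSize = 4 are illustrative placeholders, not from the question)
VlDsiftFilter *dsift = vl_dsift_new_basic(imgFloat.cols, imgFloat.rows, 2, 4);
vl_dsift_process(dsift, matData);
// ... read descriptors/keypoints here, then release the filter
vl_dsift_delete(dsift);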
And in these slides:
Mat toFloat;
img.convertTo(toFloat,CV_32F);
float *vlimage = (float*) toFloat.data;
Which one(s) is/are correct?
As stated below, you really should use convertTo(), not a for loop. (In your example, img.at<unsigned char>(i,j) / 255.0f does happen to work, because the float literal forces floating-point division; with an integer 255 the result would be 0 for any value < 255.)

The second solution is the best one, because the ptr method is used. 1.0/255 is a constant that normalizes the data to between 0 and 1. It would also be better to check whether the Mat isContinuous...
Mmmh, I understand, but why is that important? :D
If your data is not continuous in memory, I don't think the results SIFT gives will be correct. You have to use this method if you don't want to use OpenCV's own pixel-access functions.
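To make the continuity point concrete, a defensive pattern (the clone() fallback here is my suggestion, not something from the thread) would be:

// a Mat is non-continuous when it is a view (ROI) into a larger image,
// e.g. cv::Mat roi = bigImage(cv::Rect(0, 0, 100, 100));
// clone() forces a compact copy, so a single raw pointer covers every pixel
cv::Mat gray = img.isContinuous() ? img : img.clone();
cv::Mat imgFloat;
gray.convertTo(imgFloat, CV_32F, 1.0 / 255.0);
float *matData = imgFloat.ptr<float>();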
@Tetragramm it's a duplicate question
I was talking about the normalization ;)
As for normalization, you can do it inside convertTo itself (that's the 1.0/255.0 in the code I posted). If the image values don't need to be between 0 and 1, you can just leave that parameter off and it defaults to 1.0. Whether they need to be depends on what you're calling, which here is VLFeat, and we have no idea.
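Putting that comment into code, the two variants would look like this; whether VLFeat actually requires the [0, 1] range is, as said, something to verify against its documentation:

cv::Mat asFloat;
// variant 1: plain type conversion, values stay in [0, 255]
img.convertTo(asFloat, CV_32F);              // scale parameter defaults to 1.0
// variant 2: convert and normalize to [0, 1] in one call
img.convertTo(asFloat, CV_32F, 1.0 / 255.0);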