Dense SIFT in VLFeat and OpenCV integration

I'm reading this paper where Dense SIFT is used, in particular (quoting the paper):

We extract SIFT [29] descriptors at 4 scales corresponding to region widths of 16, 24, 32 and 40 pixels. The descriptors are extracted on a regular densely sampled grid with a stride of 2 pixels.

So far so good. However, now I'm trying to understand VLFeat's DSIFT C API in order to reproduce the strategy above.

From my understanding of this question and this picture (taken from the link above):

[image: geometry of the VLFeat DSIFT descriptor and its spatial bins]

Each SIFT descriptor is computed over 4x4 spatial bins, and the bin size can vary. Since a descriptor spans 4 bins per side, the paper's region widths of 16, 24, 32 and 40 pixels correspond to bin sizes of 4, 6, 8 and 10 pixels. So, supposing we have the image img (read using OpenCV), we could do this:

cv::Mat img = cv::imread("img.jpg", cv::IMREAD_GRAYSCALE);
// transform the cv::Mat into the contiguous row-major float buffer VLFeat expects
std::vector<float> imgvec;
imgvec.reserve(img.rows * img.cols);
for (int i = 0; i < img.rows; ++i){
  for (int j = 0; j < img.cols; ++j){
    imgvec.push_back(img.at<unsigned char>(i,j) / 255.0f);
  }
}

cv::Mat1f descriptors;
// region widths 16, 24, 32, 40 -> bin sizes 4, 6, 8, 10 (a descriptor is 4 bins wide)
for(int binSize = 4; binSize <= 10; binSize += 2){
  // vl_dsift_new_basic takes (imWidth, imHeight, step, binSize): width (cols) comes first
  VlDsiftFilter *dsift = vl_dsift_new_basic(img.cols, img.rows, 2, binSize);
  vl_dsift_process(dsift, imgvec.data());
  // wrap the filter's descriptor buffer (no copy); push_back then copies the rows
  cv::Mat1f scaleDescs(vl_dsift_get_keypoint_num(dsift), 128,
                       const_cast<float*>(vl_dsift_get_descriptors(dsift)));
  descriptors.push_back(scaleDescs);
  vl_dsift_delete(dsift); // vl_dsift_delete, not free()
}
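
For completeness, each descriptor's grid position can be read back as well. This is a sketch of how I understand the API (one VlDsiftKeypoint per descriptor row, in the same order); it would go inside the loop, before vl_dsift_delete:

// one keypoint per descriptor row, in the same order as the rows of scaleDescs
const VlDsiftKeypoint *kps = vl_dsift_get_keypoints(dsift);
for (int k = 0; k < vl_dsift_get_keypoint_num(dsift); ++k) {
  // kps[k].x and kps[k].y are the centre of the descriptor on the sampling grid
  printf("descriptor %d extracted at (%.1f, %.1f)\n", k, kps[k].x, kps[k].y);
}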

Now, I know that I could just "try" this, but if something is wrong it could be very complicated to find the error (not because of the language, but because of the logic and the correct API usage). Besides, there are also several conversions between VLFeat and OpenCV involved here.

What do you think about this solution?

A related doubt, about the conversion step above. Let's suppose I have a grey-scale image read with OpenCV:

cv::Mat img = cv::imread("img.jpg",cv::IMREAD_GRAYSCALE);

Now let's suppose that I want to use it with VLFeat's SIFT or Dense SIFT. It's not clear how to convert the cv::Mat into the float* that this library takes as input.

In this question, this answer proposes just:

if(img.type() == CV_32F)
  float* matData = (float*)img.data;

In this other question:

Mat imgFloat; 
img.convertTo(imgFloat, CV_32F, 1.0/255.0);
float* matData = imgFloat.ptr<float>();

And in these slides:

Mat toFloat;
img.convertTo(toFloat, CV_32F);
float *vlimage = (float*) toFloat.data;

Which one(s) is (are) correct?
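
For reference, here is a minimal sketch of the conversion I would expect to work, combining the ideas above (the helper name toFloatBuffer is mine, and the 1/255 scaling is an assumption; VLFeat only requires float data, so the range convention is up to the caller). The isContinuous check matters because cv::Mat rows can be padded, while a C API expects one contiguous buffer:

#include <opencv2/opencv.hpp>

// minimal sketch: grey-scale cv::Mat -> contiguous float buffer for a C API
cv::Mat1f toFloatBuffer(const cv::Mat &img)
{
  cv::Mat1f imgFloat;
  img.convertTo(imgFloat, CV_32F, 1.0/255.0); // assumption: scale to [0,1]
  if (!imgFloat.isContinuous())               // rows may be padded
    imgFloat = imgFloat.clone();              // clone() yields contiguous data
  return imgFloat;
}

// usage:
// cv::Mat img = cv::imread("img.jpg", cv::IMREAD_GRAYSCALE);
// cv::Mat1f buf = toFloatBuffer(img);
// float *vlimage = buf.ptr<float>(0);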