
Dense SIFT in VLFeat and OpenCV integration

asked 2017-01-09 17:21:13 -0600

lovaj

updated 2017-01-10 19:56:57 -0600

I'm reading this paper where Dense SIFT is used, in particular (quoting the paper):

We extract SIFT [29] descriptors at 4 scales corresponding to region widths of 16, 24, 32 and 40 pixels. The descriptors are extracted on a regular densely sampled grid with a stride of 2 pixels.

So far so good. However, now I'm trying to understand DSIFT from VLFeat and its C API in order to reproduce the strategy above.

From my understanding from this question and this picture (taken from the link above):

[Image not reproduced here]

Each SIFT descriptor is computed using 4x4 bins, and the size of each bin can vary. So, supposing that we have the image img (read using OpenCV), we could do this:

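Mat img = imread("img.jpg",CV_LOAD_IMAGE_GRAYSCALE);
// transform image in cv::Mat to float vector (row-major, values in [0, 1])
std::vector<float> imgvec;
for (int i = 0; i < img.rows; ++i){
  for (int j = 0; j < img.cols; ++j){
    imgvec.push_back(img.at<unsigned char>(i,j) / 255.0f);
  }
}

cv::Mat1f descriptors;
// bin sizes 4, 6, 8, 10 give 4x4-bin descriptors 16, 24, 32 and 40 pixels wide
for(int i=4; i<=10; i+=2){
  // vl_dsift_new_basic takes (width, height, step, binSize)
  VlDsiftFilter *dsift = vl_dsift_new_basic (img.cols, img.rows, 2, i);
  vl_dsift_process (dsift, imgvec.data());
  // wrap VLFeat's descriptor buffer: one 128-float descriptor per row
  cv::Mat1f scaleDescs(vl_dsift_get_keypoint_num(dsift), 128,
                       const_cast<float*>(vl_dsift_get_descriptors(dsift)));
  descriptors.push_back(scaleDescs); // copies the data before the filter is freed
  vl_dsift_delete(dsift);
}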

Now, I know that I could just "try" this, but if I'm doing something wrong, the error could be very hard to find (not because of the language, but because of the logic and the correct API usage). Besides, data also has to pass between OpenCV and VLFeat here.

What do you think about this solution?
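The scale arithmetic behind the loop above can be sketched as follows. This is only a minimal sketch, assuming that a descriptor with 4x4 spatial bins covers a region 4 bins wide, so the bin size is the region width divided by 4. The helper names are hypothetical and not part of VLFeat.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not a VLFeat function): bin size in pixels for a
// descriptor that spans `regionWidth` pixels with `binsPerSide` spatial bins.
int binSizeForRegionWidth(int regionWidth, int binsPerSide = 4) {
    assert(binsPerSide > 0 && regionWidth % binsPerSide == 0);
    return regionWidth / binsPerSide;
}

// Bin sizes for the four region widths quoted from the paper.
std::vector<int> paperBinSizes() {
    std::vector<int> sizes;
    for (int width : {16, 24, 32, 40}) {
        sizes.push_back(binSizeForRegionWidth(width));
    }
    return sizes;
}
```

Under that assumption the four scales correspond to bin sizes 4, 6, 8 and 10, so a loop over the bin size has to include 10.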

Let's suppose I have a grey-scale image read with OpenCV:

cv::Mat img = cv::imread("img.jpg",cv::IMREAD_GRAYSCALE);

Now let's suppose that I want to use it for VLFeat SIFT or Dense SIFT. It's not clear how to convert cv::Mat into a float* to use as input in this library.

In this question, this answer proposes simply:

if(img.type() == CV_32F)
  float* matData = (float*)img.data;

In this other question:

Mat imgFloat; 
img.convertTo(imgFloat, CV_32F, 1.0/255.0);
float* matData = imgFloat.ptr<float>();

And in these slides:

Mat toFloat; 
float *vlimage = (float*)toFloat.data;

Which of these is (or are) correct?



As stated below, you really should use convertTo(), not a for loop. In your example, img.at<unsigned char>(i,j) / 255.0f will be 0 for any value < 255!

berak (2017-01-10 00:44:11 -0600)
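The truncation to 0 described in this comment is what integer division produces (a divisor written as 255). With a float divisor such as 255.0f, the fraction survives. A minimal sketch of the difference, with hypothetical helper names:

```cpp
#include <cassert>

// Integer division: both operands are integers, so the fraction is truncated
// and every pixel value below 255 becomes 0.
int scaleWithIntegerDivision(int pixel) {
    assert(pixel >= 0 && pixel <= 255);
    return pixel / 255;
}

// Floating-point division: the float divisor converts the pixel to float
// first, so the result lies in [0, 1].
float scaleWithFloatDivision(int pixel) {
    assert(pixel >= 0 && pixel <= 255);
    return pixel / 255.0f;
}
```

Either way, convertTo() with a scale factor of 1.0/255.0 avoids writing the division by hand.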

The second solution is the best one because it uses the ptr method. 1.0/255 is a constant that normalizes the data to between 0 and 1. It is also better to check that the Mat isContinuous...

LBerger (2017-01-10 15:03:12 -0600)

Mmmh, I understand, but why should it be important? :D

lovaj (2017-01-10 15:06:18 -0600)

If your data are not continuous in memory, I don't think the results given by SIFT will be good. You have to use this method if you don't want to use OpenCV's pixel access methods.

LBerger (2017-01-10 15:33:21 -0600)

I was talking about the normalization ;)

lovaj (2017-01-10 16:39:53 -0600)

As for normalization, you can do it in the convertTo method (that's the 1.0/255.0 in the code I posted). If the image values don't need to be between 0 and 1, you can just leave that parameter off and it defaults to 1.0. Whether they do depends on what you're calling, which is VLFeat, and we have no idea what it expects.

Tetragramm (2017-01-10 20:00:36 -0600)

1 answer


answered 2017-01-09 18:28:01 -0600

Tetragramm

Your OpenCV code is okay, but it would be faster and more effective to do:

Mat img = imread("img.jpg", CV_LOAD_IMAGE_GRAYSCALE);
Mat imgFloat;
img.convertTo(imgFloat, CV_32F, 1.0/255.0);

Then you check that it's continuous by using imgFloat.isContinuous() (it should be) and then get a pointer to the data by doing imgFloat.ptr<float>(). If it's not continuous, you can just keep doing what you are. It's just faster this way and you don't have to worry about the loops yourself.

Lastly, remember that OpenCV is row-major. I couldn't find out whether VLFeat is row- or column-major.



According to this, "If not otherwise specified, matrices in VLFeat are stored in memory in column major order". However, later it says that "Images I(x,y) are stored instead in row-major order, i.e. one row after the other". Does that mean I'm doing something wrong?

lovaj (2017-01-10 08:05:05 -0600)

That's a question for VLFeat. I know nothing about how it works. Just based on that quote, I think you're OK: that is an "other specification" saying that images are row-major.

Tetragramm (2017-01-10 09:11:36 -0600)
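Going by the quoted documentation, a VLFeat image would use the same layout as a continuous cv::Mat: pixel (x, y) sits at offset y * width + x. A small sketch of the two index conventions, with hypothetical function names:

```cpp
#include <cassert>
#include <cstddef>

// Offset of pixel (x, y) in a row-major image: rows are stored one after
// another, so a full row of `width` pixels is skipped for each step in y.
std::size_t rowMajorOffset(int x, int y, int width) {
    assert(x >= 0 && y >= 0 && x < width);
    return static_cast<std::size_t>(y) * width + x;
}

// Offset of element (row, col) in a column-major matrix: columns are stored
// one after another, so a full column of `numRows` elements is skipped for
// each step in col.
std::size_t columnMajorOffset(int row, int col, int numRows) {
    assert(row >= 0 && col >= 0 && row < numRows);
    return static_cast<std::size_t>(col) * numRows + row;
}
```

With x as the column and y as the row, rowMajorOffset matches how a continuous single-channel cv::Mat is indexed, which is why the buffer can be handed over without transposing.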

BTW, shouldn't I pass the pointer through ?

lovaj (2017-01-10 10:26:28 -0600)

1 is the old way. It still works, but is not preferred. imgFloat.ptr<float>() is the type-safe way of accessing the start of the image.

If you are using ROIs and submatrices, the memory is non-continuous, and you access each row by using imgFloat.ptr<float>(rowNumber);

Tetragramm (2017-01-10 19:58:51 -0600)