SVM using SURF features

asked 2013-08-20 03:23:54 -0500

FLY

updated 2013-11-04 10:15:40 -0500

This is the code I wrote to train an SVM with SURF features. It compiles without any errors, but I think there is a logical error in it.

string YourImagesDirectory="D:\\Cars\\";
vector<string> files=listFilesInDirectory(YourImagesDirectory+"*.jpg");
//Load NOT cars!
string YourImagesDirectory_2="D:\\not_cars\\";
vector<string> files_no=listFilesInDirectory(YourImagesDirectory_2+"*.jpg");
// Initialize constant values
int nb_cars = files.size();
const int not_cars = files_no.size();
const int num_img = nb_cars + not_cars; // Get the number of images

const int image_area = 30*40;

// Initialize your training set.
Mat training_mat(num_img,image_area,CV_32FC1);
Mat labels(num_img,1,CV_32FC1);

// Set temp matrices
Mat tmp_img;
Mat tmp_dst( 30, 40, CV_8UC1 ); // to the right size for resize
// Load image and add them to the training set
std::vector<string> all_names;

all_names.insert(all_names.end(), files_no.begin(), files_no.end());
// Load image and add them to the training set
int count = 0;
vector<string>::const_iterator i;
string Dir;
for (i = all_names.begin(); i != all_names.end(); ++i)
{
    Dir=( (count < files.size() ) ? YourImagesDirectory : YourImagesDirectory_2);
    tmp_img = imread( Dir +*i, 0 );
    resize( tmp_img, tmp_dst, tmp_dst.size() );
    Mat row_img = tmp_dst; // get a one line image.
    detector.detect( row_img, keypoints);
    drawKeypoints( row_img, keypoints, img_keypoints_1, Scalar::all(-1), DrawMatchesFlags::DEFAULT );
    extractor.compute( row_img, keypoints, descriptors_1);
    row_img.convertTo( training_mat.row(count), CV_32FC1 );
    labels.at< float >(count, 0) = (count<nb_cars)?1:-1; // 1 for car, -1 otherwise
    ++count;
}

When I try to predict on an image, I get a runtime error and no prediction result.



I also tried the loop below in place of the one above. It works fine, but as I am new to OpenCV and a beginner, I don't know whether this approach is right, because my next step is to recognize the object in video.

   int dictionarySize = 1500;        
   int retries = 1;
   int flags = KMEANS_PP_CENTERS;
   BOWKMeansTrainer bowTrainer(dictionarySize, tc, retries, flags);
   BOWImgDescriptorExtractor bowDE(extractor, matcher);
   for (i = all_names.begin(); i != all_names.end(); ++i)
   {
      Dir=( (count < files.size() ) ? YourImagesDirectory : YourImagesDirectory_2);
      tmp_img = cv::imread( Dir +*i, 0 );
      resize( tmp_img, tmp_dst, tmp_dst.size() );
      Mat row_img = tmp_dst;
      detector.detect( row_img, keypoints);
      extractor.compute( row_img, keypoints, descriptors_1);
      bowTrainer.add(descriptors_1);
      labels.at< float >(count, 0) = (count<nb_cars)?1:-1; // 1 for car, -1 otherwise
      ++count;
   }



Do you have any idea how I can do the same as you did, but in Python?

tes ( 2016-04-02 18:52:26 -0500 )

1 answer


answered 2013-08-20 18:59:47 -0500

updated 2013-08-21 17:38:04 -0500

The number of keypoints detected depends on the image itself. You therefore don't get the same number of descriptors for every image, and your vectors end up with different sizes. That's why detect is crashing. Each descriptor has the same size, but the number of detected keypoints can vary a lot! Look at the Bag of Words (BOW) approach, for example, to get a fixed-size vector per image. You can find a lot of resources on BOW on this website.

By the way, why resize your image to a single line? Look at how SURF works (SIFT is quite similar) to understand why a single-line image isn't a good idea.


Let's summarize! If you look closely at the first link you provided, they use a trick to recognize images. They don't compute keypoints, but use the full image directly as a feature vector. It "simulates" the keypoint extraction. All vectors are the same size, so you can train an SVM.
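
As a rough illustration of that trick (a sketch only, not the code from the linked post): every image is resized to the same dimensions and its pixels are copied into a single row, so each sample has exactly rows * cols values. The helper name flattenToRow and the plain std::vector image type are assumptions made for this example; with OpenCV you would work on cv::Mat instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenCV call): copies a grayscale image,
// stored as a vector of pixel rows, into one float row.
// For a 30x40 image the result always has 30 * 40 = 1200 values.
std::vector<float> flattenToRow(const std::vector<std::vector<unsigned char>>& image)
{
    std::vector<float> row;
    for (const auto& line : image)
        for (unsigned char pixel : line)
            row.push_back(static_cast<float>(pixel));
    return row;
}
```

Because every resized image yields a row of the same length, the rows can be stacked into one training matrix. Keypoint descriptors do not have this property, since their count changes from image to image.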

If you want to use SURF (or any keypoint detector), your vectors are the descriptors that you extract. But how do you recognize an object? The easiest way is to compute keypoints on the object, extract the descriptors of these keypoints, and make an assumption: the frequency of keypoints is discriminative for the object. You therefore create some delegates (representative descriptors), usually with KMeans (Bag of Words version); let's call them your vocabulary. Then, for each image, you compute the occurrence of each delegate from the keypoints of your image, i.e. you match each keypoint to the closest delegate. See BOWImgDescriptorExtractor, which performs the extraction of descriptors from your keypoints and the matching to the vocabulary (possibly built with BOWTrainer). After that, you have a fixed-length feature vector: the (normalized) count of each delegate for each object.

I provide simple pseudo code here.

// Create the object for the vocabulary.
BOWKMeansTrainer bow( voca_size )
// This loop extract keypoints and descriptors from image.
// It stores descriptors in the BOW object.
// I assume each image only has one object
// Otherwise use ROI to restrict to the object
for_each( img : img_list )
    keypoints = ComputeSURFKeypoints( img )
    descriptors = ExtractSURFDescriptors( img, keypoints )
    bow.add( descriptors );
vocabulary = bow.cluster(); // Create the vocabulary with KMeans.
// Now, you have a vocabulary, compute the occurrence of delegate in object.
BOWImgDescriptorExtractor dextract( SURFExtractor, FlannMatching )
// Set the vocabulary
dextract.setVocabulary( vocabulary )
for_each( img : img_list )
    keypoints = ComputeSURFKeypoints( img )
    dextract.compute( img, keypoints, descriptors )
    trainData.push_back( descriptors ) // trainData is a vector-like for SVM.
    labels.push_back( object_idx ) // Store the label of the current object.
// Train your SVM with trainData
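
The matching step in the pseudo code above can be sketched without OpenCV. This is only a conceptual illustration of what happens once a vocabulary exists, not the library's implementation: the names Descriptor, squaredDistance and bowHistogram are made up for this example, and a brute-force nearest-neighbour search stands in for the Flann matcher.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Descriptor = std::vector<float>;

// Squared Euclidean distance between two descriptors of equal length.
float squaredDistance(const Descriptor& a, const Descriptor& b)
{
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const float d = a[k] - b[k];
        sum += d * d;
    }
    return sum;
}

// Hypothetical helper: assigns every descriptor of one image to its closest
// vocabulary word and returns the normalized word counts.
// The result has vocabulary.size() entries for any number of descriptors.
std::vector<float> bowHistogram(const std::vector<Descriptor>& descriptors,
                                const std::vector<Descriptor>& vocabulary)
{
    std::vector<float> histogram(vocabulary.size(), 0.0f);
    if (descriptors.empty() || vocabulary.empty())
        return histogram;

    for (const Descriptor& d : descriptors) {
        std::size_t best = 0;
        float bestDist = std::numeric_limits<float>::max();
        for (std::size_t w = 0; w < vocabulary.size(); ++w) {
            const float dist = squaredDistance(d, vocabulary[w]);
            if (dist < bestDist) {
                bestDist = dist;
                best = w;
            }
        }
        histogram[best] += 1.0f;
    }

    // Normalize so that images with many keypoints stay comparable
    // to images with few keypoints.
    for (float& bin : histogram)
        bin /= static_cast<float>(descriptors.size());
    return histogram;
}
```

An image with 4 descriptors and an image with 400 descriptors both produce a histogram with one entry per vocabulary word, which is what makes the rows usable as SVM training samples.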

You should read the SURF paper in depth! Read at least the paper on BOW, and read other papers on object recognition (and be sure you understand them), especially if you don't want to use the BOW approach.



Actually, on a single-line image I would expect SURF to find no features. There is no cornerness at all in such an image, except for some line crossings.

StevenPuttemans ( 2013-08-21 00:42:30 -0500 )

@Mathieu Barnachon I resized my image to one line because I need to train with an SVM; I followed some steps from here : . And would using BOW affect my speed? I am using it in a real-time app.

FLY ( 2013-08-21 06:05:11 -0500 )

@StevenPuttemans If I also convert the images I predict on to one line, will that give a better result?

FLY ( 2013-08-21 06:08:33 -0500 )

The concept of SURF does not seem to be familiar to you. Read the paper first. SURF looks for corners by looking at combined gradients in the x and y directions. If you resize your image to a row vector, you lose all the information in one of those two dimensions. The feature vector you train your SVM with will be a row vector, but you cannot turn the actual image into a row vector before running SURF on it.

StevenPuttemans ( 2013-08-21 06:14:03 -0500 )

@StevenPuttemans It looks like I just need to change this line in the loop: Mat row_img = tmp_dst.reshape( 1, 1 ); // get a one line image. to Mat row_img = tmp_dst; ?

FLY ( 2013-08-21 06:55:23 -0500 )

As we suggested, indeed!

StevenPuttemans ( 2013-08-21 07:02:56 -0500 )

@StevenPuttemans But this may lead to a logical error here:

extractor.compute( row_img, keypoints, descriptors_1);

row_img.convertTo( training_mat.row(count), CV_32FC1 );

The result is saved in descriptors_1 and keypoints, but what we put into training_mat is row_img.

FLY ( 2013-08-21 08:21:09 -0500 )

@Mathieu Barnachon I used the BOW approach with SURF, here, , but everyone told me that it is a bad approach for detecting objects in video, because every BOW detection on the whole image takes several seconds. Let's assume I have a 40x30 image at 20 frames/sec, and I scan at 4 scales with a step of 4 pixels. Time estimate for processing 1 second of video: 40*30*20*4/16 = 6000 seconds if the classifier performs 1 detection per second. So I am going to test the SVM with plain SURF, so that detection in video does not take too much time. My second attempt at the loop is working fine, but I don't know whether it is logically and practically right or wrong.

FLY ( 2013-08-21 12:48:07 -0500 )

@Mathieu Barnachon Let me summarize my implementation of your pseudo code in the new thread [here ] (

FLY ( 2013-08-22 17:03:43 -0500 )

@Mathieu Barnachon, can I please use the pseudo code you wrote in my next blog post about BOW? I will give credit, of course.

GilLevi ( 2013-08-22 17:59:41 -0500 )


Asked: 2013-08-20 03:23:54 -0500

Seen: 5,497 times

Last updated: Nov 04 '13