The number of keypoints depends on the image itself. Therefore, you don't always get the same number of keypoints, and your vectors have different sizes; that's why detect is crashing. Each descriptor has the same size, but the number of detected keypoints can change a lot! Look at the Bag of Words (BoW) approach, for example, to get a fixed-size vector per image. You can find a lot of resources on this website about BoW.
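For instance, here is a minimal sketch (my own illustration, OpenCV 2.4 C++ with the nonfree module; the image paths are placeholders) showing that the keypoint count varies from image to image while the descriptor length stays fixed:

#include <cstdio>
#include <vector>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/nonfree/features2d.hpp>

int main()
{
    // Placeholder paths: any two different grayscale images.
    cv::Mat img1 = cv::imread( "scene1.png", 0 );
    cv::Mat img2 = cv::imread( "scene2.png", 0 );

    cv::SurfFeatureDetector detector( 400 ); // Hessian threshold.
    cv::SurfDescriptorExtractor extractor;

    std::vector<cv::KeyPoint> kp1, kp2;
    detector.detect( img1, kp1 );
    detector.detect( img2, kp2 );

    cv::Mat d1, d2;
    extractor.compute( img1, kp1, d1 );
    extractor.compute( img2, kp2, d2 );

    // The number of rows (keypoints) usually differs between the images,
    // but every descriptor row has the same length (64 for standard SURF).
    printf( "img1: %d keypoints, img2: %d keypoints, descriptor length: %d\n",
            d1.rows, d2.rows, d1.cols );
    return 0;
}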

By the way, why resize your image to a single line? Look at how SURF works (SIFT is quite similar) to understand why a single-line image isn't a good idea.

EDIT

Let's summarize! If you look closely at the first link you provided, they use a trick to recognize images: they don't compute keypoints, but use the full image directly as a feature vector. It "simulates" the keypoint extraction. All vectors have the same size, therefore you can train an SVM.
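As an illustration of that trick (my own sketch, not their exact code), you can resize every image to the same fixed size and flatten it into one row; the 32x32 size is an arbitrary choice:

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// Flatten one image into a fixed-length row vector for the SVM.
// Since every image is resized to the same dimensions, every row
// has the same length, whatever the original image size.
cv::Mat imageToFeatureRow( const cv::Mat& img )
{
    cv::Mat small, row;
    cv::resize( img, small, cv::Size( 32, 32 ) );    // Fixed size (arbitrary).
    small.reshape( 1, 1 ).convertTo( row, CV_32F );  // 1 x (32*32) floats.
    return row;
}

Each such row goes into the SVM training matrix with push_back, next to its label.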

If you want to use SURF (or any keypoint detector), your vectors are the descriptors that you extract. But how do you recognize an object? The easiest way is to compute keypoints on the object, extract the descriptors of these keypoints, and make an assumption: the frequency of keypoints is discriminant for my object. Therefore, you create some delegates (cluster centers), usually with k-means (the Bag of Words version); let's call them your vocabulary. Afterwards, for each image, you compute the occurrence of each delegate among the keypoints of your image, i.e., you match each keypoint to the closest delegate. See BOWImgDescriptorExtractor, which performs the extraction of descriptors from your keypoints and the matching to the vocabulary (possibly built with a BOWTrainer). After that, you have a fixed-length feature vector: the (normalized) number of occurrences of each delegate.

Here is a simple sketch of this pipeline in OpenCV 2.x C++ (untested; img_list, voca_size, and object_idx are assumed to be defined on your side).

#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/nonfree/features2d.hpp>

// Create the vocabulary trainer (k-means with voca_size clusters).
cv::BOWKMeansTrainer bow( voca_size );

// SURF detector and extractor (nonfree module in OpenCV 2.4).
cv::SurfFeatureDetector detector( 400 );
cv::SurfDescriptorExtractor extractor;

// This loop extracts keypoints and descriptors from each image
// and stores the descriptors in the BoW trainer.
// I assume each image only contains one object;
// otherwise, use a ROI to restrict processing to the object.
for( size_t i = 0; i < img_list.size(); ++i )
{
    std::vector<cv::KeyPoint> keypoints;
    cv::Mat descriptors;
    detector.detect( img_list[i], keypoints );
    extractor.compute( img_list[i], keypoints, descriptors );
    bow.add( descriptors );
}
cv::Mat vocabulary = bow.cluster(); // Build the vocabulary with k-means.

// Now that you have a vocabulary, compute the occurrences of the delegates.
cv::Ptr<cv::DescriptorExtractor> surf = new cv::SurfDescriptorExtractor;
cv::Ptr<cv::DescriptorMatcher> matcher = new cv::FlannBasedMatcher;
cv::BOWImgDescriptorExtractor dextract( surf, matcher );
dextract.setVocabulary( vocabulary );

cv::Mat trainData, labels;
for( size_t i = 0; i < img_list.size(); ++i )
{
    std::vector<cv::KeyPoint> keypoints;
    cv::Mat bowDescriptor;
    detector.detect( img_list[i], keypoints );
    dextract.compute( img_list[i], keypoints, bowDescriptor );
    trainData.push_back( bowDescriptor );  // One row per image for the SVM.
    labels.push_back( object_idx );        // Label of the object in this image.
}
// Train your SVM with trainData and labels
...
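And to finish, a sketch of the missing SVM step with the old CvSVM API from OpenCV 2.x (the parameter values are only illustrative; tune them, e.g. with train_auto):

#include <opencv2/ml/ml.hpp>

// trainData: CV_32F, one BoW histogram per row.
// labels:    CV_32S, one object index per row.
CvSVMParams params;
params.svm_type = CvSVM::C_SVC;
params.kernel_type = CvSVM::RBF;
params.term_crit = cvTermCriteria( CV_TERMCRIT_ITER, 1000, 1e-6 );

CvSVM svm;
svm.train( trainData, labels, cv::Mat(), cv::Mat(), params );

// Classify a new image: compute its BoW descriptor as above, then:
// float predicted_label = svm.predict( bowDescriptor );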

You should read the SURF paper in depth! Read at least the paper on BoW, and read other papers on object recognition (and be sure you understand them), especially if you don't want to use the BoW approach.