
All of these form a dictionary with size (20,64). What does this 64 mean?

64 is the length of a single descriptor, and your dictionary has 20 of those. All correct so far! (Note that 64 is the default SURF descriptor length; SIFT descriptors have 128 elements. You will also probably need quite a few more than 20 words for a good classification.)

BagOfWords classification will use a signature of 20 elements (in your case) per image: a histogram, where each bin counts how often that dictionary word was the best match for one of your image features.
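
To make the histogram idea concrete, here is a minimal NumPy-only sketch of how such a signature could be computed by hand. The function name `bow_signature` and the toy data are made up for illustration, and normalizing by the number of descriptors is an assumption of this sketch; in practice BOWImgDescriptorExtractor does the matching and counting for you.

```python
import numpy as np

def bow_signature(descriptors, vocabulary):
    """Histogram of nearest-word assignments (hypothetical helper).

    descriptors: (n, d) array, one row per image feature
    vocabulary:  (k, d) array, one row per dictionary word
    """
    # squared Euclidean distance from every descriptor to every word: (n, k)
    diffs = descriptors[:, None, :] - vocabulary[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    # index of the closest dictionary word for each descriptor
    nearest = dists.argmin(axis=1)
    # count how often each word was chosen
    hist = np.bincount(nearest, minlength=len(vocabulary)).astype(np.float32)
    # assumption: normalize so the bins sum to 1
    return hist / max(len(descriptors), 1)

# toy data: 3 words and 4 descriptors of length 2 (yours would be 20 x 64)
vocabulary = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [9.0, 1.0], [0.0, 1.0], [1.0, 9.0]])
signature = bow_signature(descriptors, vocabulary)
print(signature)
```

Two of the four toy descriptors land closest to word 0, and one each to words 1 and 2, so the signature has one bin per dictionary word, no matter how many features the image had.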

the next steps will be:

1. make BoW signatures from your train images, using BOWImgDescriptorExtractor
2. train an SVM (or KNN or ANN) on those
3. also make BoW signatures for your test images
4. make a prediction for your test signatures

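# 1. make BoW signatures from your train images
import cv2
import numpy as np

# must be the same descriptor type that was used to build the dictionary
# (on OpenCV >= 4.4 this is cv2.SIFT_create())
sift = cv2.xfeatures2d.SIFT_create()
flann_params = dict(algorithm = 1, trees = 5) # 1 == FLANN_INDEX_KDTREE
matcher = cv2.FlannBasedMatcher(flann_params, {})
bow_extract = cv2.BOWImgDescriptorExtractor( sift , matcher )
bow_extract.setVocabulary( voc ) # the 20x64 dictionary, you made before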

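traindata = []
trainlabels = []
# train_images: your own list of (img, class_id_of_img) pairs (placeholder name)
for img, class_id_of_img in train_images:
      # get keypoints
      siftkp = sift.detect(img)
      # let the bow extractor find descriptors, and match them to the dictionary
      bowsig = bow_extract.compute(img, siftkp)
      if bowsig is None: # no keypoints found in this image, skip it
            continue
      traindata.extend( bowsig )
      trainlabels.append( class_id_of_img ) # an integer class id, one per image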

# 2. create & train the svm
svm = cv2.ml.SVM_create()
# samples must be float32, class labels must be int32
svm.train(np.array(traindata, dtype=np.float32), cv2.ml.ROW_SAMPLE, np.array(trainlabels, dtype=np.int32))

# 3. for each test image, you have to repeat the steps from above:
siftkp = sift.detect(img)
bowsig = bow_extract.compute(img, siftkp)

# 4. now you can predict the classid of your img
# (one of the numbers you passed in for the trainlabels):
# predict() returns (retval, results); results has one row per sample
_, p = svm.predict(bowsig)
classid = int(p[0][0])
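
Once you have predictions for all test images, you will probably want to know how many were right. A minimal sketch, assuming you stacked the prediction results into one (n, 1) float32 array (the shape `svm.predict` gives back for n samples) and kept the true class ids in a list; the `accuracy` helper and the sample numbers are made up for illustration:

```python
import numpy as np

def accuracy(predicted, expected):
    """Fraction of predictions that match the true labels (hypothetical helper)."""
    # flatten the (n, 1) float results and compare them as integer class ids
    predicted = np.asarray(predicted).ravel().astype(np.int32)
    expected = np.asarray(expected).ravel().astype(np.int32)
    return float((predicted == expected).mean())

# stand-in for stacked prediction results: an (n, 1) float32 array
results = np.array([[0.0], [2.0], [1.0], [2.0]], dtype=np.float32)
true_labels = [0, 2, 1, 1]  # made-up ground truth for the 4 test images
acc = accuracy(results, true_labels)
print(acc)
```

Here 3 of the 4 made-up predictions match, so this prints 0.75.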