
How exactly does BoVW work with Python 3 and OpenCV 3?

asked 2018-01-31 02:02:40 -0500

RSSharma

updated 2018-01-31 03:49:32 -0500

berak

So, I wrote code for a SIFT/SURF + BoVW + SVM classifier for 20 kinds of texture in Python. In the method train(), I extract SIFT/SURF feature descriptors for every image in my training set, and I have created a BOWKMeansTrainer as follows:

dictionarySize = 20

BOW = cv.BOWKMeansTrainer(dictionarySize)

I have 80 training images.

So, I add the descriptors of each image to BOW like this:

kp, desc = surf(img)
BOW.add(desc)

All of these form a dictionary with size (20,64). What does this 64 mean?

And how does the BOW trainer know that some image x belongs to class c? What data do I feed into the SVM?

Help in understanding this would be greatly appreciated. Thanks.


1 answer


answered 2018-01-31 03:47:13 -0500

berak

All of these form a dictionary with size (20,64). What does this 64 mean?

64 is the length of a single SURF descriptor (a SIFT descriptor has 128 elements), and your dictionary has 20 of those. All correct so far! (Though you will probably need quite a few more than 20 words for a good classification.)
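To see where the (20, 64) shape comes from: BOWKMeansTrainer stacks all descriptors passed to add() into one N x 64 matrix and runs k-means on it, so cluster() returns one 64-element center per visual word. The sketch below mimics that with plain NumPy and random stand-in descriptors. `build_vocabulary` is a hypothetical helper doing a simplified Lloyd's k-means, not OpenCV's implementation (which calls cv::kmeans with its own initialization and termination settings).

```python
import numpy as np

def build_vocabulary(descriptor_sets, k, iterations=10, seed=0):
    """Simplified k-means: stack all descriptors and return k cluster centers (k x d)."""
    # Stack the per-image descriptor arrays into one N x d matrix, like repeated BOW.add() calls.
    data = np.vstack(descriptor_sets).astype(np.float32)
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct random descriptors.
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iterations):
        # Squared Euclidean distance from every descriptor to every center: N x k.
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned descriptors (keep it if it has none).
        for j in range(k):
            members = data[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

# 80 images, each with 30..59 random 64-element "SURF-like" descriptors.
rng = np.random.default_rng(1)
descriptor_sets = [rng.random((int(rng.integers(30, 60)), 64), dtype=np.float32) for _ in range(80)]
voc = build_vocabulary(descriptor_sets, k=20)
print(voc.shape)  # (20, 64)
```

The number of rows is the dictionary size you chose; the number of columns is always the descriptor length, whatever the number of images.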

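Bag-of-words classification represents each image by a signature of 20 elements (in your case): a histogram in which each bin counts how many of the image's features were matched to that dictionary word (OpenCV normalizes it by the number of features). The BOW trainer itself knows nothing about classes, since k-means clustering is unsupervised; the class labels only come in when you train the classifier on these signatures.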
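The signature step can be sketched in NumPy as follows: each image descriptor is assigned to its nearest vocabulary word, and the per-word counts are divided by the number of descriptors, which to my understanding matches how BOWImgDescriptorExtractor normalizes its output. `bow_signature` is a hypothetical helper that uses brute-force nearest-neighbour search instead of FLANN, and the toy vocabulary is 2-D only so the result can be checked by hand.

```python
import numpy as np

def bow_signature(descriptors, vocabulary):
    """Return a 1 x k histogram of nearest visual words, normalized by descriptor count."""
    # Squared distance from each image descriptor to each visual word: n x k.
    dists = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)  # index of the closest word for each descriptor
    hist = np.bincount(nearest, minlength=len(vocabulary)).astype(np.float32)
    return (hist / len(descriptors)).reshape(1, -1)

# Toy example: a 3-word vocabulary in 2-D and four image descriptors.
vocabulary = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]], dtype=np.float32)
descriptors = np.array([[1.0, 0.5], [9.0, 1.0], [0.5, -0.5], [11.0, 0.0]], dtype=np.float32)
sig = bow_signature(descriptors, vocabulary)
print(sig)  # [[0.5 0.5 0. ]]
```

Two descriptors land on word 0, two on word 1 and none on word 2, so the signature has the same length as the vocabulary no matter how many keypoints the image has. That fixed length is what lets a classifier consume it.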

The next steps are:

  1. make BoW signatures from your training images, using BOWImgDescriptorExtractor
  2. train an SVM (or a KNN or ANN classifier) on those signatures, together with their class labels
  3. also make BoW signatures for your test images
  4. make a prediction for your test signatures
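To make the data flow of steps 2 and 4 concrete: the classifier gets one row per training image (its signature) plus one integer label per row, and the labels are the only place class information enters. The sketch below swaps in a hypothetical 1-nearest-neighbour classifier written in NumPy (the "KNN" option) instead of cv2.ml.SVM, purely so it runs without OpenCV; the signatures are made up.

```python
import numpy as np

class NearestNeighbour:
    """Minimal 1-NN classifier over BoW signatures (a stand-in for cv2.ml SVM/KNearest)."""

    def train(self, samples, labels):
        # One row per training image, one integer label per row.
        self.samples = np.asarray(samples, dtype=np.float32)
        self.labels = np.asarray(labels, dtype=np.int32)

    def predict(self, signatures):
        signatures = np.asarray(signatures, dtype=np.float32)
        # Squared distance from each query signature to every training signature.
        dists = ((signatures[:, None, :] - self.samples[None, :, :]) ** 2).sum(axis=2)
        # Return the label of the closest training signature.
        return self.labels[dists.argmin(axis=1)]

# Made-up 3-word signatures for two texture classes (0 and 1).
traindata = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]
trainlabels = [0, 0, 1, 1]
clf = NearestNeighbour()
clf.train(traindata, trainlabels)
print(clf.predict([[0.75, 0.15, 0.1], [0.1, 0.2, 0.7]]))  # [0 1]
```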

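# 1. set up the feature extractor and matcher, and load the dictionary
import cv2
import numpy as np

sift = cv2.xfeatures2d.SIFT_create()  # OpenCV 3.x + contrib; cv2.SIFT_create() in OpenCV >= 4.4
flann_params = dict(algorithm = 1, trees = 5)  # algorithm 1 = FLANN KD-tree index
matcher = cv2.FlannBasedMatcher(flann_params, {})
bow_extract = cv2.BOWImgDescriptorExtractor( sift , matcher )
# voc is the dictionary you made before, e.g. voc = BOW.cluster().
# It must come from the same descriptor type as the extractor above:
# 20 rows of 128 columns for SIFT, or 20 rows of 64 columns for SURF.
bow_extract.setVocabulary( voc )

traindata = []
trainlabels = []
for img, class_id_of_img in train_images:  # train_images: your own (image, label) pairs (hypothetical name)
    # get keypoints
    siftkp = sift.detect(img)
    # let the bow extractor find descriptors, and match them to the dictionary
    bowsig = bow_extract.compute(img, siftkp)
    if bowsig is None:  # no keypoints found, skip this image
        continue
    traindata.extend( bowsig )
    trainlabels.append( class_id_of_img ) # a number, from 0 to 19 for 20 classes

# 2. create & train the svm (each row of traindata is one BoW signature)
svm = cv2.ml.SVM_create()
svm.train(np.array(traindata), cv2.ml.ROW_SAMPLE, np.array(trainlabels, dtype=np.int32))

# 3. for each test image (img), you have to repeat the steps from above:
siftkp = sift.detect(img)
bowsig = bow_extract.compute(img, siftkp)

# 4. now you can predict the classid of your img
# (one of the numbers you passed in for the trainlabels);
# predict() returns a (retval, results) tuple:
_, result = svm.predict(bowsig)
p = int(result[0, 0])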
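Where class_id_of_img comes from is up to you. One common convention, assumed here, is one folder per texture class; the hypothetical helper `make_labels` below maps folder names to integer ids 0 .. n_classes-1 in sorted order, so every image of the same texture gets the same number.

```python
from pathlib import PurePosixPath

def make_labels(image_paths):
    """Map each image path to an integer class id taken from its parent folder name."""
    # e.g. "textures/brick/01.png" -> class name "brick"
    class_names = sorted({PurePosixPath(p).parent.name for p in image_paths})
    class_ids = {name: i for i, name in enumerate(class_names)}  # 0 .. n_classes-1
    labels = [class_ids[PurePosixPath(p).parent.name] for p in image_paths]
    return labels, class_names

paths = ["textures/wood/a.png", "textures/brick/a.png", "textures/wood/b.png", "textures/sand/a.png"]
labels, class_names = make_labels(paths)
print(class_names)  # ['brick', 'sand', 'wood']
print(labels)       # [2, 0, 2, 1]
```

Keeping class_names around lets you turn a predicted id back into a texture name.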


Thanks for the detailed description.

RSSharma (2018-01-31 19:16:20 -0500)

How do you label the data?

Sky31 (2019-01-19 09:26:30 -0500)