
Kmeans and Bag of Generic Words

asked 2017-04-13 05:44:50 -0600

alexMm1

updated 2017-04-13 06:08:22 -0600

Hi guys,

I'm trying to implement a Bag of Words with my own descriptor. I already did the kmeans part and it works perfectly. Now my question is: which OpenCV function do I have to use to build the histogram and complete the training part of the code?

Consider that I used this code to compute the vocabulary:

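Mat ComputeDictionary(const Mat& features, const int K) {
    // features: one descriptor per row; kmeans expects floating-point data (CV_32F)
    int retries = 3;                // number of k-means attempts with different initial labels
    int flags = KMEANS_PP_CENTERS;  // k-means++ center initialization
    Mat bestlabels, centers;
    // K-Means function call: stop after 100 iterations or when the change drops below 0.001
    kmeans(features, K, bestlabels, TermCriteria(TermCriteria::EPS + TermCriteria::COUNT, 100, 0.001), retries, flags, centers);
    // centers has K rows, one vocabulary word per row
    return centers;
}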

My "features" matrix has "number of samples * number of descriptors per sample" rows and "descriptor dimension" columns, so each row is one descriptor.

Now I don't know how to proceed (I know the theory, but I don't know how to implement it in C++). Could you please help me?


1 answer


answered 2017-04-13 06:55:28 -0600

berak

updated 2017-04-13 06:58:17 -0600

hmm, opencv uses feature matching to compute the histograms (BOWImgDescriptorExtractor assigns each descriptor to its nearest vocabulary word with a DescriptorMatcher).

for your custom descriptor, i guess you have to implement something similar on your own.

as an alternative to histograms as the final features, you could store the distances to the centers, or even the residuals.
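One possible reading of the "distances to the centers" idea, as a rough sketch: for each sample, average the Euclidean distance from its descriptors to every k-means center, which gives one K-dimensional feature per sample. The averaging step and the helper name meanDistancesToCenters are assumptions made for illustration, not something the answer specifies, and plain std::vector stands in for cv::Mat so the snippet compiles without OpenCV.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenCV function): mean Euclidean distance from
// the descriptors of one sample to each k-means center. Returns one value per
// center; all zeros if the sample has no descriptors.
std::vector<float> meanDistancesToCenters(
        const std::vector<std::vector<float>>& descriptors,
        const std::vector<std::vector<float>>& centers) {
    std::vector<float> feature(centers.size(), 0.0f);
    if (descriptors.empty()) return feature;

    for (const auto& d : descriptors) {
        for (std::size_t k = 0; k < centers.size(); ++k) {
            assert(d.size() == centers[k].size());
            float sum = 0.0f;
            for (std::size_t i = 0; i < d.size(); ++i) {
                const float diff = d[i] - centers[k][i];
                sum += diff * diff;
            }
            feature[k] += std::sqrt(sum);  // accumulate the L2 distance to center k
        }
    }
    // Average over the descriptors of this sample (an assumed aggregation).
    for (float& v : feature) v /= static_cast<float>(descriptors.size());
    return feature;
}
```

Unlike a histogram, this feature keeps how far the descriptors lie from every word, not only which word is closest; whether that helps is something to check experimentally.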



So, basically, I have to do this:

  1. Build my own function that computes the distance between each descriptor of a sample and every word of the vocabulary, and picks the closest word.
  2. Compute the histogram of closest words for each sample.
  3. Create the new feature matrix (samples x histogram dimension).
  4. Train an SVM with a chi-squared kernel.

Then do the same for a test sample.
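Steps 1 and 2 of this plan could look roughly like the sketch below. It is written in plain C++ with std::vector standing in for cv::Mat so that it compiles without OpenCV; the names Descriptor, squaredDistance, nearestWord and wordHistogram are made up for illustration. It assumes the vocabulary holds the k-means centers, one per entry, and uses the squared Euclidean distance, which selects the same nearest word as the Euclidean distance.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Descriptor = std::vector<float>;

// Squared Euclidean distance between two descriptors of equal length.
float squaredDistance(const Descriptor& a, const Descriptor& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Step 1: index of the vocabulary word (k-means center) closest to d.
std::size_t nearestWord(const Descriptor& d,
                        const std::vector<Descriptor>& vocabulary) {
    assert(!vocabulary.empty());
    std::size_t best = 0;
    float bestDist = std::numeric_limits<float>::max();
    for (std::size_t k = 0; k < vocabulary.size(); ++k) {
        const float dist = squaredDistance(d, vocabulary[k]);
        if (dist < bestDist) {
            bestDist = dist;
            best = k;
        }
    }
    return best;
}

// Step 2: count how many descriptors of one sample fall on each word.
std::vector<float> wordHistogram(const std::vector<Descriptor>& sampleDescriptors,
                                 const std::vector<Descriptor>& vocabulary) {
    std::vector<float> histogram(vocabulary.size(), 0.0f);
    for (const Descriptor& d : sampleDescriptors)
        histogram[nearestWord(d, vocabulary)] += 1.0f;
    return histogram;
}
```

The same wordHistogram call would serve a test sample, as long as it is given the vocabulary that was computed from the training data.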


alexMm1 ( 2017-04-13 07:33:16 -0600 )

good plan ;)

  1. there's norm(a,b) for the distance (L2 by default)
  2. make a 1-row Mat: Mat histogram(1, nclusters, CV_32F, Scalar(0)); then just increase the bin: histogram.at<float>(0, closest_cluster_id) += 1; it's also a good idea to normalize() it in the end.
  3. start with an empty Mat, and push_back() those histograms, one by one (each histogram becomes a single row)
  4. i'd still prefer a LINEAR kernel for the SVM, but that's up to experimenting on your side!

for the test sample you would just need steps 1. and 2.
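As a rough sketch of steps 2 and 3, assuming the closest word id of every descriptor is already known: each sample becomes one normalised histogram row, and the rows are stacked into the training matrix. std::vector stands in for cv::Mat here so the snippet compiles without OpenCV, and normalizeL1, histogramRow and buildTrainingMatrix are hypothetical names. L1 normalisation (bins sum to 1) is an assumption; cv::normalize() defaults to the L2 norm unless told otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scale a histogram in place so its bins sum to 1 (L1 norm, an assumed choice).
// A histogram with only zero bins is left unchanged.
void normalizeL1(std::vector<float>& histogram) {
    float total = 0.0f;
    for (float v : histogram) total += v;
    if (total > 0.0f)
        for (float& v : histogram) v /= total;
}

// Step 2: turn the closest-word ids of one sample into a normalised histogram.
std::vector<float> histogramRow(const std::vector<int>& wordIds, int nclusters) {
    assert(nclusters > 0);
    std::vector<float> histogram(static_cast<std::size_t>(nclusters), 0.0f);
    for (int id : wordIds) {
        assert(id >= 0 && id < nclusters);
        histogram[static_cast<std::size_t>(id)] += 1.0f;
    }
    normalizeL1(histogram);
    return histogram;
}

// Step 3: stack one histogram row per sample (samples x nclusters).
std::vector<std::vector<float>> buildTrainingMatrix(
        const std::vector<std::vector<int>>& wordIdsPerSample, int nclusters) {
    std::vector<std::vector<float>> rows;
    rows.reserve(wordIdsPerSample.size());
    for (const auto& ids : wordIdsPerSample)
        rows.push_back(histogramRow(ids, nclusters));
    return rows;
}
```

A test sample would go through histogramRow alone, with the same nclusters and the same normalisation as the training rows, so that its feature is comparable to them.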

berak ( 2017-04-13 08:28:47 -0600 )

great thanks

alexMm1 ( 2017-04-14 04:38:14 -0600 )
