Kmeans and Bag of Generic Words

asked Apr 13 '17

alexMm1

updated Apr 13 '17

Hi guys,

I'm trying to implement a bag of words with my own descriptor. I have already done the k-means part and it works perfectly. My question is: which OpenCV function should I use to build the histograms and complete the training part of the code?

I used this code to compute the vocabulary:

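#include <opencv2/core.hpp>
using namespace cv;

// features: one descriptor per row, type CV_32F (kmeans requires float data).
Mat ComputeDictionary(const Mat& features, const int K) {
    const int retries = 3;                // k-means attempts; the best result is kept
    const int flags = KMEANS_PP_CENTERS;  // k-means++ center initialization
    Mat bestLabels, centers;
    // Stop after 100 iterations or once the centers move by less than 0.001.
    kmeans(features, K, bestLabels,
           TermCriteria(TermCriteria::EPS + TermCriteria::COUNT, 100, 0.001),
           retries, flags, centers);
    return centers;                       // K rows, one vocabulary word per row
}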

My "features" matrix has (number of samples x number of descriptors per sample) rows and (descriptor dimension) columns.

Now I don't know how to proceed (I know the theory, but I don't know how to implement it in C++). Could you please help me?

1 answer

answered Apr 13 '17

berak

updated Apr 13 '17

Hmm, OpenCV uses feature matching to compute the histograms: each descriptor is matched against the vocabulary, and the bin of its closest word is incremented.

For your custom descriptor, I guess you'll have to implement something similar on your own.

As an alternative to histograms as the final features, you could store the distances to the centers, or even the residuals.
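One possible reading of the distance-based alternative, as a minimal sketch: for each vocabulary word, keep the smallest distance from any of the image's descriptors to that word. The sketch is library-free (plain `std::vector` instead of `cv::Mat`) so the logic is easy to follow. The names `Descriptor`, `l2Distance`, and `minDistanceFeature` are hypothetical, and taking the minimum is only one assumed way to aggregate the distances.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical alias: one descriptor is a plain vector of floats.
using Descriptor = std::vector<float>;

// Euclidean (L2) distance between two descriptors of equal length.
float l2Distance(const Descriptor& a, const Descriptor& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Assumed aggregation: for every center, the smallest distance from any
// descriptor of the image. The result has one entry per vocabulary word.
std::vector<float> minDistanceFeature(const std::vector<Descriptor>& descriptors,
                                      const std::vector<Descriptor>& centers) {
    std::vector<float> feature(centers.size(),
                               std::numeric_limits<float>::max());
    for (const Descriptor& d : descriptors) {
        for (std::size_t k = 0; k < centers.size(); ++k) {
            feature[k] = std::min(feature[k], l2Distance(d, centers[k]));
        }
    }
    return feature;
}
```

With `cv::Mat` data, the distance between two rows would come from `norm(a, b)` rather than a hand-written loop.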

Comments

So, basically, I have to do this:

  1. Build my own function that computes the distance between each descriptor and every word of the vocabulary, and finds the closest word.
  2. Compute the histogram.
  3. Create the new feature matrix (samples x histogram dimension).
  4. Train an SVM with a chi-squared kernel.

Then do the same for a test sample.

Right?

alexMm1 (Apr 13 '17)
1

good plan ;)

  1. there's norm(a, b) for the distance.
  2. make a 1-row Mat hist(1, nclusters, CV_32F, 0.0f), then just increase the bin: hist.at<float>(0, closest_cluster_id) += 1; it's also a good idea to normalize() it in the end.
  3. start with an empty Mat, and push_back() those histograms, one by one (each on a single row).
  4. i'd still prefer a LINEAR kernel, but that's up to experimenting on your side!

for the test sample, you'd just need steps 1 and 2.

berak (Apr 13 '17)

great thanks

alexMm1 (Apr 14 '17)
