
What exactly does clusterCount do?

asked 2014-06-28 12:30:47 -0600

thdrksdfthmn

updated 2015-10-01 07:29:20 -0600

I am training a BOW classifier to classify some images, but I am not sure what the documentation entry "cluster_count – Number of clusters to split the set by" means. What is "the set": one image, one class, or all the features? In other words, is the number of clusters defined per image, per class, or over all the features used for training?




I think it's the number of words in the dictionary (the dictionary is obtained by running k-means), but I'm not sure. Can you add a link to the documentation where it appears?

GilLevi ( 2014-06-29 02:00:21 -0600 )

1 answer


answered 2014-06-29 17:26:13 -0600

Guanta

I guess you mean cluster_count from the old C implementation of k-means. It is nothing other than the number of clusters to build via k-means (i.e. it is the same as the 'k' parameter in the C++ or Python interface), as GilLevi also pointed out. By the way, you can also use the BOWTrainer interface: BOWKMeansTrainer, which effectively also runs k-means.

For bag of words you give all features from all training images as input. If you have so many features that they don't all fit in memory, take a random subset distributed equally over the images: for example, if you want 150 000 features from 150 images, select 1000 descriptors at random from each image.
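The per-image subsampling described above could be sketched roughly as follows. This is only an illustration in plain Python: the function name `subsample_descriptors`, the `per_image_quota` parameter, and the toy random descriptors are assumptions made for the example and are not part of OpenCV.

```python
import random

def subsample_descriptors(per_image_descriptors, per_image_quota, seed=0):
    """Pool at most `per_image_quota` randomly chosen descriptors from each image."""
    rng = random.Random(seed)
    pooled = []
    for descriptors in per_image_descriptors:
        # An image with fewer descriptors than the quota contributes all of them.
        take = min(per_image_quota, len(descriptors))
        pooled.extend(rng.sample(descriptors, take))
    return pooled

# Toy stand-in for real descriptors (hypothetical data):
# 150 images, 1200 four-dimensional vectors each.
rng = random.Random(42)
images = [[[rng.random() for _ in range(4)] for _ in range(1200)]
          for _ in range(150)]

pooled = subsample_descriptors(images, per_image_quota=1000)
print(len(pooled))  # 150 images * 1000 descriptors = 150000
```

The pooled list is what would then be handed to k-means as a single input set.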



Which number of features? To train the vocabulary you choose x feature vectors and the number of clusters k. Typically the two are not related, though of course you shouldn't ask for more clusters than you have feature vectors. What does "750 descriptors more times" mean; more than what? Of course, if you don't need to reduce the number of features, you can take all of them. Example: with 100 training images and around 1000 feature vectors in each, you have about 100000 feature vectors; you then cluster them, e.g. by k-means with 1000 clusters. Then, for your test set, you compute your BoW descriptors using the trained vocabulary.

Guanta ( 2014-06-30 04:56:29 -0600 )
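To make the workflow in the comment above concrete, here is a minimal pure-Python sketch, scaled down to toy sizes. It is not OpenCV's implementation: the helpers `kmeans` and `bow_histogram` are hypothetical names written for this example, the k-means is a plain Lloyd iteration, and the descriptors are random 2-D points. The point it illustrates is that k-means runs once on the descriptors pooled from all training images, and that `cluster_count` (k) is the size of the resulting vocabulary.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centers, point):
    """Index of the center closest to `point`."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(centers[i], point))

def kmeans(points, cluster_count, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns `cluster_count` centers (the vocabulary)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, cluster_count)]
    for _ in range(iterations):
        groups = [[] for _ in range(cluster_count)]
        for p in points:
            groups[nearest(centers, p)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster ends up empty
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers

def bow_histogram(vocabulary, descriptors):
    """Normalized count of how often each visual word is the nearest one."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest(vocabulary, d)] += 1
    return [c / len(descriptors) for c in counts]

# Toy data (hypothetical): 10 "images", 50 two-dimensional descriptors each.
rng = random.Random(1)
images = [[[rng.random(), rng.random()] for _ in range(50)] for _ in range(10)]

# k-means sees the descriptors of ALL training images pooled together.
pooled = [d for image in images for d in image]
cluster_count = 8  # this is k: the number of visual words
vocabulary = kmeans(pooled, cluster_count)

# One image is then described by a histogram with k bins.
histogram = bow_histogram(vocabulary, images[0])
print(len(vocabulary), len(histogram))  # 8 8
```

Note that the histogram length depends only on k, not on the number of images or on how many descriptors a given image has.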

So cluster_count (which is the same as k) is the number of clusters over all the features/descriptors?

thdrksdfthmn ( 2014-06-30 06:20:28 -0600 )

What I am asking is whether k-means is applied to the whole set of features from all the images used for training. BOWKMeansTrainer has a constructor parameter named clusterCount.

thdrksdfthmn ( 2014-06-30 06:24:33 -0600 )

Yes, from all features. However, note that you should use an independent training set to train your classifier (e.g. an SVM). And clusterCount == k.

Guanta ( 2014-06-30 06:44:47 -0600 )
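One way to read the advice about an independent training set is that the images used to build the vocabulary and the images used to train the classifier should not overlap. Under that assumption, a split could be sketched like this; the function name `split_for_bow`, the 50/50 fraction, and the file names are all made up for the illustration.

```python
import random

def split_for_bow(image_ids, vocabulary_fraction=0.5, seed=0):
    """Shuffle the image ids and cut them into two disjoint subsets."""
    rng = random.Random(seed)
    shuffled = list(image_ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * vocabulary_fraction)
    # First part: cluster its descriptors to build the vocabulary.
    # Second part: encode as BoW histograms and train the classifier on them.
    return shuffled[:cut], shuffled[cut:]

image_ids = [f"img_{i:03d}.png" for i in range(100)]  # hypothetical file names
vocabulary_set, classifier_set = split_for_bow(image_ids)
print(len(vocabulary_set), len(classifier_set))  # 50 50
```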

Aha. And k has nothing to do with the number of (image) classes that I have, right?

thdrksdfthmn ( 2014-06-30 06:56:03 -0600 )

Nope, only indirectly: typically, the more images you have, the more diverse the feature set, the better the clustering, and the better the classification (but this also depends on the dimension of the features etc.).

Guanta ( 2014-06-30 13:20:26 -0600 )
