# Kmeans algorithm stops without exception

Hi everyone,

I'm using OpenCV 2.4.3 in Ubuntu (12.10) in the following way:

```cpp
Mat points = Mat(total_rows, 128, CV_32F, descritores);
Mat labels;
int clusterCount = k;
int dimensions = points.cols;
int sampleCount = points.rows;
int num_iterations = 2;
Mat centers(clusterCount, 128, CV_32F /*points.type()*/);
kmeans(points, clusterCount, labels,
       TermCriteria(CV_TERMCRIT_EPS + CV_TERMCRIT_ITER, num_iterations, 1.0),
       10, KMEANS_PP_CENTERS, centers);
```


But the program stops abruptly while executing kmeans, without throwing an exception or printing an error. `descritores` is a float pointer, and I verified that `points` is initialized correctly: points.cols = 128 and points.rows = 950173. My PC has 16 GB of RAM, so I don't think it's a memory problem. Could you please help me understand the error? Is there any limit to the size of the data kmeans can handle? If more details are needed, just ask. Thanks.


I ran kmeans with more than 1M rows, so the size should not be an issue. Did you try debugging step by step to see where the program exits?

(2013-01-16 08:01:46 -0500)

I debugged the code and the problem occurs at line 2659 of matrix.cpp: `box[j] = Vec2f(sample[j], sample[j]);`

I don't understand why it is not working. These are the values of some variables inside the kmeans function: N = 950173, isrow = false, dims = 128, type = 5, attempts = 10, criteria.type = 3, criteria.epsilon = 1, criteria.maxCount = 100.

Is the type (data.depth()) wrong? If so, how can I proceed to get rid of this error?

(2013-01-17 11:17:31 -0500)


Hi, I tested the kmeans function with my data and it does not give any error (I did not check accuracy), but it takes a long time… in fact, a very long time. I have a suggestion, not about the code but about your data.

Now, what I understand from your code is:

```cpp
int dimensions = 950173;
int sampleCount = 128;
```

This would mean your data has 128 samples, each of length 950173? Practically, that is a very rare case, and at such high dimensionality I personally don't think any clustering algorithm works well.

If you really do need to work at such high dimensionality, please use PCA to reduce the data to a lower dimension, or drop some of the features from your feature vector. I am sure you will get better results than operating on the raw data.


No. Bruno mixed up rows and cols in the text, but the code says that the points matrix has 128 columns, which is the dimensionality.

(2013-01-16 04:17:00 -0500)

Sorry guys, Ben is right: in my text I mixed up rows and cols, so I have 950173 samples, each of length 128. I'm working with SIFT descriptors. I edited the post and also added the initialization of the centers variable. I tested this code with a smaller data sample and the algorithm worked as expected, but when I try the full data set, the program just stops.

(2013-01-16 06:16:21 -0500)

I discovered the problem. I changed the initialization of the two Mat variables (points and centers); I now initialize them as follows:

```cpp
Mat points = Mat(total_rows, 128, CV_32F, descritores).clone();
Mat centers = Mat(clusterCount, 128, CV_32F).clone();
```

I found this approach in the OpenCV 2.4 Cheat Sheet (docs.opencv.org/trunk/opencv_cheatsheet.pdf).

Thanks everybody.

(2013-01-28 08:09:44 -0500)
