Matching kmeans clusters generating non random sequence

I've been building a bag-of-words texture classifier using Gaussian filter banks. I've recently found a fairly fundamental flaw: after I collect and save a set of 'model' histograms from the training images, if I then generate another histogram from one of those training images using an identical procedure and match it with compareHist using chi-squared, I don't get a perfect match. Instead I get a set of seemingly random distances, which recur exactly if the whole process is repeated from scratch.

I've done this in a loop (generating a histogram and matching it to the saved histogram of the same image), and an example of the distances compareHist returns is below.

{4,6,6,4,3,4,5,6,4,6,5,3,3,5,6,5,6,5,6,4,5,5,4,5,4,4,6,3,5}

I can't understand why the distance:

  • isn't zero
  • but is also identical each time I repeat it
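
One way both observations could hold at once, assuming the initial centres are drawn from a pseudo-random generator that starts from the same fixed seed in every program run: successive clusterings inside one run consume different random numbers, so their results differ, yet the whole sequence replays identically on the next run. The sketch below illustrates the idea with std::mt19937 only; pickCentres, simulateRun and the seed 12345 are hypothetical stand-ins, not OpenCV's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for "pick k initial centre indices out of n points".
// Uses raw generator output modulo n (slightly biased, fine for illustration).
std::vector<std::uint32_t> pickCentres(std::mt19937& rng, int k, std::uint32_t n) {
    assert(n > 0);
    std::vector<std::uint32_t> idx;
    for (int i = 0; i < k; ++i) {
        idx.push_back(static_cast<std::uint32_t>(rng() % n));
    }
    return idx;
}

// Result of one simulated program run: two clusterings done back to back.
struct Run {
    std::vector<std::uint32_t> first;
    std::vector<std::uint32_t> second;
};

// Every run starts from the same seed (an assumption made for illustration),
// and both clusterings share the one generator.
Run simulateRun() {
    std::mt19937 rng(12345u);
    Run r;
    r.first = pickCentres(rng, 5, 1000u);
    r.second = pickCentres(rng, 5, 1000u);
    return r;
}
```

Calling simulateRun() twice returns the same pair of index lists, while within each pair the first and second lists will almost certainly differ, which matches the "non-zero but repeatable" pattern.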

I'm using the bag-of-words trainer (BOWKMeansTrainer) to generate the clusters, with KMEANS_PP_CENTERS used to choose the initial centres. I then compare the resulting histograms using chi-squared.
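
For reference, a minimal sketch of the chi-squared distance, assuming the form given in the OpenCV documentation for CV_COMP_CHISQR, i.e. the sum over bins of (H1 - H2)^2 / H1. chiSquare is a hypothetical helper written for illustration, not the library routine. It shows that two identical histograms give exactly zero, so any non-zero distance means the two histograms themselves differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Chi-squared distance between two histograms with the same number of bins:
// sum over bins of (a - b)^2 / a, skipping bins where a is (near) zero.
double chiSquare(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = a[i] - b[i];
        if (std::fabs(a[i]) > std::numeric_limits<double>::epsilon()) {
            d += diff * diff / a[i];
        }
    }
    return d;
}
```

For example, chiSquare({4.0, 4.0}, {2.0, 6.0}) evaluates to 2, and swapping the arguments gives a different value because only the first histogram appears in the denominator.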

Could this be due to my code, or does it come from the clustering?

Thank you in advance. This has been driving me crazy, and my dissertation is due in a week and a half, so it's stressful.

Note: this is a partial repost of my other post here, mainly because I need an answer fairly quickly as it's holding up my project. Thanks.

Below is a basic example of the k-means variation I'm talking about, although it is not my specific problem:

// Assumes the OpenCV 2.x headers and <iostream> are included, with using namespace cv and std.
int Flags = KMEANS_PP_CENTERS;
TermCriteria Tc(TermCriteria::MAX_ITER + TermCriteria::EPS, 1000, 0.0001);

float histArr[] = {0,255};
const float* ranges[] = {histArr};  // calcHist expects an array of range pointers
int histSize[] = {10};
int channels[] = {0};
namedWindow("testWin", CV_WINDOW_AUTOSIZE);
vector<Mat> compareMe;
for(int i=0;i<2;i++){
 BOWKMeansTrainer tstTrain(30, Tc, 5, Flags);
 Mat img1 = imread("../lena.png", CV_LOAD_IMAGE_GRAYSCALE);
 imshow("testWin", img1);
 waitKey(1000);
 // filterHandle(img1, imgOut, filterbank, n_sigmas, n_orientations);
 cout << "This is the size.." << img1.rows << " cols: " << img1.cols << endl;
 Mat imgFlat = reshapeCol(img1);
 cout << "This is the size.." << imgFlat.rows << " cols: " << imgFlat.cols << endl;
 tstTrain.add(imgFlat);
 Mat clusters = tstTrain.cluster();
 // Fresh Mat per iteration, so the stored histograms do not share one buffer.
 Mat ou1;
 calcHist(&clusters, 1, channels, Mat(), ou1, 1, histSize, ranges, true, false);
 compareMe.push_back(ou1);
 tstTrain.clear();
 // descripotorsCount() is the method's spelling in the OpenCV 2.4 API.
 cout << "This is the tstTrain.size(): " << tstTrain.descripotorsCount() << endl;
}
double value =  compareHist(compareMe[0], compareMe[1], CV_COMP_CHISQR);
cout << "This is the Chisqr comparison.." << value << endl;
compareMe.clear();

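Below is the reshape function:

// Flattens a single-channel 8-bit image into an N x 1 float column, column by column.
Mat reshapeCol(const Mat& in){
  CV_Assert(in.type() == CV_8UC1);
  Mat points(in.rows*in.cols, 1,CV_32F);
  int cnt = 0;
  cout << "inside. These are the rows: " <<  in.rows << " and cols: " << in.cols  << endl;
  for(int i =0;i<in.cols;i++){
    for(int j=0;j<in.rows;j++){
      // Mat::at takes (row, col); the image is grayscale, so read uchar rather than Vec3b.
      points.at<float>(cnt, 0) = in.at<uchar>(j,i);
      cnt++;
    }
  }
  return points;
}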

This is my GitHub. My specific problem is the variation both when I generate models and when I test images (novelImgTest.cpp). Also, for whatever reason, if I add more images (even duplicates) to the testing folder, then some of the images loaded and tested afterwards change in TPR and PPV, even though that change has no relation to the duplicate. I'm completely stumped on this.