I did not use k-means, but I did use PCA to reduce the number of features in my neural network training data, using OpenCV's C++ interface. Let's start by reading the CSV file. My CSV file looks like this:
im_path_1;label1
im_path_2;label2
To read that CSV file, I use this function:
#include <opencv2/opencv.hpp>
#include <fstream>
#include <sstream>
using namespace cv;
using namespace std;

void read_csv(const string& filename, vector<Mat>& images, vector<int>& labels, char separator = ';')
{
    std::ifstream file(filename.c_str(), ifstream::in);
    if (!file) {
        string error_message = "No valid input file was given, please check the given filename.";
        CV_Error(1, error_message);
    }
    string line, path, classlabel;
    while (getline(file, line)) {
        stringstream liness(line);
        getline(liness, path, separator);   // text before the separator: image path
        getline(liness, classlabel);        // rest of the line: class label
        if (!path.empty() && !classlabel.empty()) {
            Mat im = imread(path, 0);       // 0 = load as grayscale
            if (im.empty())
                continue;                   // skip images that failed to load
            images.push_back(im);
            labels.push_back(atoi(classlabel.c_str()));
        }
    }
}
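The line-splitting step can be tried on its own, without OpenCV or any image files. The following is only a minimal sketch of that step; parse_csv_line is a hypothetical helper name that does not appear in my code above, and it assumes the same "path;label" layout.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the original function): splits one
// "path;label" line into its two fields.
// Returns false if the path or the label is missing.
bool parse_csv_line(const std::string& line, std::string& path, int& label,
                    char separator = ';')
{
    std::stringstream liness(line);
    std::string classlabel;
    std::getline(liness, path, separator);  // text before the separator
    std::getline(liness, classlabel);       // rest of the line
    if (path.empty() || classlabel.empty())
        return false;
    label = std::atoi(classlabel.c_str());  // atoi yields 0 for non-numeric text
    return true;
}
```

For example, calling it on "im_path_1;3" fills path with "im_path_1" and label with 3, while a line without a separator is rejected.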
read_csv stores the images in a vector of Mat. OpenCV's PCA requires the data as a single Mat with one sample per row, so each image has to be flattened into a row vector. To do that:
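// Assumes data is non-empty and all images have the same size and one channel.
Mat rollVectortoMat(const vector<Mat> &data)
{
    // One row per image, one column per pixel.
    Mat dst(static_cast<int>(data.size()), data[0].rows*data[0].cols, CV_32FC1);
    for(unsigned int i = 0; i < data.size(); i++)
    {
        // Flatten the image into a single row (1 channel, 1 row).
        Mat image_row = data[i].clone().reshape(1,1);
        Mat row_i = dst.row(i);
        // Convert to float and scale pixel values to [0, 1], writing into row i of dst.
        image_row.convertTo(row_i,CV_32FC1, 1/255.);
    }
    return dst;
}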
A simple usage of these functions:
int main()
{
    PCA pca;
    vector<Mat> images_train;
    vector<int> labels_train;
    read_csv("train1k.txt",images_train,labels_train);
    Mat rawTrainData = rollVectortoMat(images_train);
    int pca_size = 500;  // number of principal components to keep
    Mat trainData(rawTrainData.rows, pca_size,rawTrainData.type());
    // A test set (not loaded here) would be read, flattened and projected
    // the same way, using the same pca object.
    // CV_PCA_DATA_AS_ROW is the OpenCV 2.x name; newer versions use PCA::DATA_AS_ROW.
    pca(rawTrainData,Mat(),CV_PCA_DATA_AS_ROW,pca_size);
    for(int i = 0; i < rawTrainData.rows ; i++)
        pca.project(rawTrainData.row(i),trainData.row(i));
    cout<<trainData.size()<<endl;
    return 0;
}
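To show what the projection step computes for each row, here is a plain C++ sketch with no OpenCV dependency: subtract the mean from the sample, then take the dot product with each principal axis. project_sample and its parameters are hypothetical names for illustration only; it assumes the axes are stored one eigenvector per row.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration (not OpenCV code): project one sample onto
// k principal axes. `axes` holds k eigenvectors as rows, each with the
// same length as `sample` and `mean`.
std::vector<double> project_sample(const std::vector<double>& sample,
                                   const std::vector<double>& mean,
                                   const std::vector<std::vector<double> >& axes)
{
    std::vector<double> coeffs(axes.size(), 0.0);
    for (std::size_t k = 0; k < axes.size(); ++k)
        for (std::size_t j = 0; j < sample.size(); ++j)
            coeffs[k] += (sample[j] - mean[j]) * axes[k][j];  // centered dot product
    return coeffs;  // one coefficient per retained component
}
```

The result has one value per retained axis, which is why each row of the reduced data has pca_size columns instead of one column per pixel.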
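In the example above, trainData is the reduced version of the training set. Instead of fixing pca_size at 500, you can give PCA a retained-variance fraction (a double such as 0.95) to keep 95% of the variance; in that case, size trainData from the number of components PCA actually keeps (pca.eigenvectors.rows) rather than from a fixed pca_size. I hope this helps with the PCA part. I used this reduced data to train a neural network.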
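To make the retained-variance idea concrete, here is a small sketch in plain C++ of how a variance fraction maps to a component count: take the smallest number of leading eigenvalues whose sum reaches the requested share of the total. components_for_variance is a hypothetical name, the eigenvalues are assumed to be sorted in descending order, and OpenCV's own rule may differ in details such as the comparison used or a minimum number of components.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical illustration of the general idea, not OpenCV's implementation.
// `eigenvalues` must be sorted in descending order; `retained` is a fraction
// in (0, 1], e.g. 0.95 for 95% of the variance.
std::size_t components_for_variance(const std::vector<double>& eigenvalues,
                                    double retained)
{
    const double total =
        std::accumulate(eigenvalues.begin(), eigenvalues.end(), 0.0);
    if (total <= 0.0)
        return 0;  // no variance to retain
    double running = 0.0;
    for (std::size_t k = 0; k < eigenvalues.size(); ++k) {
        running += eigenvalues[k];
        if (running / total >= retained)
            return k + 1;  // first k+1 components reach the requested share
    }
    return eigenvalues.size();
}
```

With eigenvalues 6, 3 and 1, the first two components carry 90% of the variance, so a target of 0.85 needs two components and a target of 0.95 needs all three.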