How can I use PCA properly?
I'd like to implement character recognition. The LBP pattern and spatial histogram steps work fine, but I'm having problems with the PCA step.
I've searched online but couldn't resolve the problem. In my code, the size of spatial_histogram is 1*16384 (row*col), but after the projection the size of projection_result is only 1*1. My goal is to reduce the dimension of the spatial histogram so that I can feed the smaller result into an SVM for training. How can I reduce the feature vector of my spatial histogram to 200 elements?
Besides, I have seen many references to PCA functions such as "project" and "backproject", but I'm quite confused about them.
Here's my piece of code:
Mat inImg = imread(filename,0);
Mat lbp_image(inImg.rows-2, inImg.cols-2, CV_8UC1, Scalar(0));
int radius = 1;
int neighbors = 8;
int grid_x = 8, grid_y = 8;
olbp(inImg, lbp_image);
Mat histMat = spatial_histogram(
    lbp_image,
    static_cast<int>(std::pow(2.0, static_cast<double>(neighbors))),
    grid_x,
    grid_y,
    true);
histograms.push_back(histMat);
PCA pca(histMat,Mat(),CV_PCA_DATA_AS_ROW, 200); // histMat <- 1*16384
pca.project(histMat,projection_result);
You can't build a PCA from a single histogram. You have to stack all your training histograms into an N (histograms) x M (features) Mat and build the PCA from that.
Then, for prediction later, you project a single histogram using that PCA (to a short, 200-element feature vector).
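To make the "fit on the stacked rows, then project one row at a time" workflow concrete, here is a minimal sketch that uses only the C++ standard library. It is an illustration of the idea via power iteration with deflation, not a description of how cv::PCA is implemented; the names TinyPca, fitPca, and dot are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Row = std::vector<double>;
using Matrix = std::vector<Row>;  // N rows (samples) x M columns (features)

// Dot product of two equally sized vectors.
double dot(const Row& a, const Row& b) {
    double s = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) s += a[j] * b[j];
    return s;
}

// Hypothetical stand-in for a fitted PCA: training mean plus k directions.
struct TinyPca {
    Row mean;
    Matrix components;  // k rows of length M

    // Reduce one sample of length M to k coefficients.
    Row project(const Row& sample) const {
        Row centered(sample.size());
        for (std::size_t j = 0; j < sample.size(); ++j) centered[j] = sample[j] - mean[j];
        Row out;
        for (const Row& c : components) out.push_back(dot(centered, c));
        return out;
    }
};

// Fit on ALL stacked training rows at once; k should stay below the row count.
TinyPca fitPca(const Matrix& data, std::size_t k, int iterations = 200) {
    const std::size_t n = data.size();
    const std::size_t m = data.front().size();
    TinyPca pca;
    pca.mean.assign(m, 0.0);
    for (const Row& r : data)
        for (std::size_t j = 0; j < m; ++j) pca.mean[j] += r[j] / static_cast<double>(n);

    Matrix centered = data;
    for (Row& r : centered)
        for (std::size_t j = 0; j < m; ++j) r[j] -= pca.mean[j];

    for (std::size_t c = 0; c < k; ++c) {
        Row v(m);
        for (std::size_t j = 0; j < m; ++j) v[j] = 1.0 + 0.01 * static_cast<double>(j);  // fixed start vector
        for (int it = 0; it < iterations; ++it) {
            // w = X^T (X v): one multiplication by the unnormalised covariance matrix.
            Row w(m, 0.0);
            for (const Row& r : centered) {
                const double s = dot(r, v);
                for (std::size_t j = 0; j < m; ++j) w[j] += s * r[j];
            }
            // Deflation: remove what the earlier components already explain.
            for (const Row& p : pca.components) {
                const double s = dot(p, w);
                for (std::size_t j = 0; j < m; ++j) w[j] -= s * p[j];
            }
            const double norm = std::sqrt(dot(w, w));
            if (norm < 1e-12) break;  // no variance left; v is not a meaningful direction
            for (std::size_t j = 0; j < m; ++j) v[j] = w[j] / norm;
        }
        pca.components.push_back(v);
    }
    return pca;
}
```

In use, fitPca would be called once with every training histogram as a row, and project would then be called on each training row (to build the N x k matrix for the SVM) and later on each new histogram at prediction time.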
Thanks for your answer, @berak, but I don't quite get your meaning. I think you mean that the spatial_histogram function generates a single histogram for one sample of a character such as 'A', and that I should then collect (let's say) 30 more samples of the character A. That gives a 30 * 16384 histogram Mat, where each row represents one character sample, and this Mat is then passed to the PCA. But what I want is to use PCA to reduce each histogram vector from 1 * 16384 to 1 * 200 or more, and then build a 30 * 200 Mat as training data for the SVM. Is my idea right?
You need much more data to make it work (a few thousand samples).
To retain 200 eigenvectors in a PCA, you need more than 200 rows of data.
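The row-count requirement follows from a rank argument: after the mean is subtracted, N samples span at most N-1 independent directions, so at most min(N-1, M) components can carry any variance. A small sketch of that arithmetic is below; usableComponents is a hypothetical helper name, and it only models the mathematical bound, not how any particular library reacts to an over-large request.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Upper bound on the number of principal components that can carry variance,
// given how many were requested and the shape of the training matrix.
std::size_t usableComponents(std::size_t requested,
                             std::size_t numSamples,
                             std::size_t numFeatures) {
    if (numSamples == 0) return 0;
    // Mean-centering removes one degree of freedom from the N samples.
    const std::size_t rankBound = std::min(numSamples - 1, numFeatures);
    return std::min(requested, rankBound);
}
```

With the numbers from this thread, a single 1*16384 histogram leaves no usable component, 30 samples leave at most 29, and 201 or more samples are needed before 200 components become available.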
I think I understand what I can do now. I appreciate your help!