# How can I use PCA properly?

I'd like to implement character recognition. My LBP (local binary pattern) and spatial-histogram code works fine, but I run into problems with the PCA step.

I've searched online but couldn't solve the problem. In my code, the size of spatial_histogram is 1*16384 (rows*cols), but after the projection the size of projection_result is only 1*1. My goal is to reduce the dimensionality of the spatial histogram so that I can feed a smaller feature vector into an SVM for training. How can I reduce the feature vector of my spatial histogram to a size of 200?

Besides, I have seen a lot about PCA's functions like project and backProject, but I'm quite confused about them.

Here's my piece of code:

```cpp
Mat inImg = imread(filename, 0);
// The LBP image is 2 pixels smaller in each dimension; the original
// rows-2 x rows-2 only works for square images, so use cols-2 for the width.
Mat lbp_image(inImg.rows - 2, inImg.cols - 2, CV_8UC1, Scalar(0));
int neighbors = 8;
int grid_x = 8, grid_y = 8;
olbp(inImg, lbp_image);

Mat histMat = spatial_histogram(
    lbp_image,
    static_cast<int>(std::pow(2.0, static_cast<double>(neighbors))),
    grid_x,
    grid_y,
    true);

histograms.push_back(histMat);

PCA pca(histMat, Mat(), CV_PCA_DATA_AS_ROW, 200);  // histMat <- 1*16384
pca.project(histMat, projection_result);
```


You can't build a PCA from a single histogram. You have to stack all your training histograms into an N(histograms) x M(features) Mat and compute the PCA on that.

Then, for prediction later, you project a single histogram through that PCA to get a short, 200-element feature vector.

( 2015-06-09 00:00:38 -0500 )

Thanks for your answer @berak, but I don't quite follow. I think you mean that the spatial_histogram function generates a single histogram per sample of a character such as 'A'; with, say, 30 samples of 'A', that produces a 30 * 16384 Mat where each row is one sample, and that Mat is what gets passed to the PCA for projection. But what I want is to use PCA to reduce each histogram from 1 * 16384 to 1 * 200 or so, and then assemble a 30 * 200 Mat as training data for the SVM. Is my idea right?

( 2015-06-09 01:18:54 -0500 )

You need much more data to make this work (a few thousand samples, say).

To retain 200 eigenvectors in a PCA, you need more than 200 rows of data.

( 2015-06-09 01:23:47 -0500 )

I think I understand what I can do now. I appreciate your help!

( 2015-06-09 01:36:46 -0500 )


It seems to me that you are saying your data has 1 row and many columns, i.e. the samples are in columns rather than rows (I'm not sure because of the formatting in your question). So you should probably use CV_PCA_DATA_AS_COL instead of CV_PCA_DATA_AS_ROW, right?


I've tried CV_PCA_DATA_AS_COL before, but it throws an exception at the following code. What should I do about it?

```cpp
if( mean.rows == 1 )
    gemm( tmp_data, eigenvectors, 1, Mat(), 0, result, GEMM_2_T );
```


When I debug into it, it stops at the assertion below. I think the sizes of the two matrices don't match for the multiplication: the eigenvectors matrix is only 1*1. Thanks for answering.

```cpp
CV_Assert( a_size.width == len );  // where a_size.width = 16384, len = 1
```

( 2015-06-08 22:07:19 -0500 )
