How can I use PCA properly?

asked 2015-06-08 06:04:17 -0500

cv_new

I'd like to implement character recognition. Implementing the LBP pattern and the spatial histogram went fine, but I ran into problems with the PCA step.

I've searched online resources but couldn't resolve my problem. In my code, the size of spatial_histogram is 1*16384 (row*col), but after the projection the size of projection_result is only 1*1. My goal is to reduce the dimension of the spatial histogram so that I can feed the smaller result into an SVM for training. How can I reduce the feature vector of my spatial histogram to 200 elements?

Besides, I have seen a lot about PCA functions like "project" and "backproject", but I'm quite confused about them.

Here's my piece of code:

Mat inImg = imread(filename, 0);  // 0 loads the image as grayscale
// The LBP image is 2 pixels smaller than the input in each dimension
Mat lbp_image(inImg.rows-2, inImg.cols-2, CV_8UC1, Scalar(0));
int radius = 1;
int neighbors = 8;
int grid_x = 8, grid_y = 8;
olbp(inImg, lbp_image);

Mat histMat = spatial_histogram(
    static_cast<int>(std::pow(2.0, static_cast<double>(neighbors))),


PCA pca(histMat, Mat(), CV_PCA_DATA_AS_ROW, 200);  // histMat <- 1*16384


You can't build a PCA from a single histogram. You have to stack all your training histograms into an N (histograms) x M (features) Mat to compute the PCA.

Then, for prediction later, you project a single histogram using that PCA (to a short, 200-element feature vector).

berak ( 2015-06-09 00:00:38 -0500 )
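
As a rough illustration of the stack-then-project workflow described in this comment, here is a self-contained toy PCA in plain C++. It is a sketch, not OpenCV's implementation: SimplePca, fitPca, project and backProject are names made up for this example, and power iteration with deflation is swapped in for a proper eigen-decomposition. In OpenCV the corresponding steps would be constructing cv::PCA from the stacked N x M Mat and then calling its project method on each 1 x M histogram.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Matrix = std::vector<Vec>;  // one row per sample, one column per feature

// Toy stand-in for a fitted PCA; the names are illustrative, not OpenCV's.
struct SimplePca {
    Vec mean;           // length M
    Matrix components;  // up to K rows of length M, each a unit vector
};

static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) s += a[j] * b[j];
    return s;
}

// Fit on an N x M matrix of stacked samples, keeping at most k components.
// Power iteration with deflation: fine for a toy, far too slow for M = 16384.
SimplePca fitPca(const Matrix& data, std::size_t k) {
    assert(!data.empty());
    const std::size_t n = data.size(), m = data[0].size();
    SimplePca pca;
    pca.mean.assign(m, 0.0);
    for (const Vec& row : data)
        for (std::size_t j = 0; j < m; ++j) pca.mean[j] += row[j] / static_cast<double>(n);

    Matrix x = data;  // mean-centered copy of the training data
    for (Vec& row : x)
        for (std::size_t j = 0; j < m; ++j) row[j] -= pca.mean[j];

    for (std::size_t c = 0; c < k; ++c) {
        // Arbitrary non-uniform start vector; a toy choice that can fail
        // if it happens to be orthogonal to the data.
        Vec v(m);
        for (std::size_t j = 0; j < m; ++j) v[j] = 1.0 + 0.37 * static_cast<double>(j);
        bool hasVariance = true;
        for (int it = 0; it < 500; ++it) {
            Vec w(m, 0.0);  // w = X^T (X v), proportional to covariance * v
            for (const Vec& row : x) {
                const double p = dot(row, v);
                for (std::size_t j = 0; j < m; ++j) w[j] += p * row[j];
            }
            const double norm = std::sqrt(dot(w, w));
            if (norm < 1e-12) { hasVariance = false; break; }
            for (std::size_t j = 0; j < m; ++j) v[j] = w[j] / norm;
        }
        if (!hasVariance) break;  // no variance left to explain, stop early
        for (Vec& row : x) {      // deflate: remove the direction just found
            const double p = dot(row, v);
            for (std::size_t j = 0; j < m; ++j) row[j] -= p * v[j];
        }
        pca.components.push_back(v);
    }
    return pca;
}

// Project one sample (length M) to its short coefficient vector (length K).
Vec project(const SimplePca& pca, const Vec& sample) {
    Vec centered(sample.size());
    for (std::size_t j = 0; j < sample.size(); ++j) centered[j] = sample[j] - pca.mean[j];
    Vec coeffs;
    for (const Vec& comp : pca.components) coeffs.push_back(dot(centered, comp));
    return coeffs;
}

// Map coefficients back to an approximation of the original sample.
Vec backProject(const SimplePca& pca, const Vec& coeffs) {
    Vec out = pca.mean;
    for (std::size_t c = 0; c < coeffs.size(); ++c)
        for (std::size_t j = 0; j < out.size(); ++j) out[j] += coeffs[c] * pca.components[c][j];
    return out;
}
```

Fitting on several stacked rows and then calling project on one sample returns one coefficient per retained component, and backProject maps those coefficients back to the original feature space. Fitting on a single row retains no component at all in this sketch, because the centered data is all zeros.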

Thanks for your answer, @berak, but I don't quite follow. I think you mean that the spatial_histogram function generates a single histogram for one sample of a character such as 'A', and that I should repeat this for (say) 30 samples of the character 'A'. That gives a 30 * 16384 histogram Mat in which each row represents one character sample, and that Mat is then passed to the PCA. What I want is to use PCA to reduce each histogram vector from 1 * 16384 to 1 * 200 or so, and then build a 30 * 200 Mat as training data for the SVM. Is my idea right?

cv_new ( 2015-06-09 01:18:54 -0500 )

You need much more data to make it work (on the order of a few thousand samples).

To retain 200 eigenvectors in a PCA, you need more than 200 rows of data.

berak ( 2015-06-09 01:23:47 -0500 )
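
The row requirement can be made precise with a short rank argument. This is a sketch under the usual PCA setup; the symbols are introduced here for illustration and do not appear in the thread: $N$ samples $x_i \in \mathbb{R}^M$ stacked as the rows of $X$, with sample mean $\bar{x}$.

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
X_c = X - \mathbf{1}\,\bar{x}^{\top}, \qquad
C = \frac{1}{N}\, X_c^{\top} X_c

\operatorname{rank}(C) = \operatorname{rank}(X_c) \le \min(N - 1,\; M)
```

Subtracting the mean removes one degree of freedom, so at most $N - 1$ eigenvalues of $C$ are non-zero. With $N = 1$ the centered matrix $X_c$ is all zeros and no direction carries any variance. Keeping 200 informative eigenvectors requires $N - 1 \ge 200$, that is, at least 201 rows, and in practice many more for the components to be stable.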

I think I understand what I can do now. I appreciate your help!

cv_new ( 2015-06-09 01:36:46 -0500 )

1 answer

answered 2015-06-08 08:26:30 -0500

It seems that your data has 1 row and many columns, so the data is laid out in columns, not in rows (I'm not sure because of the formatting in your question). So you should probably use CV_PCA_DATA_AS_COL instead of CV_PCA_DATA_AS_ROW, right?


I tried CV_PCA_DATA_AS_COL before, but it throws an exception in the following code. What should I do about it?

 if( mean.rows == 1 )
    gemm( tmp_data, eigenvectors, 1, Mat(), 0, result, GEMM_2_T );

When I step into it with the debugger, it stops at the assertion below. I think it is caused by the sizes of the two matrices not matching for the multiplication: the eigenvector matrix is only 1*1. Thanks for answering.

CV_Assert( a_size.width == len );     // where a_size.width = 16384 , len =1
cv_new ( 2015-06-08 22:07:19 -0500 )
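
Switching the layout flag does not change the underlying situation: a 1*16384 matrix still holds very little data. The sketch below (plain C++ with hypothetical helper names, not OpenCV calls) shows how the same matrix is read under each layout. Read as columns, it becomes 16384 one-dimensional samples, so every eigenvector has length 1, which is consistent with len = 1 in the failing assertion.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative type; this is not part of OpenCV.
struct PcaShape {
    std::size_t samples;   // how many observations the PCA sees
    std::size_t features;  // length of each observation and of each eigenvector
};

// How a rows x cols data matrix is read under each layout flag.
PcaShape interpretLayout(std::size_t rows, std::size_t cols, bool dataAsRow) {
    assert(rows > 0 && cols > 0);
    if (dataAsRow) return PcaShape{rows, cols};  // each row is one sample
    return PcaShape{cols, rows};                 // each column is one sample
}

// A vector can only be projected if its length equals the feature count.
bool canProject(const PcaShape& shape, std::size_t vectorLength) {
    return vectorLength == shape.features;
}
```

For the matrix in the question, interpretLayout(1, 16384, true) gives 1 sample with 16384 features, while interpretLayout(1, 16384, false) gives 16384 samples with 1 feature, and canProject then rejects a 16384-element vector. A stacked training matrix such as 1000 x 16384 read as rows has the shape that the 200-component reduction needs.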

Last updated: Jun 08 '15