Using PCA and LDA for dimensionality reduction for SVM

asked 2016-08-27 03:52:33 -0600

Angulu

updated 2016-08-27 03:56:51 -0600

I am preparing data for training an SVM. I use PCA to reduce the dimensionality of the data before applying LDA for class-discriminant dimensionality reduction. I then feed the data projected into the LDA subspace to the SVM, as shown in the code below.

    Mat trainData; //Hold data for training. Each row is a sample    
    vector<Mat> histograms; //Contains row histograms of LBP features
    convertVectorToMat(histograms, trainData); //Convert vector of Mat to Mat (40 rows, 4096 columns)
    PCA pca(trainData, Mat(), PCA::DATA_AS_ROW, (classes - 1));//PCA gives (40 rows, 39 columns)
    Mat mean = pca.mean.reshape(1, 1);

    //Project data to PCA feature space
    Mat projection = pca.project(trainData);

    //Perform LDA on data projected on PCA feature space
    LDA lda((classes - 1));
    lda.compute(projection, labels);
    Mat_<float> ldaProjected = lda.project(projection);
    normalize(ldaProjected, ldaProjected, 0, 1, NORM_MINMAX, CV_32FC1);

I am passing Mat ldaProjected to the SVM together with the corresponding labels for training. My question is: am I doing this right, or should I have passed Mat projection to the SVM instead? In either case, the SVM gives the same class label for every sample I predict. Kindly advise whether I am preparing my data correctly for training. I intended to use LDA for dimensionality reduction before training a multi-class SVM.
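
For reference, a minimal sketch of what the matching prediction path would look like, assuming the same pca, lda and trained svm objects are reused; testHist here stands for one 1 x 4096 LBP histogram of a test image:

    //sketch: a test sample has to pass through the identical pipeline
    Mat pcaTest = pca.project(testHist);   //1 x (classes - 1)
    Mat ldaTest = lda.project(pcaTest);    //returned as CV_64F
    ldaTest.convertTo(ldaTest, CV_32F);    //SVM::predict expects CV_32F samples
    //note: NORM_MINMAX here uses this single sample's min/max, which may not
    //match the scaling that was applied to the whole training matrix
    normalize(ldaTest, ldaTest, 0, 1, NORM_MINMAX, CV_32FC1);
    float label = svm->predict(ldaTest);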


Comments

technically, your code is correct. but with some more filters in the pipeline now, you probably have to adjust your SVM params.

berak ( 2016-08-28 00:47:02 -0600 )

This is how I am training the SVM:

void trainSVM(Mat hists, vector<int> labels){
    Ptr<TrainData> trainData = TrainData::create(hists, ml::ROW_SAMPLE, labels);
    Ptr<SVM> svm = SVM::create();
    svm->setKernel(SVM::LINEAR);
    svm->setType(SVM::C_SVC);//For n-class classification problem with imperfect class separation
    svm->train(trainData);
}

Kindly advise me on whether I am setting the SVM params well. I intend to classify images from 40 different classes.

Angulu ( 2016-08-28 02:21:45 -0600 )

since you're using C_SVC, try:

svm->setC(C);

with C = 0.1, 1, 10, 100, 500, 1000

berak ( 2016-08-28 02:33:55 -0600 )

also,

svm->trainAuto()

berak ( 2016-08-28 02:34:37 -0600 )
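
For illustration, a rough sketch of how both suggestions could be dropped into the trainSVM function above (C = 10 is just one of the values from the list to try):

void trainSVM(Mat hists, vector<int> labels){
    Ptr<TrainData> trainData = TrainData::create(hists, ml::ROW_SAMPLE, labels);
    Ptr<SVM> svm = SVM::create();
    svm->setKernel(SVM::LINEAR);
    svm->setType(SVM::C_SVC);
    //option 1: pick C by hand and retrain once per value you want to compare
    svm->setC(10);
    svm->train(trainData);
    //option 2: or let trainAuto cross-validate C over a grid instead
    //svm->trainAuto(trainData);
}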

I have tried both methods and it is still not working. Could I be preparing my data wrongly? When I look at my trained model, I have 3304 support vectors, but they are all populated with 0. I am extracting LBP features from images and pushing the LBP histograms into a vector of Mat. Then I convert this vector<Mat> histograms to Mat trainData as shown in the code below. Kindly advise whether my logic is OK.

Angulu ( 2016-08-28 03:04:56 -0600 )

This is how I am converting from vector<Mat> histograms to Mat trainData:

void convertToMat(vector<Mat> &samples, Mat &trainData){
    int rows = (int)samples.size();
    int cols = max(samples[0].cols, samples[0].rows);
    Mat tmp(1, cols, CV_32FC1); //used for transposition if needed
    trainData = Mat(rows, cols, CV_32FC1);
    vector< Mat >::const_iterator itr = samples.begin();
    vector< Mat >::const_iterator end = samples.end();
    for (int i = 0; itr != end; ++itr, ++i){
        CV_Assert(itr->cols == 1 || itr->rows == 1);
        //each histogram must already be CV_32FC1: copyTo does not convert
        //types, so a mismatched row may silently not end up in trainData
        if (itr->cols == 1){
            transpose(*(itr), tmp);
            tmp.copyTo(trainData.row(i));
        }
        else if (itr->rows == 1){
            itr->copyTo(trainData.row(i));
        }
    }
}

Angulu ( 2016-08-28 03:07:16 -0600 )

Even in my LDA model, the eigenvectors consist of many 0s and a few 1s, with no other values in the XML file. Kindly advise on that logic of converting from vector<Mat> to Mat and, generally, on how to prepare a training matrix. Thank you.

Angulu ( 2016-08-28 03:16:09 -0600 )
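
If the histograms might not be CV_32FC1 (LBP counts are often stored as integers), here is a minimal sketch of the conversion with an explicit convertTo, under that assumption (buildTrainMatrix is just an illustrative name):

//sketch only: flatten each histogram to one row and force CV_32F so every
//row of trainData is filled regardless of the original histogram type
void buildTrainMatrix(const vector<Mat> &histograms, Mat &trainData){
    int rows = (int)histograms.size();
    int cols = (int)histograms[0].total();
    trainData.create(rows, cols, CV_32FC1);
    for (int i = 0; i < rows; ++i){
        Mat row = histograms[i].reshape(1, 1);      //1 x N view, no copy
        row.convertTo(trainData.row(i), CV_32F);    //type now matches the row
    }
}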

again imho, "technically", your code is ok.

no idea, but somehow, i'd try first without the PCA/LDA. also, 40 train items might just not be enough; try with more data.

ohhh, wait, that means you only got 1 sample per class? that's just bad.

again, needs more data, imho.

berak ( 2016-08-28 03:18:41 -0600 )

Thank you berak. I have more samples per class (at least 4 and at most 7). Should I use the same number of images per class? For 40 classes, I have a total of 220 images that I am using to train the SVM.

Angulu ( 2016-08-28 04:13:22 -0600 )

it won't matter much if you have 5 for one class, and 7 for another. just try to keep it halfway balanced, and use as many as you can.

berak ( 2016-08-28 04:26:55 -0600 )