Why doesn't the CvSVM predict function reach 100% accuracy on its own training set?

asked 2015-04-15 14:32:22 -0600

Abu Gaseem

updated 2015-04-16 05:20:54 -0600

I used the OpenCV CvSVM with a bag of visual words to classify objects. Once training finished, I tested the classifier on the same training set, which I expected to give 100% accuracy. That is not what happens. Can someone explain why? Here's the code:

// Load the stored samples of each class into a matrix,
// load the associated SVM parameter file into the SVM,
// then test (predict) on every sample.
Mat test_samples;
CvSVM classifier;

FileStorage fs("train_sample/training_samples-1000.yaml.gz", FileStorage::READ);
if (!fs.isOpened()) {
    cerr << "Cannot open file " << endl;
    return;
}

classifier.load("SVM_parameter_files/svm_1000_auto/SVM_classifier_class_1.yaml");
string class_ = "class_";
for (size_t i = 1; i <= 24; i++) {
    // Build the node name "class_<i>" and read that class's samples.
    stringstream ss;
    ss << class_ << i;
    fs[ss.str()] >> test_samples;
    size_t positive = 0;
    size_t negative = 0;
    // Test the SVM that classifies class_1 as positive and all other classes as negative.
    // Note: this inner 'i' shadows the outer loop counter.
    for (int i = 0; i < test_samples.rows; i++) {
        float res = classifier.predict(test_samples.row(i), false);
        if (res == 1)
            positive++;
        else
            negative++;
    }

    cout << ss.str() << " positive examples  = " << positive << " , negative examples =" << negative << endl;
}
fs.release();

The output of the class_1 classifier on all 24 classes


Comments


unrelated, but don't use 'i' as a loop variable 2 times in nested loops

berak ( 2015-04-16 06:00:43 -0600 )

I want to retrain the SVM with C value = 10*10 and term criteria 10*10; maybe I will get a better result than with C = 1 and iterations = 1000. I also want to ask: should the number of positive and negative examples be equal in general, and in the case of 1 vs n classes?

Abu Gaseem ( 2015-04-16 06:11:46 -0600 )

Not that I have any idea, but we know nothing about your SVM params, the number of samples, whether it's multi- or single-class, the vocabulary size, or the kind of training features. It's hard to answer without details.

berak ( 2015-04-16 06:27:08 -0600 )

I retrained the SVM classifiers with C value = 10^10 and gamma = 3, and I got 100% accuracy on the same training examples. This is expected, right?

Abu Gaseem ( 2015-04-16 06:34:49 -0600 )

Vocabulary size: 1k. The number of training examples varies per class; for classes 1-24: {110, 14, 19, 14, 11, 21, 10, 15, 15, 15, 14, 19, 45, 14, 18, 45, 42, 67, 70, 29, 25, 26, 46, 34, 82}. NOTE: when I use CvSVM::train_auto() with the default parameters I get the result above. In contrast, when I use CvSVM::train() with the following params

svm_type: C_SVC
kernel: { type:RBF, gamma:3. }
C: 1.0000000000000000e+10
term_criteria: { epsilon:1.1920928955078125e-07, iterations:100000000 }

all the classifiers recognise the training examples 100%.

Abu Gaseem ( 2015-04-16 06:42:16 -0600 )

First: C=10^10 seems very high to me ^^. And to answer your question: SVM is an optimizer. It minimizes an objective that trades training error against margin width, so it only reduces the training error as far as that trade-off allows. So no, it doesn't need to recognize 100% of the training examples.

Guanta ( 2015-04-16 10:17:15 -0600 )
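
To make the point about the optimizer concrete, here is the standard soft-margin (C_SVC) objective, written as a sketch. The symbols $w$, $b$, $\xi_i$ and $\phi$ are the usual textbook notation, not names taken from this thread or from the OpenCV API:

```latex
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\qquad \text{subject to} \qquad
y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,\quad \xi_i \ge 0 .
```

A training sample $x_i$ is misclassified whenever its slack $\xi_i$ exceeds 1. With a small C the optimizer may accept some of those in exchange for a wider margin, so training accuracy can stay below 100%. With a very large C such as 10^10, any slack becomes extremely expensive, which is consistent with the 100% training accuracy reported above.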

100 % with the trainset sounds ok ;)

still, try with unknown test images, and try to vary the SVM params: try a POLY kernel, or NU_SVC with nu=0.5

berak ( 2015-04-16 10:34:02 -0600 )

On Friday and Saturday I will go and collect my test images and test the SVM. Can you explain what the C and gamma arguments represent? Also var_all and var_count: what do they represent?

Thank you both.

Abu Gaseem ( 2015-04-16 12:05:18 -0600 )

For C, see http://stats.stackexchange.com/questi... . gamma depends on the kernel type you use: for linear SVMs there is no gamma value; for the other types have a look at the description of the kernel type at http://docs.opencv.org/modules/ml/doc... , where gamma appears in the respective equations. var_count is probably the number of support vectors. var_all: I don't know which parameter you mean.

Guanta ( 2015-04-17 03:05:49 -0600 )
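
For the RBF kernel used above, gamma is the factor in K(x, y) = exp(-gamma * ||x - y||^2). Below is a minimal sketch of that formula in plain C++, just to show how gamma shapes the similarity. The function name `rbf_kernel` is made up for this illustration and is not part of OpenCV:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative helper (hypothetical name, not an OpenCV function).
// RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2), with gamma > 0.
double rbf_kernel(const std::vector<double>& x,
                  const std::vector<double>& y,
                  double gamma) {
    assert(x.size() == y.size());  // both vectors must have the same dimension
    double sq_dist = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double d = x[i] - y[i];
        sq_dist += d * d;  // accumulate the squared Euclidean distance
    }
    return std::exp(-gamma * sq_dist);
}
```

Identical vectors always give 1. For any two different vectors, a larger gamma gives a smaller kernel value, so each training sample influences only a tighter neighbourhood around itself. Together with a very large C, that makes it easy to fit the training set exactly, which says nothing yet about accuracy on unseen images.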

Guys, I don't know how many test images I need for testing. Could you suggest a number, please?

Abu Gaseem ( 2015-04-18 05:51:24 -0600 )
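
The thread does not settle on a number. One common way to estimate accuracy on unseen data without collecting many new images is k-fold cross-validation on the samples already available. The sketch below only assigns samples to folds, class by class, so every fold keeps roughly the same class proportions. The name `stratified_folds` and the choice of k are assumptions for illustration, and the samples of each class are assumed to be in random order already:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative helper (hypothetical name, not an OpenCV function).
// class_sizes[c] is the number of samples in class c.
// Returns fold_of[c][s]: the fold index in [0, k) for sample s of class c.
// Samples are dealt round-robin within each class, so every fold receives
// either floor(n/k) or ceil(n/k) samples of a class with n samples.
std::vector<std::vector<int> > stratified_folds(const std::vector<int>& class_sizes,
                                                int k) {
    assert(k > 0);
    std::vector<std::vector<int> > fold_of(class_sizes.size());
    for (std::size_t c = 0; c < class_sizes.size(); ++c) {
        assert(class_sizes[c] >= 0);
        fold_of[c].resize(static_cast<std::size_t>(class_sizes[c]));
        for (int s = 0; s < class_sizes[c]; ++s) {
            fold_of[c][s] = s % k;  // round-robin assignment within the class
        }
    }
    return fold_of;
}
```

For each fold f, one would train on every sample whose fold index differs from f and test on the samples in fold f, so each image serves as unseen test data exactly once. With the smallest class above (10 samples) and k = 5, each fold holds 2 samples of that class.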