# Why does the CVSVM predict function does'nt work 100% on the same training set

I used the opencv CVSVM with bag of visual words to classify objects ,once i finish training i test the classifier on the same training set as this supposed give me 100% accuracy isn't it ? but that not the situation can someone explain why ? Here's the code

 //load some class_1 samples into matrix
//load the associated SVM parameter file into SVM
//test (predict)
Mat test_samples;
CvSVM classifier;

FileStorage fs ("train_sample/training_samples-1000.yaml.gz" , FileStorage::READ) ;
if ( !fs.isOpened() ){
cerr << "Cannot open file " << endl ;
return ;
}

string class_= "class_";
for ( size_t i = 1 ; i <= 24; i++ ){
stringstream ss ;
ss << class_ <<i ;
fs[ss.str()] >>test_samples;
size_t positive  =  0 ;
size_t negative = 0 ;
//test svm classifier that classify class_1 as positive and others as negative
for ( int i = 0 ; i < test_samples.rows ; i++ ){
float res = classifier.predict(test_samples.row(i),false  ) ;
( (res == 1) ? (positive++):(negative++) );

}

cout << ss.str() << " positive examples  = " <<positive <<" , negative examples =" << negative  << endl ;

}
fs.release();


The output

edit retag close merge delete

2

unrelated, but don't use 'i' as a loop variable 2 times in nested loops

( 2015-04-16 06:00:43 -0500 )edit

i want to retrain the svm with Cvalue = 10*10 and termcretiera 10*10 maybe i got better result than c = 1 and iteraion = 1000 I want to ask does the number of positive and negtive examples should be equal in general ,and in case of 1 vs n classes?

( 2015-04-16 06:11:46 -0500 )edit
2

not that i got any idea, but we know nothing about your svm params, the number of samples, if it's multi or single class, the vocabulary size, kind of train features, - hard to answer, without details

( 2015-04-16 06:27:08 -0500 )edit

i retrain the SVMS classifier with Cvalue = 10^10 and gama = 3 i got accuarcy 100% with same training examples and this is expected right ?

( 2015-04-16 06:34:49 -0500 )edit

Vocabulary size :1k ,number of training examples are vary ,from class 1-24 {110 , 14 ,19,14,11,21,10,15,15,15,14,19,45,14,18,45,42,67,70,29,25,26,46,34,82} NOTE:when i use the CVSVM.train_auto() with default parameter i got the above result ,In contrast when i use use CVSVM.train() with the following params svm_type: C_SVC kernel: { type:RBF, gamma:3. } C: 1.0000000000000000e+10 term_criteria: { epsilon:1.1920928955078125e-07, iterations:100000000 } all the Classifiers recognise the training examples 100%

( 2015-04-16 06:42:16 -0500 )edit
1

First: C=10^10 seems very high to me ^^. And to answer your question: SVM is an optimizer, it tries to minimize the error as much as possible, so no, it doesn't need to recognize the training examples by 100%.

( 2015-04-16 10:17:15 -0500 )edit
1

100 % with the trainset sounds ok ;)

still, try with unknown test images, and try to vary the svm params, rey a POLY kernel, C_SVCNU, nu=0.5

( 2015-04-16 10:34:02 -0500 )edit

on Friday,and saturday i will go to collect my test images and test the SVM . can you explain what the C and gamma arguments represent ? Also var_all , var_count ,what they represent?

Thanks you both guys.

( 2015-04-16 12:05:18 -0500 )edit
1

For C, see http://stats.stackexchange.com/questi... . gamma depends on the type of SVM you use, for linear SVMs there exist no gammavalue, for the other types have a look at the description of the kernel-type at http://docs.opencv.org/modules/ml/doc... , there gamma appears in the respective equations. var_count is probably the number of support vectors, var_all: I don't know which parameter you mean.

( 2015-04-17 03:05:49 -0500 )edit

Guys I don't know how many test images i need for testing ,could you suggest me please ?

( 2015-04-18 05:51:24 -0500 )edit