I am using the OpenCV letter_recog.cpp example to experiment with random trees and the other classifiers. The example implements six classifiers: random trees, boosting, MLP, kNN, naive Bayes and SVM. It uses the UCI letter recognition dataset, which has 20000 instances and 16 features; I split it in half for training and testing. I have experience with SVM, so I quickly got its recognition error down to 3.3%.
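For reference, this is roughly how I load and split the data. It's a minimal sketch, not the `read_num_class_data` helper that letter_recog.cpp actually uses; the UCI file has the class letter first, then 16 comma-separated integers per line:

```cpp
#include <opencv2/core/core.hpp>
#include <cstdio>
#include <vector>

// Load the UCI letter data: one sample per line, class letter first,
// then 16 integer features. Produces an N x 16 float matrix plus labels.
bool load_letter_data(const char* path, cv::Mat& data, cv::Mat& responses)
{
    FILE* f = fopen(path, "rt");
    if (!f) return false;
    std::vector<float> feats, labels;
    char letter;
    float v[16];
    while (fscanf(f, " %c,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f",
                  &letter, v, v+1, v+2, v+3, v+4, v+5, v+6, v+7,
                  v+8, v+9, v+10, v+11, v+12, v+13, v+14, v+15) == 17)
    {
        labels.push_back((float)letter);
        feats.insert(feats.end(), v, v + 16);
    }
    fclose(f);
    data = cv::Mat(feats, true).reshape(1, (int)labels.size()); // N x 16, CV_32F
    responses = cv::Mat(labels, true);                          // N x 1,  CV_32F
    return true;
}

// First 10000 rows for training, the remaining 10000 for testing:
//   cv::Mat train_data = data.rowRange(0, 10000);
//   cv::Mat test_data  = data.rowRange(10000, data.rows);
```

After some experimentation, here is what I got: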
UCI letter recognition:
- RTrees - 5.3%
- Boost - 13%
- MLP - 7.9%
- kNN(k=3) - 6.5%
- Bayes - 11.5%
- SVM - 3.3%
Parameters used:
RTrees - max_num_of_trees_in_the_forest=200, max_depth=20, min_sample_count=1
Boost - boost_type=REAL, weak_count=200, weight_trim_rate=0.95, max_depth=7
MLP - method=BACKPROP, param=0.001, max_iter=300 (default values; too slow to experiment with)
kNN(k=3) - k=3
Bayes - none
SVM - RBF kernel, C=10, gamma=0.01
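For the non-default parameters, this is how they map onto the OpenCV 2.x ML API that letter_recog.cpp uses (a sketch; `train_data` and `train_responses` come from the split above):

```cpp
#include <opencv2/ml/ml.hpp>

// Mark the response column as categorical (classification),
// as letter_recog.cpp does; everything else stays ordered.
cv::Mat var_type(train_data.cols + 1, 1, CV_8U, cv::Scalar(CV_VAR_ORDERED));
var_type.at<uchar>(train_data.cols) = CV_VAR_CATEGORICAL;

// RTrees: 200 trees, max_depth=20, min_sample_count=1
// (remaining CvRTParams arguments left near their defaults)
CvRTParams rt_params(20,      // max_depth
                     1,       // min_sample_count
                     0,       // regression_accuracy
                     false,   // use_surrogates
                     15,      // max_categories
                     0,       // priors
                     false,   // calc_var_importance
                     0,       // nactive_vars (0 = sqrt of feature count)
                     200,     // max_num_of_trees_in_the_forest
                     0.01f,   // forest_accuracy
                     CV_TERMCRIT_ITER);
CvRTrees rtrees;
rtrees.train(train_data, CV_ROW_SAMPLE, train_responses,
             cv::Mat(), cv::Mat(), var_type, cv::Mat(), rt_params);

// Boost: REAL, 200 weak learners, weight_trim_rate=0.95, max_depth=7.
// Note CvBoost is two-class only, so letter_recog.cpp "unrolls" the data
// into one binary problem (one copy of each sample per class), which is
// probably why it runs out of memory on MNIST below.
CvBoostParams boost_params(CvBoost::REAL, 200, 0.95, 7, false, 0);

// SVM: RBF kernel, C=10, gamma=0.01
CvSVMParams svm_params;
svm_params.svm_type    = CvSVM::C_SVC;
svm_params.kernel_type = CvSVM::RBF;
svm_params.C     = 10;
svm_params.gamma = 0.01;
CvSVM svm;
svm.train(train_data, train_responses, cv::Mat(), cv::Mat(), svm_params);

// kNN: no training parameters; k=3 is passed at prediction time:
CvKNearest knn(train_data, train_responses);
// float pred = knn.find_nearest(test_sample, 3);
```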
After that I used the same parameters and tested on the Digits and MNIST datasets, after first extracting gradient features (vector size: 200 elements; a sketch of the extraction follows the results below):
Digits:
- RTrees - 5.1%
- Boost - 23.4%
- MLP - 4.3%
- kNN(k=3) - 7.3%
- Bayes - 17.7%
- SVM - 4.2%
MNIST:
- RTrees - 1.4%
- Boost - out of memory
- MLP - in progress
- kNN(k=3) - 1.2%
- Bayes - 34.33%
- SVM - 0.6%
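To show what I mean by gradient features, here is a sketch of the kind of extractor I have in mind. The exact recipe here is hypothetical: a 5x5 grid of spatial cells times 8 orientation bins, which happens to give exactly 200 elements; my actual extractor may differ in details:

```cpp
#include <opencv2/imgproc/imgproc.hpp>
#include <algorithm>
#include <cmath>

// Accumulate gradient magnitude into (cell, orientation-bin) slots:
// 5x5 cells x 8 bins = one 200-element feature row per image.
cv::Mat gradient_features(const cv::Mat& img8u)   // single-channel image
{
    cv::Mat gx, gy;
    cv::Sobel(img8u, gx, CV_32F, 1, 0);
    cv::Sobel(img8u, gy, CV_32F, 0, 1);

    const int CELLS = 5, BINS = 8;
    cv::Mat feat = cv::Mat::zeros(1, CELLS * CELLS * BINS, CV_32F);
    for (int y = 0; y < img8u.rows; y++)
        for (int x = 0; x < img8u.cols; x++)
        {
            float dx = gx.at<float>(y, x), dy = gy.at<float>(y, x);
            float mag = std::sqrt(dx * dx + dy * dy);
            if (mag < 1e-3f) continue;                 // skip flat pixels
            float ang = std::atan2(dy, dx);            // [-pi, pi]
            int bin = std::min(BINS - 1,
                (int)((ang + CV_PI) / (2 * CV_PI) * BINS));
            int cy = y * CELLS / img8u.rows, cx = x * CELLS / img8u.cols;
            feat.at<float>(0, (cy * CELLS + cx) * BINS + bin) += mag;
        }
    return feat;  // stack these rows to build the train/test matrices
}
```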
I am new to all the classifiers except SVM and kNN, and for those two I can say the results seem fine. What about the others? I expected more from random trees: on MNIST, kNN gives better accuracy. Any ideas how to get it higher? Boost and Bayes give very low accuracy. In the end I'd like to use these classifiers to build a multiple classifier system (a voting sketch of what I mean follows below). Any advice?
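By a multiple classifier system I mean something like simple majority voting over the per-classifier predictions. A hypothetical sketch, assuming the models trained above:

```cpp
#include <map>

// Majority vote over three trained classifiers; ties are resolved
// arbitrarily here (the smallest label wins) - one could instead fall
// back to the strongest single classifier, e.g. the SVM.
float vote(const cv::Mat& sample, CvRTrees& rtrees, CvKNearest& knn, CvSVM& svm)
{
    float preds[3] = {
        rtrees.predict(sample),
        knn.find_nearest(sample, 3),
        svm.predict(sample)
    };
    std::map<float, int> counts;
    for (int i = 0; i < 3; i++) counts[preds[i]]++;
    float best = preds[0]; int best_n = 0;
    for (std::map<float, int>::iterator it = counts.begin();
         it != counts.end(); ++it)
        if (it->second > best_n) { best_n = it->second; best = it->first; }
    return best;
}
```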