Low accuracy of SVM on Android

asked 2018-11-09 09:32:20 -0600

caiocanalli

Hello guys,

I have an Android project that uses face detection (Cascade Classifier). After detecting the face, I crop the eye regions and use a descriptor (KAZE) together with Bag of Words (BoW) and an SVM to recognize open and closed eyes, left and right. I trained the SVM with a set of 1600 images, 400 of each class.
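For context, the BoW step in that pipeline can be sketched in plain Python (a toy illustration, not the actual Emgu/OpenCV API; the vocabulary and descriptors below are made up):

```python
# Toy Bag-of-Words sketch: assign each local descriptor to its nearest
# vocabulary word and count occurrences, producing one fixed-length
# histogram per image (conceptually what BOWImgDescriptorExtractor does).

def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary word closest to the descriptor (squared L2)."""
    best, best_dist = 0, float("inf")
    for i, word in enumerate(vocabulary):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best, best_dist = i, dist
    return best

def bow_histogram(descriptors, vocabulary):
    """Normalized histogram of vocabulary-word occurrences for one image."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

# Hypothetical 2-word vocabulary and three 2-D descriptors from one image.
vocabulary = [[0.0, 0.0], [1.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.9]]
print(bow_histogram(descriptors, vocabulary))
# → [0.3333333333333333, 0.6666666666666666]
```

The resulting histograms are what the SVM actually trains on, so their dimensionality equals the vocabulary size — which is why the dictionary size matters so much relative to the number of training images.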

The training is done in C#, and after finishing the training, I test with some images, and all are classified correctly.

However, after importing the resulting dictionary and SVM files into the Android app, the classification results from the camera have been very poor.

Here are the images and files generated through training:

https://github.com/caiocanalli/OpenCV...

Attached is the C#, Java, and C++ code I'm using.

Any suggestion on how to improve the predictions is welcome.

Thank you.

public class Training
{
    private const string Path = "C:/Users/Administrator/Desktop/Data";

    private readonly KAZE _kaze;
    private readonly BFMatcher _bfMatcher;
    private readonly BOWKMeansTrainer _bowKMeansTrainer;
    private readonly BOWImgDescriptorExtractor _bowImgDescriptorExtractor;
    private readonly Dictionary<int, string> _images;

    private Mat _kazeDescriptors;
    private Mat _bowDescriptors;

    private Matrix<int> label;

    public Training()
    {
        int dictionarySize = 10000;

        _kaze = new KAZE(true, true, 0.00001F);
        _bfMatcher = new BFMatcher(DistanceType.L2);

        _bowKMeansTrainer = new BOWKMeansTrainer(
            dictionarySize,
            new MCvTermCriteria(200, 0.00001),
            1,
            KMeansInitType.PPCenters);

        _bowImgDescriptorExtractor =
            new BOWImgDescriptorExtractor(_kaze, _bfMatcher);

        _kazeDescriptors = new Mat();
        _bowDescriptors = new Mat();

        _images = new Dictionary<int, string>
        {
            { 1, "/closedLeftEyes" },
            { 2, "/openLeftEyes" },

            //{ 1, "/closedRightEyes" },
            //{ 2, "/openRightEyes" }
        };
    }

    public void Start()
    {
        // Compute KAZE descriptors

        System.Console.WriteLine("Compute KAZE descriptors...");

        foreach (var item in _images)
        {
            var files = Directory.GetFiles(
                Path + item.Value, "*.png");

            System.Console.WriteLine("Directory: " + item.Value + ". Size: " + files.Length);

            foreach (var file in files)
            {
                var image = new Image<Gray, byte>(file);

                Mat descriptors = new Mat();
                MKeyPoint[] keyPoints = _kaze.Detect(image);

                _kaze.Compute(image,
                    new VectorOfKeyPoint(keyPoints), descriptors);

                _kazeDescriptors.PushBack(descriptors);

                System.Console.WriteLine("Image: " + file);
            }
        }

        _bowKMeansTrainer.Add(_kazeDescriptors);

        // Cluster dictionary

        System.Console.WriteLine("Cluster dictionary...");

        Mat dictionary = new Mat();
        _bowKMeansTrainer.Cluster(dictionary);

        _bowImgDescriptorExtractor.SetVocabulary(dictionary);

        FileStorage fs = new FileStorage(
            "dictionary_left.xml", FileStorage.Mode.Write);
        fs.Write(dictionary, "dictionary");
        fs.ReleaseAndGetString();

        //FileStorage fs = new FileStorage(
        //    "dictionary_right.xml", FileStorage.Mode.Write);
        //fs.Write(dictionary, "dictionary");
        //fs.ReleaseAndGetString();

        // Compute BOW descriptors

        System.Console.WriteLine("Compute BOW descriptors...");

        var labels = new List<int>();

        foreach (var item in _images)
        {
            var files = Directory.GetFiles(
                Path + item.Value, "*.png");

            System.Console.WriteLine("Directory: " + item.Value + ". Size: " + files.Length);

            foreach (var file in files)
            {
                var image = new Image<Gray, byte>(file);

                Mat descriptors = new Mat();
                MKeyPoint[] keyPoints = _kaze.Detect(image);

                _bowImgDescriptorExtractor.Compute(image,
                    new VectorOfKeyPoint(keyPoints), descriptors);

                _bowDescriptors.PushBack(descriptors);
                labels.Add(item.Key);

                System.Console.WriteLine("Image: " + file);
            }
        }

        label = new Matrix<int>(labels.ToArray());

        // Train SVM

        System.Console.WriteLine("Train SVM...");

        SVM svm = new SVM();
        svm.SetKernel(SVM.SvmKernelType.Rbf);
        svm.Type = SVM.SvmType.CSvc;

        svm.TermCriteria = new MCvTermCriteria(400, 0.00001);

        TrainData trainData = new TrainData(
            _bowDescriptors, 
            Emgu.CV.ML.MlEnum.DataLayoutType.RowSample,
            label);

        System.Console.WriteLine("C: " + svm.C + " Gamma: " + svm.Gamma);

        bool result = svm.TrainAuto(trainData);

        System.Console.WriteLine("C: " + svm.C + " Gamma: " + svm.Gamma);

        svm.Save("svm_left.xml");
        //svm.Save("svm_right.xml");

        if (!result)
            throw new Exception("SVM training failed.");
    }
}

Comments

  • we can't help you with anything C#, so that wall of code there is somewhat redundant / irrelevant
  • a 10000-word vocabulary, but only 800 images (per side)
  • the train images seem to come from some database, and probably don't fit your real-life situation
  • an RBF SVM kernel is not always the right answer; there are also params to tweak
  • your data isn't normalized
  • VLAD is an improvement on BOW; so is HOG
  • no, Android is not the problem; the mismatch between training and inference data is
  • there are much better eye detectors than the cascade-based ones out there (anything landmark-related)
berak ( 2018-11-09 09:50:35 -0600 )
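On the RBF point above: the kernel is k(x, y) = exp(-gamma * ||x - y||²), so gamma controls how fast similarity decays with distance — a quick plain-Python illustration (toy vectors, not the OpenCV SVM API):

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * squared L2 distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x, y = [0.0, 0.0], [1.0, 1.0]       # squared distance 2.0
print(rbf_kernel(x, x, gamma=0.5))  # identical points -> 1.0
print(rbf_kernel(x, y, gamma=0.5))  # exp(-1.0)  ~ 0.368
print(rbf_kernel(x, y, gamma=5.0))  # exp(-10.0) ~ 4.5e-05
```

With gamma too large every pair of samples looks dissimilar (overfitting); too small and everything looks alike. TrainAuto searches a grid over C and gamma, but distances (and therefore gamma) are only comparable across features if the data is normalized first.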

... and STOP using #include <opencv/cv.h> , please ;)

berak ( 2018-11-09 09:55:15 -0600 )

Hi again @berak. Always helping me, haha.

"10000 features, but only 800 images (per side)" — Actually, I've tried several values. I searched for the ideal number but did not find an answer. Do you have a suggested value for this number of samples?

"the train images seem to come from some database, and probably don't fit your real life situation" — I cropped the images from the web, from various sources. I resized each one and took care not to include duplicates. A lot of work =/. Actually, I tried the set you suggested here, but its precision was worse.

"your data isn't normalized" — Sorry for my ignorance, but what do you suggest here?

caiocanalli ( 2018-11-09 11:20:24 -0600 )

"VLAD is an improvement on BOW, also HOG" — My main fear is that I'm implementing BoW incorrectly.

"... and STOP using #include <opencv/cv.h>, please ;)" — It has already been removed. Not being used =)

caiocanalli ( 2018-11-09 11:22:53 -0600 )

"My main fear is that I'm implementing BoW incorrectly."

I can't see much wrong in the code here, but maybe it's not as great an improvement as you hoped ;(

"your data isn't normalized"

Try to normalize each feature row in the train and test data (L2, or min-max to [0..1]), that is:

 normalize(trainData.row(i), trainData.row(i));
berak ( 2018-11-09 11:30:26 -0600 )
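For anyone landing here later, the two normalizations berak mentions can be sketched in plain Python (a toy illustration; in OpenCV, cv::normalize defaults to NORM_L2, and min-max corresponds to NORM_MINMAX):

```python
def minmax_normalize(row):
    """Scale a feature vector to the [0, 1] range."""
    lo, hi = min(row), max(row)
    if hi == lo:                      # constant row: avoid division by zero
        return [0.0] * len(row)
    return [(v - lo) / (hi - lo) for v in row]

def l2_normalize(row):
    """Scale a feature vector to unit Euclidean length."""
    norm = sum(v * v for v in row) ** 0.5
    return [v / norm for v in row] if norm else list(row)

print(minmax_normalize([2.0, 4.0, 10.0]))  # → [0.0, 0.25, 1.0]
print(l2_normalize([3.0, 4.0]))            # → [0.6, 0.8]
```

Whichever variant is chosen, it must be applied identically to the training rows in C# and to the camera-frame rows on Android, otherwise the SVM sees feature scales it was never trained on.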