SVM and HOG features: how to combine them for precise object detection [closed]
Hello Forum members,
I have worked a lot with haar-training and train-cascade to create cascade XML files. But these methods are time-consuming and most of the time they get stuck.
Currently I am researching object training and detection using a support vector machine (SVM). I have 5781 images, of which 1761 are positives. At first I trained simply on the raw intensity values, but the results were horrible: the trained XML file could only tell positives from negatives within the training image set. On any other image the result was a false positive.
Next I passed the HOG features computed from every training image into the training. The results have improved a lot, but there is still plenty of room for improvement. The SVM detector using this trained model now detects objects in images outside the training set, BUT only in images of the same format.
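As a sanity check on the feature count, the length of a HOG descriptor follows directly from the window geometry. The sketch below is not taken from the training code. It assumes the usual HOG layout (16x16 blocks, 8x8 block stride, 8x8 cells, 9 orientation bins, which are also the defaults of OpenCV's HOGDescriptor), and the names HogGeometry, defaultGeometry and hogDescriptorLength are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Geometry of a HOG descriptor, all sizes in pixels (illustrative struct,
// not an OpenCV type).
struct HogGeometry {
    int winW, winH;       // detection window
    int blockW, blockH;   // block size
    int strideW, strideH; // block stride
    int cellW, cellH;     // cell size
    int nbins;            // orientation bins per cell
};

// Assumed default layout: 16x16 blocks, 8x8 stride, 8x8 cells, 9 bins.
HogGeometry defaultGeometry(int winW, int winH)
{
    HogGeometry g = {winW, winH, 16, 16, 8, 8, 8, 8, 9};
    return g;
}

// Number of float values HOG produces for ONE window:
// (blocks per window) * (cells per block) * (bins per cell).
std::size_t hogDescriptorLength(const HogGeometry& g)
{
    if (g.winW < g.blockW || g.winH < g.blockH)
        throw std::invalid_argument("window is smaller than one block");
    if ((g.winW - g.blockW) % g.strideW != 0 || (g.winH - g.blockH) % g.strideH != 0)
        throw std::invalid_argument("window minus block must be a multiple of the stride");
    if (g.blockW % g.cellW != 0 || g.blockH % g.cellH != 0)
        throw std::invalid_argument("block must be a multiple of the cell size");

    const std::size_t blocksX = (g.winW - g.blockW) / g.strideW + 1;
    const std::size_t blocksY = (g.winH - g.blockH) / g.strideH + 1;
    const std::size_t cellsPerBlock = (g.blockW / g.cellW) * (g.blockH / g.cellH);
    return blocksX * blocksY * cellsPerBlock * g.nbins;
}
```

Under these assumptions a 64x128 window gives 3780 values. The 61236 used for img_area in the code below would be consistent with, for example, a 512x224 window, but that is only one possible combination; the actual image size is not stated here. The practical point is that every training and test window has to be brought to the same size so that all descriptors have the same length.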
Please help me find a robust way to use this SVM to detect objects. I am posting the training code below; please help me adjust the parameters so that the training becomes more robust.
Thank you in advance :)
CODE:
#include "cv.h"
#include "highgui.h"
#include "ml.h"
#include <stdio.h>
#include <iostream>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <vector>
#include <sstream>
#include <string>
#include <cstring>
#include <stdlib.h>
using namespace cv;
using namespace std;
void reverse(char str[], int length)
{
int start = 0;
int end = length -1;
while (start < end)
{
swap(*(str+start), *(str+end));
start++;
end--;
}
}
// Implementation of itoa()
char* itoa(int num, char* str, int base)
{
int i = 0;
bool isNegative = false;
/* Handle 0 explicitly, otherwise an empty string is printed for 0 */
if (num == 0)
{
str[i++] = '0';
str[i] = '\0';
return str;
}
// In standard itoa(), negative numbers are handled only with
// base 10. Otherwise numbers are considered unsigned.
if (num < 0 && base == 10)
{
isNegative = true;
num = -num;
}
// Process individual digits
while (num != 0)
{
int rem = num % base;
str[i++] = (rem > 9)? (rem-10) + 'a' : rem + '0';
num = num/base;
}
// If number is negative, append '-'
if (isNegative)
str[i++] = '-';
str[i] = '\0'; // Append string terminator
// Reverse the string
reverse(str, i);
return str;
}
int main()
{
//variables
char FirstFileName[100]="train/"; //the location of the training images. Both positives and negatives must be in the same folder
char lastname[100] = ".JPG"; //type of the images
float data[1000][3];
int FileNum=5781; //Number of images(Positives + negatives)
vector< vector < float> > v_descriptorsValues;
vector< vector < Point> > v_locations;
Mat Hogfeat;
float labelsMat[5781]; //Length of array must be equal to number of training images
int img_area = 1*61236; //61236 is the number of HOG feature values per image. Set this to the descriptor length your HOG settings actually produce.
Mat training_mat(FileNum,img_area,CV_32FC1);
// Mat training_mat(4, 2, CV_32FC1, trainingData);
for(int i=1; i<=FileNum; i++)
{
char FullFileName[100] = "";
char number[100] = "";
strcat(FullFileName,FirstFileName);
itoa(i,number,10);
strcat(FullFileName,number);
strcat(FullFileName,lastname);
//read ...
It is normal that an SVM only works at a single scale. What you have to do manually is build an image pyramid and then run a detection with your classifier on each layer. Keep in mind that the scale of your model limits the smallest object that can be found; larger objects are detected without problems by downscaling the image. Upscaling, however, introduces many artefacts that make SVM detections fail.
Some useful links:
http://docs.opencv.org/modules/imgproc/doc/filtering.html?highlight=pyrdown#cv.PyrDown
http://docs.opencv.org/modules/imgproc/doc/filtering.html?highlight=pyrup#void%20pyrUp%28InputArray%20src,%20OutputArray%20dst,%20const%20Size&%20dstsize,%20int%20borderType%29
What is usually done is to apply 1-2 upscales and then multiple downscales.
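To make the pyramid idea concrete, here is a minimal sketch of how the layers could be planned. It is plain C++ without OpenCV so it stays self-contained. The names Level, Box, pyramidLevels and toOriginal are hypothetical, and the factor of 2 between layers is an assumption that matches what pyrUp/pyrDown do by default. It starts from the largest upscale, keeps shrinking until the layer no longer fits the model window, and maps a hit found in a layer back to coordinates in the original image.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One pyramid layer: resize factor relative to the original image and the
// resulting layer size in pixels. scale > 1 is an upscale, scale < 1 a downscale.
struct Level {
    double scale;
    int width;
    int height;
};

// Axis-aligned detection box in pixels.
struct Box {
    int x, y, w, h;
};

// Plan the layers: `upLevels` upscales first, then the original image, then
// downscales by `factor` until the layer is smaller than the model window.
std::vector<Level> pyramidLevels(int imgW, int imgH, int winW, int winH,
                                 int upLevels, double factor)
{
    std::vector<Level> levels;
    if (imgW <= 0 || imgH <= 0 || winW <= 0 || winH <= 0 ||
        upLevels < 0 || factor <= 1.0)
        return levels; // invalid input: no layers

    double scale = std::pow(factor, upLevels);
    for (;;) {
        const int w = static_cast<int>(std::lround(imgW * scale));
        const int h = static_cast<int>(std::lround(imgH * scale));
        if (w < winW || h < winH)
            break; // the window no longer fits, so smaller layers are useless
        Level lv = {scale, w, h};
        levels.push_back(lv);
        scale /= factor;
    }
    return levels;
}

// Map a box found in a layer with the given scale back to the original image.
Box toOriginal(const Box& b, double scale)
{
    Box out = {
        static_cast<int>(std::lround(b.x / scale)),
        static_cast<int>(std::lround(b.y / scale)),
        static_cast<int>(std::lround(b.w / scale)),
        static_cast<int>(std::lround(b.h / scale))
    };
    return out;
}
```

For a 640x480 image, a 64x128 model window and one upscale, pyramidLevels(640, 480, 64, 128, 1, 2.0) plans three layers: 1280x960, 640x480 and 320x240. The next halving would be 160x120, which is shorter than the window. With OpenCV, each layer could be produced with cv::resize (or pyrUp/pyrDown when the factor is 2) before sliding the trained window over it, and a window that fires in the 320x240 layer corresponds to a box twice as large in the original image.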