SVM predict on OpenCV: how can I extract the same number of features?
I am playing with OpenCV and SVM to build a classifier that predicts facial expressions. I have no problem classifying the test dataset, but when I try to predict on a new image, I get this:
OpenCV Error: Assertion failed (samples.cols == var_count && samples.type() == CV_32F) in cv::ml::SVMImpl::predict
The error is pretty clear: my sample has a different number of columns than the model expects, although the type is the same. I do not know how to fix that, because my matrix has dimensions 1 x number_of_features, but number_of_features is not the same as for the training and test samples. How can I extract the same number of features from another image? Am I missing something?
To train the classifier I did:
- Detect face and save ROI;
- SIFT to extract features;
- k-means to cluster them;
- bag of words to get the same number of features for each image;
- PCA to reduce dimensionality;
- train on the training dataset;
- predict on the test dataset;
On the new image I did the same thing.
I tried resizing the new image to the same size, but nothing changed: same error (and a different number of columns, i.e. features). The vectors are of the same type (CV_32F).
After successfully training my classifier, I save the SVM model this way:
svmClassifier->save(baseDatabasePath);
Then I load it when I need to do real-time prediction, this way:
cv::Ptr<cv::ml::SVM> svmClassifier;
svmClassifier = cv::ml::StatModel::load<cv::ml::SVM>(path);
Then I loop:
while (true)
{
    getOneImage();
    cv::Mat feature = extractFeaturesFromSingleImage();
    float labelPredicted = svmClassifier->predict(feature);
    cout << "Label predicted is: " << labelPredicted << endl;
}
But predict returns the error. The feature dimension is 1x66, for example. As you can see below, I need 140 features:
<?xml version="1.0"?>
<opencv_storage>
<opencv_ml_svm>
<format>3</format>
<svmType>C_SVC</svmType>
<kernel>
<type>RBF</type>
<gamma>5.0625000000000009e-01</gamma></kernel>
<C>1.2500000000000000e+01</C>
<term_criteria><epsilon>1.1920928955078125e-07</epsilon>
<iterations>1000</iterations></term_criteria>
<var_count>140</var_count>
<class_count>7</class_count>
<class_labels type_id="opencv-matrix">
<rows>7</rows>
<cols>1</cols>
<dt>i</dt>
<data>
0 1 2 3 4 5 6</data></class_labels>
<sv_total>172</sv_total>
<support_vectors>
I do not know how to get 140 features when SIFT, FAST or SURF only give me around 60. What am I missing? How can I bring my real-time sample to the same dimension as the training and test datasets?
Some code.
As preprocessing (I tried to extract just the relevant code, because there is more code wrapped around it):
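The assertion in the error compares the sample width with the model's var_count (140 in the saved XML, against a 1x66 sample). A minimal sketch of checking this before calling predict, so the mismatch is reported in readable form, could look like the following. The helper name describeWidthMismatch is hypothetical and not part of OpenCV; in real code the two numbers would presumably come from feature.cols and svmClassifier->getVarCount().

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not an OpenCV function): returns an empty string when
// the sample width matches the width the model was trained on, otherwise a
// short human-readable description of the mismatch.
std::string describeWidthMismatch(std::size_t sampleCols, std::size_t varCount)
{
    if (sampleCols == varCount)
        return std::string();

    std::ostringstream msg;
    msg << "sample has " << sampleCols
        << " columns, model expects " << varCount;
    return msg.str();
}
```

If the returned string is non-empty, the sample did not go through the same feature pipeline as the training data, and predict should be skipped for that frame.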
cv::Mat image;
cv::Mat gray;
cv::Mat output;
image = cv::imread(imagePath[imageId], CV_LOAD_IMAGE_COLOR);
cv::cvtColor(image, gray, CV_BGR2GRAY);
double clipLimit = 4.0f; // NOTE: unused; createCLAHE below is called with 2.0
Size tileGridSize(8, 8);
Ptr<CLAHE> clahe = cv::createCLAHE(2.0, tileGridSize);
clahe->apply(gray, output);
cv::CascadeClassifier faceCascade;
faceCascade.load(baseDatabasePath + "/" + cascadeDataName2);
std::vector<cv::Rect> faces;
faceCascade.detectMultiScale(output, faces, 1.2, 3, 0, cv::Size(50, 50));
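int bestIndex = 0;
int maxWidth = 0;
// Keep the widest detected face.
for (unsigned int i = 0; i < faces.size(); ++i)
{
    if (faces[i].width > maxWidth)
    {
        bestIndex = i;
        maxWidth = faces[i].width;
    }
}
// NOTE: faces is empty when no face is detected; check faces.empty() before indexing.
faceROI = output(faces[bestIndex]);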
cv::resize(faceROI, faceROI, cv::Size(widthImageOutputResize, heightImageOutputResize));
imwrite(outputPath + "/" + currentFilename, faceROI);
Extract features with sift and push on ...
your analysis is correct, you need exactly the same number of features (cols) for training and testing.
it's nice that you split up your code into pieces and try to explain the steps, but there are some parts missing. could you put the whole code on a gist or the like?
@berak Hello, thanks for your answer. What is actually missing? Are you referring to training and testing?
yes, the preprocessing for your testing seems to be missing.
Hello @berak,
feature.at<float>(0, bin) += 1;
Anyway I am going to add some details.
hmm, for testing, you have to get feature "bins", a histogram from your bow dictionary, too, and that should be the input to svm->predict(). again a feature vector with nbins elements.
i don't see this anywhere in your code, could it be that you're just not doing it?
(that would explain why the sizes don't match)
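To make the "bins" idea concrete, here is a minimal sketch in plain C++ (no OpenCV, so the types are stand-ins): each descriptor votes for its nearest dictionary center, and the result always has one element per center, no matter how many keypoints the image produced. The names Descriptor, squaredDistance and bowHistogram are made up for illustration; the vocabulary is assumed to be the k-means centers computed at training time.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Stand-in for one descriptor row (e.g. 128 floats for SIFT).
using Descriptor = std::vector<float>;

// Squared Euclidean distance between two descriptors of equal length.
float squaredDistance(const Descriptor& a, const Descriptor& b)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Hypothetical helper: builds a histogram with one bin per vocabulary word.
// The output length equals vocabulary.size(), independent of how many
// descriptors were extracted from the image.
std::vector<float> bowHistogram(const std::vector<Descriptor>& descriptors,
                                const std::vector<Descriptor>& vocabulary)
{
    std::vector<float> histogram(vocabulary.size(), 0.0f);
    if (vocabulary.empty())
        return histogram;

    for (const Descriptor& d : descriptors) {
        std::size_t best = 0;
        float bestDist = std::numeric_limits<float>::max();
        for (std::size_t w = 0; w < vocabulary.size(); ++w) {
            const float dist = squaredDistance(d, vocabulary[w]);
            if (dist < bestDist) {
                bestDist = dist;
                best = w;
            }
        }
        histogram[best] += 1.0f; // same idea as feature.at<float>(0, bin) += 1
    }
    return histogram;
}
```

With 66 descriptors or 600, the histogram has the same width, which is what makes it usable as a fixed-size input downstream.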
btw: https://gilscvblog.com/2013/08/23/bag...
If by "testing" you are referring to an unseen image, I did the same thing (not posted here, I am a bit shy to show what I wrote there, eheh, it could be terrible). Anyway, I think we have arrived at the point: on the unseen sample I extract features and get around 66 keypoints x 128 (with SIFT), but what now? How can I cluster them into the same number of bins (for example 1000)? Moreover, do I need to use the centers computed before? I do not know if I am able to explain what my problem is.
i don't think you can use the labels from kmeans for this, as you need to use exactly the same algo for train & test. calculating all those distances may seem excessive, but i don't see any way to avoid it.
3: no, same recipe for all test/train/unseen, whatever
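The "same algo for train & test" point would also apply to the PCA step listed in the question, assuming it is kept: the unseen sample should be projected with the mean and components fitted on the training data, not with a new PCA. A minimal sketch of that projection in plain C++ follows; projectWithStoredPca is a hypothetical name, and in OpenCV the equivalent would presumably be cv::PCA::project on the stored PCA object.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not an OpenCV call: projects one sample onto stored
// principal components. `mean` and `components` are assumed to come from the
// PCA fitted on the training data; each component has sample.size() entries.
std::vector<float> projectWithStoredPca(
    const std::vector<float>& sample,
    const std::vector<float>& mean,
    const std::vector<std::vector<float>>& components)
{
    // One output value per stored component, so the reduced width is fixed
    // by the training-time PCA, not by the new image.
    std::vector<float> reduced(components.size(), 0.0f);
    for (std::size_t c = 0; c < components.size(); ++c) {
        for (std::size_t i = 0; i < sample.size(); ++i) {
            reduced[c] += (sample[i] - mean[i]) * components[c][i];
        }
    }
    return reduced;
}
```

The output width is the number of stored components, so it matches the training samples as long as the same stored PCA is reused.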
4: my bad, missed it.
5: maybe. (there's always a better way...) for now, just try like this. also: there's BOWKMeansTrainer and such, just saying ...