SVM predict on OpenCV: how can I extract the same number of features

I am experimenting with OpenCV and an SVM to build a classifier that predicts facial expressions. I have no problem classifying the test dataset, but when I try to predict on a new image, I get this:

OpenCV Error: Assertion failed (samples.cols == var_count && samples.type() == CV_32F) in cv::ml::SVMImpl::predict

The error is pretty clear: my sample has a different number of columns than the model expects, although the type is the same. I do not know how to fix that, because my matrix has dimensions 1 x number_of_features, but number_of_features is not the same as for the training and test samples. How can I extract the same number of features from another image? Am I missing something?

To train the classifier I did the following:

Detect the face and save the ROI;
Use SIFT to extract features;
Use k-means to cluster them;
Use bag of words to get the same number of features for each image;
Use PCA to reduce the dimensionality;
Train on the training dataset;
Predict on the test dataset.

On the new image I did the same thing.

I tried resizing the new image to the same size, but I get the same error (and a different number of columns, i.e. features). The vectors are of the same type (CV_32F).

After successfully training my classifier, I save the SVM model this way:

svmClassifier->save(baseDatabasePath);

Then I load it when I need to do real-time prediction, this way:

cv::Ptr<cv::ml::SVM> svmClassifier;
svmClassifier = cv::ml::StatModel::load<cv::ml::SVM>(path);

Then loop,

while (true) 
{
    getOneImage();
    cv::Mat feature = extractFeaturesFromSingleImage();
    float labelPredicted = svmClassifier->predict(feature);
    cout << "Label predicted is: " << labelPredicted << endl;
}

But predict raises the error. The feature dimension is 1x66, for example. As you can see below, I need 140 features:

<?xml version="1.0"?>
<opencv_storage>
<opencv_ml_svm>
  <format>3</format>
  <svmType>C_SVC</svmType>
  <kernel>
    <type>RBF</type>
    <gamma>5.0625000000000009e-01</gamma></kernel>
  <C>1.2500000000000000e+01</C>
  <term_criteria><epsilon>1.1920928955078125e-07</epsilon>
    <iterations>1000</iterations></term_criteria>
  <var_count>140</var_count>
  <class_count>7</class_count>
  <class_labels type_id="opencv-matrix">
    <rows>7</rows>
    <cols>1</cols>
    <dt>i</dt>
    <data>
      0 1 2 3 4 5 6</data></class_labels>
  <sv_total>172</sv_total>

<support_vectors>

I do not know how to get 140 features when SIFT, FAST or SURF give me only around 60. What am I missing? How can I bring my real-time sample to the same dimension as the training and test datasets?
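To make the failing condition explicit: the assertion compares the width of the sample with the var_count stored in the model (140 in the file above). A minimal sketch of the same check, written with plain std::vector so it has no dependencies; requireFeatureCount is a hypothetical helper name, not an OpenCV function. In OpenCV itself the expected width should be available from the loaded model through getVarCount().

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: verifies that a sample has exactly the number of
// features the model was trained with (the var_count stored in the model file).
void requireFeatureCount(const std::vector<float>& sample, std::size_t varCount)
{
    assert(varCount > 0);
    if (sample.size() != varCount) {
        throw std::invalid_argument(
            "sample has " + std::to_string(sample.size()) +
            " features, model expects " + std::to_string(varCount));
    }
}
```

Calling this right before predict with a 66-element sample and varCount = 140 would throw with a readable message instead of hitting the assertion inside the library.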

Some code. Extract features with SIFT and push them onto a vector of Mat.

std::vector<cv::Mat> featuresVector;
for (int i = 0; i < numberImages; ++i)
{
    cv::Mat face = cv::imread(facePath, CV_LOAD_IMAGE_GRAYSCALE);
    cv::Mat featuresExtracted = runExtractFeature(face, featuresExtractionAlgorithm);
    featuresVector.push_back(featuresExtracted);
}

Get the total number of features extracted from all images.

int numberFeatures = 0;
for (int i = 0; i < featuresVector.size(); ++i)
{
    numberFeatures += featuresVector[i].rows;
}

Prepare a mat to cluster features (I tried to follow this example)

cv::Mat featuresData = cv::Mat::zeros(numberFeatures, featuresVector[0].cols, CV_32FC1);
int currentIndex = 0;
for (int i = 0; i < featuresVector.size(); ++i)
{
    featuresVector[i].copyTo(featuresData.rowRange(currentIndex, currentIndex + featuresVector[i].rows));
    currentIndex += featuresVector[i].rows;
}

Perform clustering (I do not know how well these parameters suit my case, but I think they are OK for now)

cv::Mat labels;
cv::Mat centers;
int binSize = 1000;
kmeans(featuresData, binSize, labels, cv::TermCriteria(cv::TermCriteria::COUNT + cv::TermCriteria::EPS, 100, 1.0), 3, KMEANS_PP_CENTERS, centers);

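Prepare a Mat to build the bag-of-words histograms.

currentIndex = 0; // restart at the first stacked descriptor; the copy loop above left it at numberFeatures
cv::Mat featuresDataHist = cv::Mat::zeros(numberImages, binSize, CV_32FC1);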
for (int i = 0; i < numberImages; ++i)
{
    cv::Mat feature = cv::Mat::zeros(1, binSize, CV_32FC1);
    int numberImageFeatures = featuresVector[i].rows;
    for (int j = 0; j < numberImageFeatures; ++j)
    {
        int bin = labels.at<int>(currentIndex + j);
        feature.at<float>(0, bin) += 1;
    }
    cv::normalize(feature, feature);
    feature.copyTo(featuresDataHist.row(i));
    currentIndex += featuresVector[i].rows;
}
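The indexing in this step is easy to get wrong, so here is a dependency-free sketch of the same idea: walk the flat k-means label list with a running offset that starts at zero, and build one L2-normalised histogram per image. The function name histogramsFromLabels and the std::vector representation are assumptions for illustration; the real code works on cv::Mat.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Builds one L2-normalised histogram per image from the flat k-means label
// list. labels[k] is the cluster index of the k-th stacked descriptor, and
// descriptorsPerImage[i] says how many consecutive labels belong to image i.
std::vector<std::vector<float>> histogramsFromLabels(
    const std::vector<int>& labels,
    const std::vector<std::size_t>& descriptorsPerImage,
    std::size_t binCount)
{
    std::vector<std::vector<float>> histograms;
    std::size_t offset = 0;  // starts at the first stacked descriptor
    for (std::size_t count : descriptorsPerImage) {
        assert(offset + count <= labels.size());
        std::vector<float> hist(binCount, 0.0f);
        for (std::size_t j = 0; j < count; ++j) {
            int bin = labels[offset + j];
            assert(bin >= 0 && static_cast<std::size_t>(bin) < binCount);
            hist[static_cast<std::size_t>(bin)] += 1.0f;
        }
        // L2 normalisation, so images with many descriptors do not dominate.
        float norm = 0.0f;
        for (float v : hist) norm += v * v;
        norm = std::sqrt(norm);
        if (norm > 0.0f)
            for (float& v : hist) v /= norm;
        histograms.push_back(hist);
        offset += count;  // move to the labels of the next image
    }
    return histograms;
}
```

Every histogram has binCount entries no matter how many descriptors an image produced, which is what gives each image the same number of features.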

PCA to try to reduce the dimensionality.

cv::PCA pca(featuresDataHist, cv::Mat(), CV_PCA_DATA_AS_ROW, 50/*0.90*/);
cv::Mat feature;
for (int i = 0; i < numberImages; ++i) 
{
    feature = pca.project(featuresDataHist.row(i));
}
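For a new image, the projection presumably has to reuse the mean and basis learned on the training histograms rather than fit a new PCA, otherwise the output width can differ from what the SVM was trained on. A minimal sketch of that projection under stated assumptions: PcaModel and project are hypothetical names, and the data is held in std::vector instead of the mean and eigenvectors members of cv::PCA.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for what must be kept from training:
// the mean vector and one basis row per retained component.
struct PcaModel {
    std::vector<float> mean;                // length = input dimension
    std::vector<std::vector<float>> basis;  // components x input dimension
};

// Projects one sample: y[c] = dot(x - mean, basis[c]).
// The result always has basis.size() entries.
std::vector<float> project(const PcaModel& pca, const std::vector<float>& x)
{
    assert(x.size() == pca.mean.size());
    std::vector<float> y(pca.basis.size(), 0.0f);
    for (std::size_t c = 0; c < pca.basis.size(); ++c) {
        assert(pca.basis[c].size() == x.size());
        for (std::size_t d = 0; d < x.size(); ++d)
            y[c] += (x[d] - pca.mean[d]) * pca.basis[c][d];
    }
    return y;
}
```

With the same stored model, a training histogram and a real-time histogram both come out with the same number of components.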

I have already tried to find help, and someone told me: "why do you want 140 features, just take 50, for example".

To summarize:

How can I extract the same number of features for a new test sample? Am I missing something?
Do I need to extract exactly 140 features, for example, or can I choose fewer? If so, how do I choose how many?
Did I make a mistake (see the code) in this approach?

Some more code.

Preprocessing (I extracted this from a larger code base, so some of the surrounding code is omitted):

cv::Mat image;
cv::Mat gray;
cv::Mat output;

image = cv::imread(imagePath[imageId], CV_LOAD_IMAGE_COLOR);
cv::cvtColor(image, gray, CV_BGR2GRAY);

double clipLimit = 4.0f;
Size tileGridSize(8, 8);
Ptr<CLAHE> clahe = cv::createCLAHE(2.0, tileGridSize);
clahe->apply(gray, output);

cv::CascadeClassifier faceCascade;
faceCascade.load(baseDatabasePath + "/" + cascadeDataName2);

std::vector<cv::Rect> faces;
faceCascade.detectMultiScale(output, faces, 1.2, 3, 0, cv::Size(50, 50));

int bestIndex = 0;
int maxWidth = 0;
for (unsigned int i = 0; i < faces.size(); ++i) 
{
    if (faces[i].width > maxWidth) 
    {
        bestIndex = i;
        maxWidth = faces[i].width;
    }
}

faceROI = output(faces[bestIndex]);
cv::resize(faceROI, faceROI, cv::Size(widthImageOutputResize, heightImageOutputResize));
imwrite(outputPath + "/" + currentFilename, faceROI);
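One thing to watch in the face selection above: if the detector returns no faces, indexing with bestIndex reads past the end of the vector. A small dependency-free sketch of the selection with that case handled; Box is a stand-in for cv::Rect and widestBoxIndex is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for cv::Rect, used only to keep the sketch free of dependencies.
struct Box { int x, y, width, height; };

// Returns the index of the widest box, or -1 when no face was detected.
int widestBoxIndex(const std::vector<Box>& boxes)
{
    int bestIndex = -1;
    int maxWidth = 0;
    for (std::size_t i = 0; i < boxes.size(); ++i) {
        assert(boxes[i].width >= 0);
        if (boxes[i].width > maxWidth) {
            bestIndex = static_cast<int>(i);
            maxWidth = boxes[i].width;
        }
    }
    return bestIndex;
}
```

The caller can then skip the frame when the result is -1 instead of cropping with an invalid index.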

EDIT: As suggested by @berak, I tried building the histogram this way.

std::vector<std::vector<cv::KeyPoint>> keypoints; // ... get keypoints ...
cv::Mat featuresDataOverBins = cv::Mat::zeros(numberImages, binSize, CV_32FC1);
for (int i = 0; i < numberImages; ++i)
{
    cv::Mat feature = cv::Mat::zeros(1, binSize, CV_32FC1);
    std::vector<cv::KeyPoint>::iterator keypointsIT;
    for (keypointsIT = keypoints[i].begin(); keypointsIT != keypoints[i].end(); ++keypointsIT)
    {
        float minDistance = FLT_MAX;
        int indexBin = -1;
        for (int j = 0; j < binSize; ++j)
        {
            cv::Point2f point(keypointsIT->pt);
            cv::Point2f centerCluster(centers.at<float>(j, 0), centers.at<float>(j, 1));
            float distance = std::sqrt((point.x - centerCluster.x)*(point.x - centerCluster.x) + (point.y - centerCluster.y)*(point.y - centerCluster.y));
            if (distance < minDistance)
            {
                minDistance = distance;
                indexBin = j;
            }
        }
        feature.at<float>(0, indexBin) += 1;
    }
    cv::normalize(feature, feature);
    feature.copyTo(featuresDataOverBins.row(i));
}

Do you think this is correct? To get the label I just read each image's name, parse it, and then store it in the same file.

By the way, following this approach I do not get good prediction accuracy (not as good as before on the test dataset: it drops from about 85% to 50%).
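For comparison with the histogram code in the EDIT: the k-means centres were computed from descriptors, so a bag-of-words assignment would normally measure distance between a descriptor and a centre over all descriptor dimensions, not between a keypoint's (x, y) position and the first two columns of a centre. A minimal sketch of that assignment, assuming descriptors and centres are stored as std::vector<float>; nearestCentre and bowEncode are hypothetical names. OpenCV also ships cv::BOWImgDescriptorExtractor for this purpose, which may be worth a look.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Descriptor = std::vector<float>;

// Index of the vocabulary centre closest to one descriptor, comparing
// every descriptor dimension (squared Euclidean distance).
std::size_t nearestCentre(const Descriptor& d, const std::vector<Descriptor>& centres)
{
    assert(!centres.empty());
    std::size_t best = 0;
    float bestDist = std::numeric_limits<float>::max();
    for (std::size_t j = 0; j < centres.size(); ++j) {
        assert(centres[j].size() == d.size());
        float dist = 0.0f;
        for (std::size_t k = 0; k < d.size(); ++k) {
            float diff = d[k] - centres[j][k];
            dist += diff * diff;
        }
        if (dist < bestDist) {
            bestDist = dist;
            best = j;
        }
    }
    return best;
}

// Encodes one image. The histogram length equals centres.size(),
// however many descriptors the image has.
std::vector<float> bowEncode(const std::vector<Descriptor>& descriptors,
                             const std::vector<Descriptor>& centres)
{
    std::vector<float> hist(centres.size(), 0.0f);
    for (const Descriptor& d : descriptors)
        hist[nearestCentre(d, centres)] += 1.0f;
    // L2 normalisation of the counts.
    float norm = 0.0f;
    for (float v : hist) norm += v * v;
    norm = std::sqrt(norm);
    if (norm > 0.0f)
        for (float& v : hist) v /= norm;
    return hist;
}
```

Because the centres are fixed after training, the same function can be applied to a single real-time image: an image with 66 descriptors and one with 200 both yield a vector of centres.size() entries.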
