cv::ml::StatModel::calcError not working for responses of type CV_32S
I am using the master branch from the repository (hash: 361eb633f6e841bcda18f970193fc4fb439bc4c8). I have a feature vector consisting of several ordered variables. My responses, on the other hand, are categorical and of type CV_32S. I now want to train an RTrees model for this problem. The documentation of TrainData::create() states that responses may be of type CV_32S:
responses – matrix of responses. If the responses are scalar, they should be stored as a single row or as a single column. The matrix should have type CV_32F or CV_32S (in the former case the responses are considered as ordered by default; in the latter case - as categorical)
In the documentation of RTrees I can't find anything that would make this illegal. However, if I train my RTrees as follows:
#include <iostream>
#include <random>
#include <opencv2/core.hpp>
#include <opencv2/ml.hpp>

using namespace std;
using namespace cv;
using namespace cv::ml;

int main()
{
    random_device rd;
    mt19937 gen( rd() );
    uniform_real_distribution<> dis( 0, 1 );
    uniform_int_distribution<> dis1( 0, 1 );
    int samples = 100;
    Mat_<float> train( samples, 3 );
    for ( auto & x : train ) { x = dis( gen ); }

    // CASE #1
    //Mat_<int> resp( samples, 1 );
    //for ( auto & x : resp ) { x = dis1( gen ); }

    // CASE #2
    Mat resp( samples, 1, CV_32S );
    for ( auto it = resp.begin<int>(); it != resp.end<int>(); ++it ) { *it = dis1( gen );}

    // CASE #3
    //Mat_<float> resp( samples, 1 );
    //for ( auto & x : resp ) { x = dis1( gen ); }

    Mat_<char> types( train.cols + 1, 1 );
    types.setTo( cv::Scalar( VAR_ORDERED ) );
    types( train.cols, 0 ) = VAR_CATEGORICAL;

    Ptr<TrainData> tdata = TrainData::create( train, ROW_SAMPLE, resp, noArray(), noArray(), noArray(), types );
    Ptr<RTrees> rf = RTrees::create();
    rf->train( tdata );

    Mat_<float> calc_out;
    cout << "calc error: " << rf->calcError( tdata, false, noArray() ) << endl;

    Mat_<float> pred_out;
    rf->predict( tdata->getTrainSamples(), pred_out );
    int missclass = 0;
    for ( int i = 0; i < pred_out.rows; ++i )
    {
        Mat_<float> r = tdata->getTrainResponses();
        if ( pred_out( i, 0 ) != r( i, 0 ) )
        {
            missclass++;
        }
    }
    cout << "pred error: " << missclass / ( float )samples << endl;
    return 0;
}
A Gist of this can also be found here: LINK
In Case #1 and Case #2 the output is something like the following:
calc error: 46
pred error: 0.17
Only for Case #3 is the error computed correctly:
calc error: 24
pred error: 0.24
Question #1
Is this behavior intended? If so, maybe this should be clarified in the documentation of StatModel, RTrees or TrainData?
The problem seems to be in this part of the StatModel::calcError() method:
...
float val = predict(sample);
float val0 = responses.at<float>(si);
if( isclassifier )
err += fabs(val - val0) > FLT_EPSILON;
...
If responses is of type int, wouldn't reading it through at<float> lead to a different val0 than expected?
I think this could be fixed by checking the type of responses and switching between at<float> and at<int>.
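To illustrate the suspicion above without depending on OpenCV, here is a minimal standard-library sketch. It assumes that at<float>() on CV_32S data simply reinterprets the four stored bytes and that float is a 32-bit IEEE-754 type; I have not verified either against the library source. The helpers readAsFloat, readAsLabel and countsAsError are hypothetical names for this illustration, not OpenCV functions.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstdint>
#include <cstring>

// Copies the 4 bytes of an int label into a float without converting the
// value. Assumption: this mimics what at<float>() does on CV_32S data.
inline float readAsFloat(std::int32_t label)
{
    static_assert(sizeof(float) == sizeof(std::int32_t), "expects 32-bit float");
    float out;
    std::memcpy(&out, &label, sizeof(out));
    return out;
}

// Hypothetical type-aware read: converts the label by value, as an
// at<int>() access followed by a cast would.
inline float readAsLabel(std::int32_t label)
{
    return static_cast<float>(label);
}

// Same comparison as in the quoted calcError() excerpt.
inline bool countsAsError(float predicted, float stored)
{
    return std::fabs(predicted - stored) > FLT_EPSILON;
}

// Small self-check for labels 0 and 1, as used in the test program.
inline void demo()
{
    // Label 0 has an all-zero bit pattern, so it survives reinterpretation.
    assert(!countsAsError(0.0f, readAsFloat(0)));
    // Label 1 reinterpreted as float is a tiny denormal (or zero), so a
    // correct prediction of class 1 is still counted as an error.
    assert(countsAsError(1.0f, readAsFloat(1)));
    // Converting by value gives the expected match.
    assert(!countsAsError(1.0f, readAsLabel(1)));
}
```

If this assumption holds, then with labels 0 and 1 every sample predicted as class 1 would be counted as wrong no matter what its true label is. That would be consistent with the calc error being close to 50 in Case #1 and Case #2, but it is only a guess.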
Question #2
I was quite confused that calcError returns a result in the range 0 <= x <= 100, although the return type is float. In my opinion, a return value in the range 0 <= x <= 1 would be more appropriate. What do you think?
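For comparison, this is a small sketch of the two scales. It assumes that calcError reports the classification error as a percentage, which is what the Case #3 output suggests (24 versus 0.24 for 100 samples). The function names are hypothetical helpers and not part of OpenCV.

```cpp
#include <cmath>

// Converts a percentage in [0, 100] into a fraction in [0, 1].
inline float percentToFraction(float percent)
{
    return percent / 100.0f;
}

// Fraction of misclassified samples, as computed by hand in the test program.
// Returns 0 for an empty sample set to avoid dividing by zero.
inline float misclassificationRate(int wrong, int total)
{
    return total > 0 ? static_cast<float>(wrong) / static_cast<float>(total) : 0.0f;
}
```

With the Case #3 numbers, percentToFraction(24.0f) and misclassificationRate(24, 100) both give approximately 0.24.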
Conclusion Should this be posted to ...