I need to train a decision tree that fits my training data completely; I _want_ it to overfit. So I don't want any pruning, and I want the tree to keep growing until every leaf contains samples of a single label. This is a classification task with two labels. Here are the parameters I used:

    CvDTreeParams params;
    params.min_sample_count = -1;
    params.regression_accuracy = 0;
    params.use_surrogates = false;
    params.truncate_pruned_tree = false;
    params.cv_folds = 0;
    params.use_1se_rule = false;

And here is how I'm training:

    cv::Mat trainData(numSamples, dim, CV_32FC1);
    cv::Mat trainLabels(numSamples, 1, CV_32SC1);
    // ...
    CvDTree* dtree = new CvDTree();
    // one type entry per input variable, plus one for the response
    // (newDim must match the number of columns in trainData)
    cv::Mat var_type(newDim + 1, 1, CV_8U);
    // all inputs are numerical
    var_type.setTo(cv::Scalar(CV_VAR_NUMERICAL));
    // output is categorical
    var_type.at<uchar>(newDim, 0) = CV_VAR_CATEGORICAL;
    dtree->train(trainData, CV_ROW_SAMPLE, trainLabels,
                 cv::Mat(), cv::Mat(), var_type, cv::Mat(), params);

Unfortunately, on some benchmarks the trained tree does not classify all of the training points correctly. How can I force it to?
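
One thing worth ruling out before looking at the parameters is whether a perfect fit is possible at all: if two training rows have identical feature values but different labels, no decision tree can classify both correctly, however deep it grows. Below is a minimal sketch of that check in plain C++ with no OpenCV dependency. The function name `countConflictingRows` and the `std::vector` data layout are hypothetical, chosen only for illustration, and the features are assumed to contain no NaN values.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical helper (not part of OpenCV): counts the distinct feature
// vectors that occur in the training set with more than one label.
// features[i] holds the feature values of sample i and labels[i] its class;
// both containers are assumed to describe the same samples in the same order.
// Assumes no NaN values, since NaN would break the ordering used by std::map.
std::size_t countConflictingRows(const std::vector<std::vector<float> >& features,
                                 const std::vector<int>& labels)
{
    const std::size_t n = std::min(features.size(), labels.size());

    // For every distinct feature vector, collect the set of labels seen with it.
    std::map<std::vector<float>, std::set<int> > labelsPerRow;
    for (std::size_t i = 0; i < n; ++i)
        labelsPerRow[features[i]].insert(labels[i]);

    // A feature vector with two or more labels can never be fit perfectly.
    std::size_t conflicts = 0;
    for (std::map<std::vector<float>, std::set<int> >::const_iterator it =
             labelsPerRow.begin(); it != labelsPerRow.end(); ++it)
    {
        if (it->second.size() > 1)
            ++conflicts;
    }
    return conflicts;
}
```

To apply this to the setup above, each row of `trainData` would be copied into a `std::vector<float>` and each entry of `trainLabels` into the label vector. A nonzero result means 100% training accuracy is out of reach for any parameter choice; a result of zero means the data is separable in principle, so the cause would have to be in how the tree is grown.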