How to train DTree until it completely separates data?

I need to train a decision tree that fits my training data perfectly. I _want_ it to overfit, so it must not be pruned, and it should keep growing until every leaf contains samples of only one label. This is a classification task with two labels. Here are the parameters I used:

  CvDTreeParams params;
  params.min_sample_count = -1;
  params.regression_accuracy = 0;
  params.use_surrogates = false;
  params.truncate_pruned_tree = false;
  params.cv_folds = 0;
  params.use_1se_rule = false;

And here is how I'm training:

  cv::Mat trainData(numSamples, dim, CV_32FC1);
  cv::Mat trainLabels(numSamples, 1, CV_32SC1);

  // ...

  CvDTree* dtree = new CvDTree();

  cv::Mat var_type(newDim + 1, 1, CV_8U);
  // all inputs are numerical
  var_type.setTo(cv::Scalar(CV_VAR_NUMERICAL));
  // output is categorical
  var_type.at<uchar>(newDim, 0) = CV_VAR_CATEGORICAL;

  dtree->train(trainData, CV_ROW_SAMPLE, trainLabels,
              cv::Mat(), cv::Mat(), var_type, cv::Mat(), params);

Unfortunately, on some benchmarks the trained tree still misclassifies some of the training points. How can I force it to classify every training sample correctly?
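
To make "does not classify all training points correctly" measurable, the training error can be counted directly. The following is only a minimal sketch: `countTrainingErrors` is a hypothetical helper, not part of OpenCV, and it takes the classifier as a generic callback so that it does not depend on any particular tree implementation.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not an OpenCV function): counts how many training
// samples a classifier labels incorrectly. `predict` is any callable that
// maps one feature vector to a predicted class label.
std::size_t countTrainingErrors(
    const std::vector<std::vector<float> >& samples,
    const std::vector<int>& labels,
    const std::function<int(const std::vector<float>&)>& predict)
{
    std::size_t errors = 0;
    // Only compare indices that exist in both containers.
    for (std::size_t i = 0; i < samples.size() && i < labels.size(); ++i) {
        if (predict(samples[i]) != labels[i]) {
            ++errors;
        }
    }
    return errors;
}
```

With the code above, the callback would presumably wrap something like `dtree->predict(sample)->value` for each row of `trainData`. A tree that completely separates the data should give a count of zero, and any non-zero count identifies the benchmarks on which the tree stopped growing early.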