How to train DTree until it completely separates data?

asked 2015-04-11 14:48:33 -0600

psquid

I need to train a decision tree that fits my training data completely; I _want_ it to overfit. So the tree should not be pruned, and it should keep growing until every leaf contains samples of a single label. This is a classification task with two labels. Here are the parameters I used:

  CvDTreeParams params;
  params.min_sample_count = -1;
  params.regression_accuracy = 0;
  params.use_surrogates = false;
  params.truncate_pruned_tree = false;
  params.cv_folds = 0;
  params.use_1se_rule = false;

And here is how I'm training:

  cv::Mat trainData(numSamples, dim, CV_32FC1);
  cv::Mat trainLabels(numSamples, 1, CV_32SC1); 

  // ...

  CvDTree* dtree = new CvDTree();

  cv::Mat var_type(newDim + 1, 1, CV_8U);
  // all inputs are numerical
  var_type.setTo(cv::Scalar(CV_VAR_NUMERICAL));
  // output is categorical
  var_type.at<uchar>(newDim, 0) = CV_VAR_CATEGORICAL;

  dtree->train(trainData, CV_ROW_SAMPLE, trainLabels,
              cv::Mat(), cv::Mat(), var_type, cv::Mat(), params);

Unfortunately, on some benchmarks the trained tree still misclassifies some of the training points. How can I force it to fit the training data exactly?
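
One data-side cause worth ruling out first: if two training samples have identical feature vectors but different labels, no decision tree can classify both correctly, whatever the parameters. Below is a minimal sketch of such a check, assuming the samples have been copied out of `trainData` and `trainLabels` into plain `std::vector`s. `countConflictingFeatureVectors` is a hypothetical helper written for illustration, not an OpenCV function.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical helper (not part of OpenCV): counts the distinct feature
// vectors that appear in the training set with more than one label.
// Feature vectors are compared for exact equality, value by value.
std::size_t countConflictingFeatureVectors(
    const std::vector<std::vector<float> >& samples,
    const std::vector<int>& labels)
{
    typedef std::map<std::vector<float>, std::set<int> > LabelMap;

    // Collect the set of labels seen for each distinct feature vector.
    LabelMap labelsBySample;
    for (std::size_t i = 0; i < samples.size() && i < labels.size(); ++i)
        labelsBySample[samples[i]].insert(labels[i]);

    // A feature vector with two or more labels cannot be separated.
    std::size_t conflicts = 0;
    for (LabelMap::const_iterator it = labelsBySample.begin();
         it != labelsBySample.end(); ++it)
    {
        if (it->second.size() > 1)
            ++conflicts;
    }
    return conflicts;
}
```

If this returns a non-zero count, 100% training accuracy is impossible on that data set regardless of how the tree is grown. A count of zero only rules out this one cause; it says nothing about the tree parameters.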
