How is Decision Tree Split Quality Computed?

asked 2012-07-20 09:35:16 -0600

mjburlick

How is Decision Tree Split Quality Computed?

1 answer

answered 2012-07-20 17:07:48 -0600

Maria Dimashova

The OpenCV decision tree is the CART (Classification And Regression Trees) version of the tree. Regression and classification tasks each use their own splitting criterion.

  1. Regression task. We minimize the summed variance of the responses in the two child nodes. In other words, for each split candidate in the node being split, we predict a response for every sample in that node: each sample gets the response of the child node (left or right) it falls into, computed as the mean of the true responses of the samples that reached that child. We then compute "sum((true_response - predicted_response)^2)" over all samples of the node being split, and choose the split that gives the minimum of this sum.
  2. Classification task. The criterion is based on the Gini index and minimizes the impurity of the child nodes (ideally, samples of one class end up in the same child node). It's hard to describe the Gini index in simple words here, so see e.g. 'Gini impurity' (http://en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity). To find the best split, we minimize the sum "left_node_samples_ratio * GiniIndex(left_node_samples) + right_node_samples_ratio * GiniIndex(right_node_samples)".
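
To make the two criteria concrete, here is a minimal Python sketch of them. It is not the OpenCV code: all function names are hypothetical, and the threshold search (midpoints between consecutive distinct values of a single numeric feature) is only an assumption made so the example is runnable.

```python
from collections import Counter


def regression_split_cost(left, right):
    """Sum of squared errors when each child predicts the mean of its own responses."""
    cost = 0.0
    for child in (left, right):
        if not child:
            continue  # an empty child contributes nothing
        mean = sum(child) / len(child)
        cost += sum((y - mean) ** 2 for y in child)
    return cost


def gini_index(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def classification_split_cost(left, right):
    """Gini impurity of the children, weighted by their share of the samples."""
    n = len(left) + len(right)
    return len(left) / n * gini_index(left) + len(right) / n * gini_index(right)


def best_threshold(values, targets, cost):
    """Return (threshold, cost) of the cheapest split on one numeric feature.

    Candidate thresholds are midpoints between consecutive distinct sorted
    values (an illustrative choice, not taken from OpenCV).
    """
    pairs = sorted(zip(values, targets), key=lambda p: p[0])
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        c = cost(left, right)
        if best is None or c < best[1]:
            best = (threshold, c)
    return best
```

For example, `best_threshold(feature, responses, regression_split_cost)` scores every candidate with the regression criterion, and passing `classification_split_cost` with class labels does the same for the Gini-based one.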

In the implementation, minimizing the criteria described above is reduced to the equivalent maximization of other, simplified quantities. Consider that a warning if you decide to look at the implementation ;)
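
To show what such a reduction can look like, here is the standard algebra for both criteria. It is a sketch in my own notation, not a line-by-line account of the OpenCV source: L and R are the child nodes, n_L and n_R their sample counts (n = n_L + n_R), S_L and S_R the sums of the responses in each child, and n_{L,c}, n_{R,c} the number of samples of class c in each child.

```latex
% Regression: squared error of the two children, each predicting its own mean
\sum_{i \in L} (y_i - \bar{y}_L)^2 + \sum_{i \in R} (y_i - \bar{y}_R)^2
  = \sum_{i} y_i^2 - \left( \frac{S_L^2}{n_L} + \frac{S_R^2}{n_R} \right)

% Classification: weighted Gini index, with G(X) = 1 - \sum_c (n_{X,c} / n_X)^2
\frac{n_L}{n} G(L) + \frac{n_R}{n} G(R)
  = 1 - \frac{1}{n} \left( \frac{\sum_c n_{L,c}^2}{n_L} + \frac{\sum_c n_{R,c}^2}{n_R} \right)
```

In both lines everything outside the parentheses is the same for every split of a given node, so minimizing the left-hand side is the same as maximizing the parenthesized term, which needs only running sums and counts.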
