
# How is Decision Tree Split Quality Computed?



## 1 answer


OpenCV's decision tree is an implementation of CART. Regression and classification tasks each use their own splitting criterion.

1. Regression task. We minimize the sum of squared errors over the two child nodes (equivalently, the sample-weighted sum of the children's response variances). For each split candidate in the node being split, every sample is assigned to the left or right child, and its predicted response is the mean of the true responses of the samples that fall into that child. We then compute "sum((true_response - predicted_response)^2)" over all samples of the node being split and choose the split that gives the minimum of this sum.
2. Classification task. The criterion is based on the Gini index and minimizes the impurity of the child nodes (ideally, samples of one class end up in the same child node). In short, the Gini index of a node is 1 - sum(p_i^2) over the class proportions p_i in that node; for details see e.g. 'Gini impurity' (http://en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity). To find the best split we minimize the sum "left_node_samples_ratio * GiniIndex(left_node_samples) + right_node_samples_ratio * GiniIndex(right_node_samples)".
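
Both criteria can be sketched in a few lines of plain Python. This is only an illustration of the formulas above, not OpenCV's code: the function names (`regression_cost`, `gini`, `classification_cost`, `best_threshold`) are made up for this example, and the threshold search assumes a single numeric feature with candidate thresholds placed midway between adjacent distinct values.

```python
from collections import Counter


def regression_cost(left, right):
    """Sum of squared errors when each child predicts the mean of its responses."""
    cost = 0.0
    for child in (left, right):
        if child:
            mean = sum(child) / len(child)
            cost += sum((y - mean) ** 2 for y in child)
    return cost


def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions p_i."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def classification_cost(left, right):
    """Gini impurity of the two children, weighted by their share of the samples."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, responses, cost):
    """Return (threshold, cost) of the cheapest split on one numeric feature.

    Hypothetical helper: candidates are midpoints between adjacent distinct
    feature values; returns None if no split is possible.
    """
    pairs = sorted(zip(values, responses), key=lambda p: p[0])
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        c = cost(left, right)
        if best is None or c < best[1]:
            best = (threshold, c)
    return best
```

For example, `best_threshold([1, 2, 3, 4], [1.0, 1.0, 5.0, 5.0], regression_cost)` and `best_threshold([1, 2, 3, 4], ["a", "a", "b", "b"], classification_cost)` both pick the threshold 2.5, where each child is perfectly homogeneous and the cost is zero.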

In the implementation, minimizing the criteria described above is replaced by an equivalent maximization of simpler quantities. Consider this a warning if you decide to look at the source code ;)
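
To see why such a rewrite is possible, here is a small numeric check of the standard algebraic identities. It is a sketch under my own assumptions, not a copy of the OpenCV source, and the exact quantities used there may differ. For regression, the squared error equals `sum(y^2) - (S_L^2/n_L + S_R^2/n_R)`, where `S` is the sum of responses in a child and `n` its size. For classification, the weighted Gini sum equals `1 - (sum_i c_Li^2/n_L + sum_i c_Ri^2/n_R) / n`, where `c` are per-class counts. The first term is the same for every split of a node, so minimizing the left-hand side is the same as maximizing the bracketed score. All function names are hypothetical.

```python
from collections import Counter
from math import isclose


def sse(left, right):
    """Squared error when each (non-empty) child predicts its mean response."""
    total = 0.0
    for child in (left, right):
        mean = sum(child) / len(child)
        total += sum((y - mean) ** 2 for y in child)
    return total


def regression_score(left, right):
    """Simplified quantity to maximize: S_L^2 / n_L + S_R^2 / n_R."""
    return sum(left) ** 2 / len(left) + sum(right) ** 2 / len(right)


def weighted_gini(left, right):
    """Sample-weighted Gini impurity of the two (non-empty) children."""
    n = len(left) + len(right)
    total = 0.0
    for child in (left, right):
        impurity = 1.0 - sum((c / len(child)) ** 2 for c in Counter(child).values())
        total += len(child) / n * impurity
    return total


def classification_score(left, right):
    """Simplified quantity to maximize: sum of squared class counts over child size."""
    return sum(
        sum(c * c for c in Counter(child).values()) / len(child)
        for child in (left, right)
    )


def identities_hold(responses, labels):
    """Check both identities for every split position of the given sequences."""
    sum_sq = sum(y * y for y in responses)
    for i in range(1, len(responses)):
        left, right = responses[:i], responses[i:]
        if not isclose(sse(left, right) + regression_score(left, right), sum_sq):
            return False
    n = len(labels)
    for i in range(1, n):
        left, right = labels[:i], labels[i:]
        expected = 1.0 - classification_score(left, right) / n
        if not isclose(weighted_gini(left, right), expected, abs_tol=1e-12):
            return False
    return True
```

Calling `identities_hold([1.0, 2.0, 4.0, 7.0], list("aabab"))` checks every split position of both toy sequences. The scores need only running sums and class counts, which can be updated incrementally as the candidate threshold moves along the sorted samples.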


## Stats

Asked: 2012-07-20 09:35:16 -0500

Seen: 6,416 times

Last updated: Jul 20 '12