Next: Structure finding:
Up: Decision Trees
Previous: Rule-Selection:
Similar to reconstructability analysis (section 3.3.2), in decision tree methods
we need to deal with
tradeoffs between accuracy and complexity. A very complex decision tree
can fit the training data almost perfectly, but such accuracy usually means it is overfitting.
On the other hand, a very simple decision tree tends to be inaccurate.
Several measures for comparing different algorithms and resulting decision trees
have been introduced:
1. ``accuracy'' := the percentage of correct classifications on a cross-validation data set.
2. ``complexity'' := total number of leaves of the tree.
3. ``efficiency'' := average depth of the tree (top node to leaf).
This describes the average cost of using the decision tree for classification.
4. ``practicality'' := time spent on tree building, pruning, and cross-validation.
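The first three measures can be computed directly from a tree. The following sketch uses a hypothetical nested-dict representation (internal nodes hold an attribute index and a mapping from attribute values to subtrees; leaves are class labels); the representation and the toy tree are illustrative assumptions, not a prescribed format.

```python
# Hypothetical tree representation: an internal node is a dict
# {"attr": index, "children": {attribute_value: subtree}}; a leaf is a class label.

def leaves(tree):
    """Complexity: total number of leaves of the tree."""
    if not isinstance(tree, dict):
        return 1
    return sum(leaves(c) for c in tree["children"].values())

def depth_sum(tree, d=0):
    """Helper for efficiency: (sum of leaf depths, number of leaves)."""
    if not isinstance(tree, dict):
        return d, 1
    total, n = 0, 0
    for c in tree["children"].values():
        t, k = depth_sum(c, d + 1)
        total += t
        n += k
    return total, n

def classify(tree, x):
    """Walk from the root to a leaf, testing one attribute per node."""
    while isinstance(tree, dict):
        tree = tree["children"][x[tree["attr"]]]
    return tree

def accuracy(tree, data):
    """Accuracy: fraction of correct classifications on held-out data."""
    return sum(classify(tree, x) == y for x, y in data) / len(data)

# Toy tree: attribute 0 is "outlook", attribute 1 is "wind".
tree = {"attr": 0, "children": {
    "sunny": "no",
    "overcast": "yes",
    "rain": {"attr": 1, "children": {"strong": "no", "weak": "yes"}}}}

validation = [(("sunny", "weak"), "no"),
              (("rain", "strong"), "no"),
              (("overcast", "weak"), "yes")]

print(leaves(tree))            # complexity: 4 leaves
t, n = depth_sum(tree)
print(t / n)                   # efficiency: average depth 1.5
print(accuracy(tree, validation))  # accuracy: 1.0 on this toy set
```

Practicality, by contrast, is measured externally, e.g. by timing the building, pruning, and cross-validation phases.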
Another important issue to be considered is the size of the
training data at each decision node.
``A fundamental principle of inference is that the degree of confidence with which one
is able to choose is directly related to the number of examples''
[32, pg. 258]. Therefore inferences made near the leaves of
a TDIDT decision tree tend to be (statistically) less significant and reliable than
those made near the root. This problem is closely connected with overfitting
the training data.
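This shrinkage of confidence can be made concrete: each split partitions the training data, so nodes near the leaves estimate class proportions from far fewer examples, and the standard error of such an estimate grows accordingly. The node sizes below are hypothetical, chosen only to illustrate the trend.

```python
import math

def std_error(p, n):
    """Standard error of a class-proportion estimate p from n examples."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical example counts: the root sees all training data,
# deeper nodes see ever smaller partitions of it.
for name, n in [("root", 1000), ("depth 2", 120), ("near-leaf", 8)]:
    print(f"{name:10s} n={n:5d}  std. error={std_error(0.8, n):.3f}")
# The same estimated proportion (0.8) is roughly ten times less
# certain at the near-leaf node than at the root.
```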
The interpretation of decision trees is also difficult. Though understanding
the reasons behind a decision tree's classifications is not as difficult as
for MLP neural networks,
the captured structures are often not obvious to human users.
Thomas Prang
1998-06-07