Next: Structure finding:
Up: Decision Trees
Previous: Rule-Selection:
Similar to reconstructability analysis (section 3.3.2), in decision tree methods
we need to deal with
tradeoffs between accuracy and complexity. A very complex decision tree
can fit the training data almost perfectly, but such accuracy usually means it is overfitting.
On the other hand, a very simple decision tree tends to be inaccurate.
Several measures for comparing different algorithms and resulting decision trees
have been introduced:
1. ``accuracy'' := the percentage of correct classifications on a cross-validation data set.
2. ``complexity'' := total number of leaves of the tree.
3. ``efficiency'' := average depth of the tree (top node to leaf).
This describes the average cost of using the decision tree for classification.
4. ``practicality'' := time spent on tree building, pruning, and cross-validation.
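The first three measures can be computed directly from a tree. The following sketch uses a hypothetical nested-dict representation (internal nodes hold an attribute index and a mapping from attribute values to subtrees; leaves are class labels); the representation and the toy tree are illustrative assumptions, not a prescribed format.

```python
# Hypothetical tree representation: an internal node is a dict
# {"attr": index, "children": {attribute_value: subtree}}; a leaf is a class label.

def leaves(tree):
    """Complexity: total number of leaves of the tree."""
    if not isinstance(tree, dict):
        return 1
    return sum(leaves(c) for c in tree["children"].values())

def depth_sum(tree, d=0):
    """Helper for efficiency: (sum of leaf depths, number of leaves)."""
    if not isinstance(tree, dict):
        return d, 1
    total, n = 0, 0
    for c in tree["children"].values():
        t, k = depth_sum(c, d + 1)
        total += t
        n += k
    return total, n

def classify(tree, x):
    """Walk from the root to a leaf, testing one attribute per node."""
    while isinstance(tree, dict):
        tree = tree["children"][x[tree["attr"]]]
    return tree

def accuracy(tree, data):
    """Accuracy: fraction of correct classifications on held-out data."""
    return sum(classify(tree, x) == y for x, y in data) / len(data)

# Toy tree: attribute 0 is "outlook", attribute 1 is "wind".
tree = {"attr": 0, "children": {
    "sunny": "no",
    "overcast": "yes",
    "rain": {"attr": 1, "children": {"strong": "no", "weak": "yes"}}}}

validation = [(("sunny", "weak"), "no"),
              (("rain", "strong"), "no"),
              (("overcast", "weak"), "yes")]

print(leaves(tree))            # complexity: 4 leaves
t, n = depth_sum(tree)
print(t / n)                   # efficiency: average depth 1.5
print(accuracy(tree, validation))  # accuracy: 1.0 on this toy set
```

Practicality, by contrast, is measured externally, e.g. by timing the building, pruning, and cross-validation phases.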
Another important issue to be considered is the size of the
training data at each decision node.
``A fundamental principle of inference is that the degree of confidence with which one
is able to choose is directly related to the number of examples''
[32, pg. 258]. Therefore inferences made near the leaves of
a TDIDT decision tree tend to be (statistically) less significant and reliable than
those made near the root. This problem is closely connected with overfitting
the training data.
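This shrinkage of confidence can be made concrete: each split partitions the training data, so nodes near the leaves estimate class proportions from far fewer examples, and the standard error of such an estimate grows accordingly. The node sizes below are hypothetical, chosen only to illustrate the trend.

```python
import math

def std_error(p, n):
    """Standard error of a class-proportion estimate p from n examples."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical example counts: the root sees all training data,
# deeper nodes see ever smaller partitions of it.
for name, n in [("root", 1000), ("depth 2", 120), ("near-leaf", 8)]:
    print(f"{name:10s} n={n:5d}  std. error={std_error(0.8, n):.3f}")
# The same estimated proportion (0.8) is roughly ten times less
# certain at the near-leaf node than at the root.
```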
The interpretation of decision trees is also difficult. Though understanding
the reasons behind a decision tree's classifications is not as difficult as
for MLP neural networks,
the captured structures are often not obvious to human users.
Thomas Prang
1998-06-07