
Structure finding:

The purpose of supervised classification is to find variables that are tightly connected to the classification variable. For each node in the decision tree we need to decide which variables and values (the candidates) are most appropriate for this class separation.
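A minimal sketch of this candidate selection at a single node is given below. It assumes records are stored as dictionaries mapping attribute names to values; the helper names (entropy, conditional_entropy, best_candidate) and this data layout are illustrative assumptions, not the actual implementation discussed here.

from collections import Counter
from math import log2

def entropy(records, attribute):
    # Shannon entropy of one attribute over a set of records.
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def conditional_entropy(records, attribute, class_attr):
    # Remaining entropy of the class variable after splitting on 'attribute'.
    total = len(records)
    result = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r for r in records if r[attribute] == value]
        result += len(subset) / total * entropy(subset, class_attr)
    return result

def best_candidate(records, candidates, class_attr):
    # Pick the candidate variable that most reduces class entropy
    # (i.e. has the highest information gain).
    base = entropy(records, class_attr)
    return max(candidates,
               key=lambda a: base - conditional_entropy(records, a, class_attr))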

In this sense classification can be seen as a directed structure-finding problem relative to one variable, the classification variable. But if we can find structure relative to one variable, we can use the same machinery to find structure involving other variables. In a user-guided approach these algorithms could furnish the user with knowledge about the variables most closely tied to any selected variable. Queries like ``which variable is most correlated with this variable?'' could be answered with flat decision trees, as sketched below. In this way decision trees can also be used as an unsupervised method.
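The following sketch illustrates this unsupervised use, reusing the entropy and conditional_entropy helpers from the sketch above: the selected variable takes over the role of the class, and a flat (one-level) tree ranks the remaining variables by how much they reduce its entropy. The function name and interface are assumptions for illustration.

def most_correlated(records, selected_attr, candidates):
    # Rank candidate variables by information gain with respect to
    # the user-selected variable; the first entry answers the query
    # "which variable is most correlated with this variable?".
    base = entropy(records, selected_attr)
    return sorted(
        candidates,
        key=lambda a: base - conditional_entropy(records, a, selected_attr),
        reverse=True,
    )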

Comparing TDIDT with the basic techniques, it is obvious that projection and subsetting are used extensively. To find rules and conditions that separate the classes, we project onto the candidate variables and calculate entropies and conditional entropies on these projections. By iterating this process in the subnodes we continuously subset the data and specialize the search.
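A minimal sketch of this iteration, again assuming the record layout and helpers (Counter, entropy, conditional_entropy, best_candidate) from the first sketch: at each node the best candidate is chosen on the projected data, the records are subset on its values, and the search recurses on each subset until the class variable is pure or no candidates remain.

def tdidt(records, candidates, class_attr):
    classes = {r[class_attr] for r in records}
    if len(classes) == 1 or not candidates:
        # Pure node or no candidates left: label with the majority class.
        majority = Counter(r[class_attr] for r in records).most_common(1)[0][0]
        return {"leaf": majority}
    split = best_candidate(records, candidates, class_attr)
    remaining = [a for a in candidates if a != split]
    children = {}
    for value in {r[split] for r in records}:
        # Subset the data on each value of the chosen variable and specialize.
        subset = [r for r in records if r[split] == value]
        children[value] = tdidt(subset, remaining, class_attr)
    return {"split": split, "children": children}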


