Supervised, Scalar methods

In this section we will see that supervised methods on continuous data are only loosely connected to our four basic techniques, mostly through the creation of new dimensions. Here I want to discuss what makes these methods different, so that the reader can concentrate on these differences in the method introductions that follow.

Supervised methods have the purpose of classification and prediction: correctly classified data is used to derive or train a model, which is then tested on different data for classification. Though we also try to find predictive patterns with other methods, those methods do not simply look for a model connecting some input variables with a classification output and use a training set for ``supervising'' the model. Some problems that can arise with supervised methods will be discussed in section 4. On the other hand, supervised methods have the advantage that you normally know more precisely what you are looking for: you have data and classifications, and the aim is to find an adequate mapping which captures the structure of the problem while avoiding overfitting the training set.
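
To make the train-then-test procedure concrete, here is a minimal sketch in Python. The data layout, the split ratio and the nearest-class-mean classifier are illustrative assumptions only, not one of the methods presented later:

  import numpy as np

  def train_test_split(X, y, test_fraction=0.3, seed=0):
      # Shuffle the indices and split into a training part and a test part.
      rng = np.random.default_rng(seed)
      idx = rng.permutation(len(X))
      cut = int(len(X) * (1 - test_fraction))
      return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]

  def fit_class_means(X_train, y_train):
      # "Training": derive one mean vector per class from the labelled data.
      return {c: X_train[y_train == c].mean(axis=0) for c in np.unique(y_train)}

  def predict(means, X):
      # Classify each point by its nearest class mean.
      classes = list(means)
      dists = np.stack([np.linalg.norm(X - means[c], axis=1) for c in classes])
      return np.array(classes)[dists.argmin(axis=0)]

Evaluating the fitted model on the held-out test set, which took no part in training, gives an honest estimate of how well it generalizes; measuring accuracy on the training set alone would reward overfitting.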

Another big difference is that all the presented methods make use of the ordering of the variables. Several variables are aggregated by the proposed model, which transforms the input variables into an output. This model is often a simple (continuous) function (e.g. linear), or consists of iteratively connected combinations of inputs (e.g. neural networks). The output (the new dimension) is either directly a classification, a value used for classification (e.g. Fisher), or some other ``useful'' piece of information (e.g. the log-odds ratio in logistic regression).
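
As a sketch of such a one-dimensional output, consider a linear model whose result is a log-odds value; the weights and inputs below are arbitrary placeholders for illustration, not fitted parameters:

  import math

  def log_odds(x, weights, bias):
      # Aggregate the ordered input variables into a single new dimension.
      return sum(w * xi for w, xi in zip(weights, x)) + bias

  def classify(x, weights, bias):
      # Thresholding the scalar output yields the classification; applying the
      # sigmoid instead recovers the class probability of logistic regression.
      return 1 if log_odds(x, weights, bias) > 0.0 else 0

  probability = 1.0 / (1.0 + math.exp(-log_odds([1.2, 0.4], [0.8, -1.5], 0.3)))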

By linking the variables in a continuous function whose result is used for classification, we create a ``decision surface'' in the input variable space. The purpose of the described methods is to adjust the parameters of the proposed model such that this surface reflects the actual classes as accurately as possible. In the simplest case (presented in the first section with Fisher's method and the perceptron) this surface is just a hyperplane which separates the two possible classifications.
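
A minimal sketch of the perceptron learning rule mentioned above: whenever a training point falls on the wrong side of the hyperplane, the parameters are nudged towards it. The learning rate and epoch count are arbitrary illustrative choices:

  import numpy as np

  def perceptron(X, y, epochs=100, lr=1.0):
      # y must code the two classes as -1 and +1.
      w = np.zeros(X.shape[1])
      b = 0.0
      for _ in range(epochs):
          for xi, yi in zip(X, y):
              if yi * (w @ xi + b) <= 0:   # point on the wrong side (or on the surface)
                  w += lr * yi * xi        # rotate the hyperplane towards the point
                  b += lr * yi             # shift the hyperplane
      return w, b                          # decision surface: w . x + b = 0

For linearly separable data this procedure is known to converge; otherwise it keeps adjusting indefinitely, which is one reason methods like Fisher's fit the hyperplane by optimizing an explicit criterion instead.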

Note that all these methods rely on an implicit assumption for the decision surface to work: entities ``close'' to each other according to the ordering of the dimensions are assumed to be also ``close'' in their classification. In a ``chaotic'' region (i.e. where the classification is very sensitive to small variations in the input variables), these methods (and probably all other methods) are unable to find correct classifications. This is one of the reasons why weather forecasts are so uncertain in some specific (but certainly not all) situations.
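
For a linear decision function this closeness assumption can be made explicit. Writing f(x) = w \cdot x + b, the Cauchy-Schwarz inequality gives

  |f(x) - f(x')| = |w \cdot (x - x')| \le \|w\| \, \|x - x'\|,

so inputs that are close together necessarily receive close scores. If the true classification flips under arbitrarily small input perturbations, no hyperplane (and no other model continuous in the inputs) can reproduce it.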

This whole concept of a ``decision surface'' does not make sense if we have nominal (i.e. unordered) data. A ``surface'' always assumes that, in any dimension, data points on one side are ``larger'' in value and those on the other side ``smaller''. It is important to understand that this is not the case for nominal data. Weighted combinations of variables are only possible in the sense of logic functions. So in the case of nominal data we only deal with decision (hyper-) points, or planes in the sense of projected data in which some variables are ignored as irrelevant.
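
As a sketch of such a logic-function model on nominal data (the attributes colour and shape and the rule itself are invented purely for illustration):

  def classify(record):
      # No ordering is used: the rule only tests equality of nominal values,
      # carving decision "points" out of the nominal attribute space.
      if record["colour"] == "red" and record["shape"] in ("round", "oval"):
          return "class A"
      return "class B"

  classify({"colour": "red", "shape": "round", "size": "small"})  # size is ignored as irrelevant

Note that no weighted sum of ``red'' and ``round'' would be meaningful; conjunction and disjunction are the only ways to combine such values.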

This should be enough for a brief discussion before presenting some basic supervised methods on ordered data. The issues of nominal data, finding decision hyper-points, and combining nominal variables in logic functions will be taken up in the following sections.



 