Mask Analysis


The aim of this method is to investigate predictive behavior patterns over an ordered (sometimes only partially ordered) support, most often time. For example, knowing the weather yesterday and today, can we predict the weather tomorrow? Mask analysis essentially tries to model the behavior of discrete and even nominal variables in much the same way that differential equations model the behavior of continuous variables in space and time [5]. Well-known (partial) differential equations include the wave equation (a hyperbolic PDE) and the heat equation (a parabolic PDE), which are used to model many temporal, spatial, and functional relationships.

Whereas differential equations relate derivatives along some support dimensions (time or space derivatives) to the current function output (variable values), mask analysis looks at ``earlier'' support instances of its variables (previous data entries). The parallel is even more striking when mask analysis is compared with numerical methods for differential equations: these methods also use the previous time instance for first-order differential equations, the previous two instances for second-order equations, and so on.
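To make the parallel concrete, here is a sketch of two standard explicit schemes. The step size $h$, the right-hand side $f$, and the superscript notation for the time instance are illustrative choices, not taken from the source. The first line is the explicit Euler step for $u' = f(u)$; the second is the central-difference step for $u'' = f(u)$.

```latex
\begin{align*}
u^{(t)} &= u^{(t-1)} + h\, f\bigl(u^{(t-1)}\bigr) \\
u^{(t)} &= 2\,u^{(t-1)} - u^{(t-2)} + h^{2} f\bigl(u^{(t-1)}\bigr)
\end{align*}
```

The first-order scheme reaches back one instance and the second-order scheme two, which corresponds to masks of depth one and two.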

Because the method focuses on predictive patterns among the data entries, it clearly requires an ordered support: the data entries must stand in some order relation to one another. The most common ordered support is time: data entry $V=\vec{v_1}$ occurred before $V=\vec{v_2}$, $V=\vec{v_2}$ in turn occurred before $V=\vec{v_3}$, etc.

Other than an ordered support, mask analysis has no requirements. All variables can be nominal, ordinal or discretized continuous.

Because the data records are ordered, we do not represent the data directly in a counts table or probability distribution, as this would ignore the ordering. For mask analysis we start with the whole data table in its sequential order. Then, instead of looking at each data instance (entity) individually, we put a mask over several consecutive support (time) instances of the data, so that for some variables we also look at ``previous'' instances. Doing this creates new dimensions, called sampling variables, which represent the state of variables at a previous support (time) instance. One of the main questions in mask analysis is how far ``back'' to look, called the mask depth, and which variables to include as new dimensions. The ultimate purpose is to predict the current data instances, called generated variables, conditional on the newly created dimensions, called generating variables.

As an example, imagine an original data table with three variables $\{x_1,x_2,x_3\}$. A mask can be represented by a new set of dimensions, e.g.

$$M:=\{x_1^{(t-2)},x_3^{(t-2)},x_1^{(t-1)},x_2^{(t-1)},x_3^{(t-1)},x_1^{(t)},x_2^{(t)},x_3^{(t)}\}$$

where $(t)$ refers to data in the $t$-th (= current) support instance, $(t-1)$ refers to the previous one, etc. In this new model space of variables we induce a counts table and a probability distribution (section 2.1).
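The construction can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the function name `apply_mask`, the encoding of a mask as `(variable, lag)` pairs, and the toy data are all hypothetical and not taken from the source.

```python
from collections import Counter

def apply_mask(rows, mask):
    """Slide a mask over an ordered data table.

    rows: list of dicts, one per support instance, in support order.
    mask: list of (variable, lag) pairs; lag 0 is the current instance (t),
          lag 1 is (t-1), and so on.
    Returns one tuple of variable states per position of the mask.
    """
    depth = max(lag for _, lag in mask)  # how far back the mask reaches
    return [
        tuple(rows[t - lag][var] for var, lag in mask)
        for t in range(depth, len(rows))
    ]

# Hypothetical toy data: three variables observed at six ordered instances.
rows = [
    {"x1": 0, "x2": "a", "x3": 1},
    {"x1": 1, "x2": "b", "x3": 0},
    {"x1": 0, "x2": "a", "x3": 1},
    {"x1": 1, "x2": "b", "x3": 0},
    {"x1": 0, "x2": "a", "x3": 1},
    {"x1": 1, "x2": "b", "x3": 0},
]

# The example mask: x1 and x3 at (t-2), all three variables at (t-1) and (t).
M = [("x1", 2), ("x3", 2),
     ("x1", 1), ("x2", 1), ("x3", 1),
     ("x1", 0), ("x2", 0), ("x3", 0)]

samples = apply_mask(rows, M)          # one sample per mask position
counts = Counter(samples)              # counts table over the new dimensions
total = sum(counts.values())
probabilities = {state: n / total for state, n in counts.items()}
```

A mask of depth 2 cannot be placed on the first two instances, so six data rows yield four samples.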

From this probability distribution we derive the conditional probability distribution of the generated variables given the generating variables, together with the corresponding conditional entropy.
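As a sketch of this step, the conditional entropy can be estimated from the counts as $H = -\sum_{g,s} p(g,s)\log_2 p(s \mid g)$, where $g$ ranges over states of the generating variables and $s$ over states of the generated variables. The function name, the convention that the generating variables come first in each sample tuple, and the toy samples are assumptions made for illustration.

```python
from collections import Counter
from math import log2

def conditional_entropy(samples, n_generating):
    """Estimate H(generated | generating) in bits from masked samples.

    samples: list of tuples; by assumption the first n_generating entries
             are generating-variable states and the rest are generated.
    """
    joint = Counter(samples)                               # n(g, s)
    marginal = Counter(s[:n_generating] for s in samples)  # n(g)
    total = len(samples)
    h = 0.0
    for state, n in joint.items():
        p_joint = n / total                                # p(g, s)
        p_cond = n / marginal[state[:n_generating]]        # p(s | g)
        h -= p_joint * log2(p_cond)
    return h

# Hypothetical samples of the form (state at t-1, state at t).
deterministic = [("a", "b"), ("b", "a"), ("a", "b"), ("b", "a")]
unpredictable = [("a", "a"), ("a", "b"), ("b", "a"), ("b", "b")]

h_det = conditional_entropy(deterministic, 1)
h_unp = conditional_entropy(unpredictable, 1)
```

In the first sample set the next state is fully determined by the previous one, so the estimate is zero; in the second both next states are equally likely after either previous state, giving one bit of uncertainty.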

The mask $M$ together with the conditional probability distribution is our model of the behavioral structure in the data, called a ``behavior system''. The conditional entropy measures the quality of the model, i.e. how uncertain we are about our predictions. As in reconstructability analysis (section 3.3.2), we need to consider the tradeoff between the quality (accuracy) of the model and its complexity, reflected by the number of new variables.

A ``behavior system'' can also be seen as describing support-invariant behavior in the data: independent of the support, we have a conditional probability function for predicting the next variable states. For more details, and especially for a more formal definition of mask analysis, refer to [26, pp. 83-174].

Compared to the basic techniques, mask analysis mainly uses new dimensions to add knowledge of previous data to the current record. Using different masks, we add different variables and compare how well they predict. The quality of each behavior system is then obtained by projection and standard entropy measures.
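Comparing masks might look like the following sketch. Everything here is an assumption for illustration: the function name `mask_entropy`, the `(variable, lag)` encoding, the toy data in which $x_2^{(t)}$ copies $x_1^{(t-1)}$, and in particular the tie-breaking rule (lowest conditional entropy first, then fewest generating variables), which is only one simple way to express the quality/complexity tradeoff.

```python
from collections import Counter
from math import log2

def mask_entropy(rows, generating, generated):
    """Conditional entropy (bits) of generated given generating variables.

    generating, generated: lists of (variable, lag) pairs; lag 0 is (t).
    """
    depth = max(lag for _, lag in generating + generated)
    joint, marginal = Counter(), Counter()
    for t in range(depth, len(rows)):
        g = tuple(rows[t - lag][var] for var, lag in generating)
        s = tuple(rows[t - lag][var] for var, lag in generated)
        joint[g, s] += 1
        marginal[g] += 1
    total = sum(marginal.values())
    # log2(n(g) / n(g, s)) equals -log2 p(s | g), so every term is >= 0.
    return sum(n / total * log2(marginal[g] / n)
               for (g, _), n in joint.items())

# Hypothetical data in which x2 at (t) is a copy of x1 at (t-1).
x1 = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
x2 = [0] + x1[:-1]
rows = [{"x1": a, "x2": b} for a, b in zip(x1, x2)]

target = [("x2", 0)]  # generated variable: x2 at the current instance
masks = {
    "x1(t-1)": [("x1", 1)],
    "x2(t-1)": [("x2", 1)],
    "x1(t-1), x2(t-1)": [("x1", 1), ("x2", 1)],
}
entropies = {name: mask_entropy(rows, gen, target)
             for name, gen in masks.items()}

# Prefer the lowest entropy; among equals, prefer the smaller mask.
best = min(entropies, key=lambda name: (entropies[name], len(masks[name])))
```

Here the masks containing $x_1^{(t-1)}$ predict $x_2^{(t)}$ without uncertainty, and the smaller of the two is preferred because the extra variable adds complexity without improving quality.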


Thomas Prang
1998-06-07