Log-linear models

Hierarchical log-linear models are quite similar to reconstructability analysis (RA), and also to mask analysis [50]. They investigate which combined effects of variables are required for a good approximation of the overall relation; in short, which structure among the variables is necessary to describe the data sufficiently.

Log-linear models generally start from count tables (contingency tables). As in RA, the data are projected onto the hypothesized subrelations; here the projections remain counts, whereas RA mostly works with some other measure, usually probabilities. The overall relation is then rebuilt from maximum likelihood estimates (MLE) computed by the iterative proportional fitting algorithm (Deming-Stephan algorithm), rather than by the unbiased join procedure of RA. Other reconstruction algorithms are also known, e.g. the Newton-Raphson algorithm [30, pg. 22].
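The projection and rebuilding steps can be sketched in a few lines of Python. This is only a minimal illustration of iterative proportional fitting, not the implementation used in [30]: the names `margin` and `ipf`, the dictionary representation of the count table, and the example counts are all assumptions made for this sketch. The observed table must list every cell, including cells with a zero count.

```python
def margin(table, axes):
    """Project a table (dict: cell tuple -> count) onto the variables in `axes`."""
    out = {}
    for cell, count in table.items():
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0.0) + count
    return out


def ipf(observed, model, iterations=50, tol=1e-9):
    """Iterative proportional fitting (hypothetical helper for illustration).

    `model` is a list of subrelations, each a tuple of variable indices,
    e.g. [(0,), (1,)] for independence of two variables.
    Returns fitted counts whose margins match the observed margins.
    """
    fitted = {cell: 1.0 for cell in observed}       # start from a uniform table
    targets = [margin(observed, axes) for axes in model]
    for _ in range(iterations):
        max_change = 0.0
        for axes, target in zip(model, targets):
            current = margin(fitted, axes)          # margin of the current fit
            for cell in fitted:
                key = tuple(cell[a] for a in axes)
                if current[key] > 0:
                    new = fitted[cell] * target[key] / current[key]
                else:
                    new = 0.0
                max_change = max(max_change, abs(new - fitted[cell]))
                fitted[cell] = new
        if max_change < tol:                        # no cell moved: converged
            break
    return fitted


# Made-up 2x2 count table for two variables (indices 0 and 1).
observed = {(0, 0): 10, (0, 1): 20, (1, 0): 30, (1, 1): 40}

# Independence model: only the one-variable subrelations are kept.
# Fitted counts are 12, 18, 28, 42 (row total * column total / N).
independent = ipf(observed, [(0,), (1,)])
print(independent)
```

For this two-variable independence model the fit is reached after a single cycle; models with overlapping subrelations of three or more variables generally need several cycles.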

While RA describes its models just by subsets of variables, this method puts the main emphasis on describing the result as a ``log-linear model''. The cell counts $F_{ij}$ of two variables 1 and 2 are expressed in the following way:

$\begin{displaymath}F_{ij} = \eta \cdot \tau_i^{(1)} \cdot \tau_j^{(2)} \cdot \tau_{ij}^{(1,2)} \end{displaymath}$

(23)

where $\eta$ is the geometric mean of the cell counts, $\tau_i^{(1)}$ is the effect of the i-th value of the first variable (1), and so forth. The name log-linear comes from a simple transformation that is often applied to the above formula:

$\begin{displaymath}\log(F_{ij}) = \log(\eta) + \log\left(\tau_i^{(1)}\right) + \log\left(\tau_j^{(2)}\right) + \log\left(\tau_{ij}^{(1,2)}\right) \end{displaymath}$

(24)

These models are called ``saturated'' because they represent the whole relationship between the two variables. The $\eta$ and the $\tau$'s can be calculated from the counts; for details see [30]. A ``simplified'' model is obtained by dropping some of the $\tau$ interaction terms, e.g. by assuming $\tau_{ij}^{(1,2)}=1$. Such an ``unsaturated'' model is used to represent the projected and reconstructed data. Note that hierarchical models require that a model containing a high-order $\tau$ (e.g. $\tau_{ij}^{(1,2)}$) also contains all of its lower-order $\tau$'s (e.g. $\tau_i^{(1)}$ and $\tau_j^{(2)}$). The order of a term is the number of variables interacting in it.
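As a rough illustration of how $\eta$ and the $\tau$'s follow from the counts, the sketch below assumes the common effect-coded parameterisation, in which the $\tau$'s of each term multiply to one; the text itself defers to [30] for the actual formulas, so this choice is an assumption. The function names and the example table are hypothetical, and all counts must be strictly positive for the logarithms to exist.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def saturated_effects(counts):
    """Effects of the saturated two-variable model F_ij = eta*tau1_i*tau2_j*tau12_ij.

    `counts` is a list of rows with strictly positive entries.
    Assumes effect coding: each family of tau's has product one.
    """
    n_rows, n_cols = len(counts), len(counts[0])
    eta = geometric_mean([f for row in counts for f in row])
    # Main effects: geometric mean of a row (or column) relative to eta.
    tau1 = [geometric_mean(row) / eta for row in counts]
    tau2 = [geometric_mean([counts[i][j] for i in range(n_rows)]) / eta
            for j in range(n_cols)]
    # Interaction: whatever is left after eta and the main effects.
    tau12 = [[counts[i][j] / (eta * tau1[i] * tau2[j]) for j in range(n_cols)]
             for i in range(n_rows)]
    return eta, tau1, tau2, tau12


# Made-up 2x2 count table.
counts = [[10, 20], [30, 40]]
eta, tau1, tau2, tau12 = saturated_effects(counts)

# The saturated model reproduces every cell count.
for i, row in enumerate(counts):
    for j, f in enumerate(row):
        assert abs(eta * tau1[i] * tau2[j] * tau12[i][j] - f) < 1e-9
```

Setting every entry of `tau12` to one in this decomposition gives an unsaturated model of the same form, although its fitted counts will in general differ from the maximum likelihood estimates obtained by iterative proportional fitting.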

Though RA and hierarchical log-linear models are quite similar in what they actually do, they differ mainly in approach. RA emphasizes the set of all possible models and the search through model space for a large number of variables. Log-linear models concentrate more on statistical aspects and on the interactions between a small number of variables. Given these similarities, it is at least interesting to follow the development and history of both methods.

Thomas Prang
1998-06-07