Finding structure


So far, finding structure appears to be easy: one just computes the amount of uncertainty in the value distribution and uses it as a measure of the amount of structure. In wide databases the situation is more difficult, because finding structure means first finding sets of variables and corresponding values that are highly related and thus have low uncertainty in their distributions. In a huge database of personal data, for example, there could be a relationship between high income, no kids, and red cars that would not be obvious from the data set as a whole.
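The idea that a well-chosen subset of variables and values has a less uncertain distribution than the whole data set can be sketched as follows. This is only an illustration: the records are invented toy data, the function name `entropy` is hypothetical, and Shannon entropy in bits is assumed as the uncertainty measure.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Invented toy records: (income, kids, car colour).
records = [
    ("high", "none", "red"),
    ("high", "none", "red"),
    ("high", "none", "red"),
    ("high", "some", "blue"),
    ("low", "none", "blue"),
    ("low", "some", "red"),
    ("low", "some", "green"),
    ("low", "none", "green"),
]

# Uncertainty of car colour over the entire data set.
colour_all = [r[2] for r in records]
h_all = entropy(colour_all)

# Uncertainty of car colour restricted to high income and no kids.
colour_subset = [r[2] for r in records if r[0] == "high" and r[1] == "none"]
h_subset = entropy(colour_subset)

print(f"entropy over all records:    {h_all:.3f} bits")
print(f"entropy within the subset:   {h_subset:.3f} bits")
```

In this toy data the colour distribution is much less uncertain inside the subset than over all records. The difficulty in a wide database is that the subset is not known in advance and has to be searched for.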

For finding these kinds of structures in huge, wide databases there are basically four techniques, which can be applied in certain combinations. In the following sections I will discuss these techniques. Note that the first three techniques require only nominal data. Although they are also applied in the continuous case, they are unable to find quantitative or ordered relationships, e.g., "if X increases in value, then Y increases in value", or quantified relationships such as $X = 2.3 \cdot Z$. Also note that these techniques are presented more as a theoretical framework than for direct practical implementation. Their practical use will be discussed in the methods overview in chapter 3.
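Why a purely nominal measure cannot see ordered relationships can be illustrated with a small sketch. It assumes conditional entropy as the nominal uncertainty measure; the function name `conditional_entropy` and the data are hypothetical. An increasing mapping from X to Y and an arbitrary scrambling of the same values receive exactly the same score, because the measure only looks at how often value combinations co-occur, not at their order.

```python
from collections import Counter
from math import log2

def conditional_entropy(xs, ys):
    """H(Y | X) in bits, estimated from paired samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    marginal = Counter(xs)
    return -sum((c / n) * log2(c / marginal[x]) for (x, _), c in joint.items())

xs = [1, 2, 3, 4] * 3

# Hypothetical mappings: one preserves order, the other does not.
ordered = {1: 10, 2: 20, 3: 30, 4: 40}      # Y increases with X
scrambled = {1: 30, 2: 10, 3: 40, 4: 20}    # same values, order destroyed

h_ordered = conditional_entropy(xs, [ordered[x] for x in xs])
h_scrambled = conditional_entropy(xs, [scrambled[x] for x in xs])

# Both mappings are deterministic, so both leave no uncertainty about Y.
print(h_ordered == h_scrambled)
```

Both cases are equally "structured" from the nominal point of view, so the monotone relationship in the first mapping goes undetected.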

Thomas Prang
1998-06-07