In a health database of patients we could create a projection onto just the variables illness and food to find a link between them. But the distribution in this projection might look random: in general there may be no relationship between an illness and the kind of food the patient ate prior to the illness. After focusing on one specific illness by subsetting the data, the result can look entirely different. Perhaps all patients with stomach pain ate `Hamburger' before being hospitalized.
The opposite of subsetting is supersetting. It allows going back and seeing the relation between the unconditioned values. There might be some general connection, but the chosen values reveal only random behavior. In this case the structure lies in some other values, and we want to go back to see the whole picture.
Subsetting corresponds to ``Conditionalizing'' in statistics. We restrict the values of some variables and look at the ``conditional probabilities''. By subsetting variables to different sets of values we obtain several conditional probability distributions, which can then be compared. Differences between these distributions can be tested by several statistical tests (-test, H-test, U-test) [34].
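The relation between subsetting and conditional distributions can be illustrated with a minimal sketch. The patient records, the helper names `distribution` and `subset`, and the chosen illness are all hypothetical and serve only to show the idea; they are not taken from any real database.

```python
from collections import Counter

# Hypothetical patient records (illness, food); invented for illustration only.
records = [
    ("stomach pain", "hamburger"),
    ("stomach pain", "hamburger"),
    ("stomach pain", "hamburger"),
    ("flu", "salad"),
    ("flu", "hamburger"),
    ("flu", "pasta"),
    ("fracture", "salad"),
    ("fracture", "pasta"),
]


def distribution(rows):
    """Relative frequency of each food among the given rows."""
    counts = Counter(food for _, food in rows)
    total = sum(counts.values())
    return {food: n / total for food, n in counts.items()}


def subset(rows, illness):
    """Keep only the rows whose illness equals the given value."""
    return [row for row in rows if row[0] == illness]


# Unconditioned distribution of food over all patients (the superset).
unconditioned = distribution(records)

# Conditional distribution of food, given illness == "stomach pain".
conditioned = distribution(subset(records, "stomach pain"))

print(unconditioned)
print(conditioned)
```

In this toy data the unconditioned distribution is spread over three foods, while the conditioned one concentrates entirely on a single food. Supersetting amounts to going back from `subset(records, ...)` to the full `records`.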
In OLAP terminology this technique is called ``slicing and dicing'' and corresponds to the ``WHERE'' clause in SQL; in GSPS terminology it is known as ``simplification''.
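The correspondence with the SQL ``WHERE'' clause can be sketched with Python's built-in `sqlite3` module. The table name `patients`, its columns, and the rows are assumptions made up for this example.

```python
import sqlite3

# In-memory database with a hypothetical patients table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (illness TEXT, food TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [
        ("stomach pain", "hamburger"),
        ("stomach pain", "hamburger"),
        ("flu", "salad"),
        ("flu", "hamburger"),
        ("fracture", "pasta"),
    ],
)

# Projection onto food over all patients: no condition is applied.
projection = conn.execute(
    "SELECT food, COUNT(*) FROM patients GROUP BY food ORDER BY food"
).fetchall()

# Subsetting: the WHERE clause restricts illness to a single value.
subset_rows = conn.execute(
    "SELECT food, COUNT(*) FROM patients WHERE illness = ? "
    "GROUP BY food ORDER BY food",
    ("stomach pain",),
).fetchall()

conn.close()

print(projection)
print(subset_rows)
```

Dropping the `WHERE` condition again returns to the unconditioned counts, which is the supersetting step described above.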