Discussion:


Most attention in the rule-induction literature is devoted to association rules. Although they rely only on fairly basic information measures, association rules are easy to mine, and the minimum support criterion allows fast algorithms. The built-in support for hierarchies (taxonomies) in the framework of generalized association rules is another advantage.

The disadvantage is that measuring rules by their support and confidence is not always sufficient. The important distinctiveness of the transition probability compared to the a priori probability is not taken into account (though some experiments with the chi-square test have been done [47,36]). Furthermore, it may be difficult to find a lower limit for the support. Choosing it too small slows down the algorithm drastically; choosing it too large may cause us to overlook many interesting patterns. While in general market research we may only be interested in significant overall rules, in other areas such as fraud detection we are also interested in small, rarely occurring patterns that are very distinct from the overall behavior.

The J-measure offers a more interesting approach. It combines simplicity (which is similar to support s) with distinctiveness and confidence in an information-theoretic way. The big problem of this measure is the high computational complexity of its algorithm. Though some of the `specializations' are pruned by computing upper bounds, the evaluation and bounding of all first-order rules alone can be exhausting for wide database relations. Also, the use of hierarchies has not yet been considered. Given the complexity of the algorithm, we would in any case want to use as few values as possible for each dimension.
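As a rough illustration of how the J-measure weighs simplicity against distinctiveness, the following sketch evaluates a single rule `if Y=y then X=x'. It assumes the usual form of the measure given by Smyth and Goodman, with logarithms to base 2; the function name `j_measure` and its parameter names are illustrative only, and the prior `p_x` is assumed to lie strictly between 0 and 1.

```python
import math


def j_measure(p_y, p_x, p_x_given_y):
    """J-measure (in bits) of the rule 'if Y=y then X=x'.

    p_y         -- probability of the precondition (the simplicity term)
    p_x         -- a priori probability of the conclusion, 0 < p_x < 1
    p_x_given_y -- transition probability, i.e. the confidence of the rule
    """
    def term(posterior, prior):
        # Convention: 0 * log(0 / q) is taken to be 0.
        if posterior == 0:
            return 0.0
        return posterior * math.log2(posterior / prior)

    # Cross-entropy between posterior and prior: the distinctiveness part.
    cross_entropy = term(p_x_given_y, p_x) + term(1 - p_x_given_y, 1 - p_x)
    return p_y * cross_entropy


# A rare but very distinct rule: the precondition holds in 10% of the
# records and lifts the conclusion from probability 0.5 to 1.0.
rare_distinct = j_measure(p_y=0.1, p_x=0.5, p_x_given_y=1.0)

# A frequent rule whose confidence equals the prior: no information gained.
frequent_trivial = j_measure(p_y=0.9, p_x=0.9, p_x_given_y=0.9)

print(rare_distinct, frequent_trivial)
```

Under these assumptions a rule whose confidence equals the a priori probability scores zero no matter how often its precondition occurs, which is exactly the property that support and confidence lack.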

In general, when talking about measures we need to consider that one-dimensional evaluations are always tradeoffs between many criteria. We cannot expect one measure to be useful for all our problems. Finding big overall patterns and finding structures for classification purposes are already two quite distinct aims. I have discussed only two dominant approaches to rule finding, but it should be clear that the ``Purpose of Investigation'' (section 1.2) should determine the choice of our measures. Several different, complementary measures may have to be used.

The need for hierarchies in finding general as well as specialized rules has already been pointed out several times. In general, their use needs to be considered in combination with the measures and the purpose of investigation.

Guided search for rules is another style of rule finding, which not only cuts down computation time but can also be very useful to the investigator. In this approach the user interactively selects one or more (variable, value) pairs for the precondition or the conclusion of the rule. The algorithm then evaluates only rules that contain the given pairs.
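A minimal sketch of such a guided search, restricted to first-order rules with a single pinned precondition, might look as follows. The function `guided_search`, the record layout (one dictionary per record) and the sample data are all hypothetical and serve only to show how pinning a pair shrinks the set of rules that must be evaluated.

```python
from collections import Counter


def guided_search(records, pinned):
    """Evaluate only first-order rules whose precondition is the pinned pair.

    Returns tuples (precondition, conclusion, support, confidence).
    """
    var, val = pinned
    # Only records satisfying the pinned precondition need to be scanned.
    matching = [r for r in records if r.get(var) == val]
    if not matching:
        return []
    # Count every (variable, value) pair that co-occurs with the pinned pair.
    counts = Counter(
        (k, v) for r in matching for k, v in r.items() if k != var
    )
    n = len(records)
    return [
        (pinned, conclusion, hits / n, hits / len(matching))
        for conclusion, hits in sorted(counts.items())
    ]


# Hypothetical sample data.
records = [
    {"income": "high", "car": "sedan", "region": "north"},
    {"income": "high", "car": "sedan", "region": "south"},
    {"income": "low", "car": "compact", "region": "north"},
    {"income": "low", "car": "sedan", "region": "south"},
]

for rule in guided_search(records, ("income", "high")):
    print(rule)
```

Pinning a pair in the conclusion instead would work symmetrically; the sketch omits that case for brevity.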

Similar to the ``Meta-query'' used in the product KnowledgeMiner [46], we could also define ``Meta-rules'' by specifying some or all of the variables, while the algorithm searches for appropriate values. For example, we might specify (X₁ = income) and (Y = car) when we want to know whether there are any rules that connect specific income groups to particular car makes.
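The income/car example can be sketched in a few lines. This is not how KnowledgeMiner implements meta-queries; it is only an illustration under the assumption that records are dictionaries, and the function name `instantiate_meta_rule` and the sample values are hypothetical.

```python
from collections import Counter


def instantiate_meta_rule(records, x_var, y_var):
    """Search for values that fill the meta-rule (X = x_var) -> (Y = y_var)."""
    n = len(records)
    x_counts = Counter(r[x_var] for r in records)
    pair_counts = Counter((r[x_var], r[y_var]) for r in records)
    rules = [
        {
            "if": (x_var, x),
            "then": (y_var, y),
            "support": hits / n,
            "confidence": hits / x_counts[x],
        }
        for (x, y), hits in pair_counts.items()
    ]
    # Rank the instantiated rules by confidence, highest first.
    return sorted(rules, key=lambda rule: rule["confidence"], reverse=True)


# Hypothetical sample data.
records = [
    {"income": "high", "car": "sedan"},
    {"income": "high", "car": "sedan"},
    {"income": "low", "car": "compact"},
    {"income": "low", "car": "sedan"},
]

for rule in instantiate_meta_rule(records, "income", "car"):
    print(rule)
```

Ranking by confidence is an arbitrary choice here; any of the measures discussed above could be used instead.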

One danger in using rule induction is that the results may be misleading. As will be described in section 4, rules do not necessarily show any causality. Also, depending on the measure, we will find seemingly good rules that have no correlation at all between the precondition and the conclusion. I already mentioned this in the context of distinctiveness. For example, if 90% of all customers buy milk and, independently, 90% buy bread, then the rule `bread → milk' would have a support of .81 and a confidence of .90 purely by chance, although this confidence merely equals the a priori probability of buying milk. The same is true for the opposite rule `milk → bread'.
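The arithmetic of this example can be checked with a short sketch. The ratio of confidence to a priori probability computed below is commonly called lift; that name and the helper `rule_stats` are not part of the text above and are introduced here only for illustration.

```python
def rule_stats(p_pre, p_con, p_joint):
    """Return (support, confidence, lift) for the rule 'pre -> con'."""
    support = p_joint
    confidence = p_joint / p_pre   # transition probability P(con | pre)
    lift = confidence / p_con      # a value of 1.0 means no association
    return support, confidence, lift


p_bread = 0.9
p_milk = 0.9
# Independence: P(bread and milk) = P(bread) * P(milk).
p_both = p_bread * p_milk

support, confidence, lift = rule_stats(p_bread, p_milk, p_both)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.81 confidence=0.90 lift=1.00
```

A ratio of 1.00 shows that knowing a customer bought bread does not change the probability of buying milk, even though support and confidence both look high.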

Thomas Prang
1998-06-07