
Summary of Dangers

Let's summarize some of the dangers in data-mining (adapted from [11]):
1. Associations in databases may be due, in whole or in part, to unrecorded common causes and therefore may not indicate any direct causality.
2. Variable values may be the result of feedback mechanisms that are neither shown in the data nor represented by non-recursive models.
3. There might be an (unknown) preselection criterion that determines whether an entity appears in the examined database. For example, questionnaires are seldom filled out by a random sample of the population.
4. What a data-mining result really expresses needs to be evaluated carefully (compare the milk → bread rule and the student-acceptance example).
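
The first danger can be illustrated with a small simulation. The following is only a sketch under invented assumptions: the hidden flag `z`, the items `a` and `b`, and the probabilities 0.8 and 0.2 are hypothetical and not taken from the text or its references. Both items depend on `z` but never on each other, yet the rule a → b appears to hold as long as `z` is unrecorded.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def simulate(n=100_000):
    """Generate hypothetical baskets (z, a, b).

    z is a hidden common cause; a and b each depend only on z,
    never on each other. All probabilities are assumptions.
    """
    rows = []
    for _ in range(n):
        z = random.random() < 0.5
        p = 0.8 if z else 0.2      # purchase probability driven by z alone
        a = random.random() < p
        b = random.random() < p
        rows.append((z, a, b))
    return rows


def lift(rows):
    """Lift of the rule a -> b, i.e. P(b | a) / P(b); 1.0 means no association."""
    n = len(rows)
    n_a = sum(1 for _, a, _ in rows if a)
    n_b = sum(1 for _, _, b in rows if b)
    n_ab = sum(1 for _, a, b in rows if a and b)
    return (n_ab / n_a) / (n_b / n)


rows = simulate()

# Lift as a miner would see it when z is not recorded in the database.
overall = lift(rows)

# Lift within each stratum of z, which is only possible if z is recorded.
within = {z: lift([r for r in rows if r[0] == z]) for z in (True, False)}

print("lift ignoring z:", round(overall, 2))
for z, value in within.items():
    print("lift given z =", z, ":", round(value, 2))
```

Under these assumptions the lift ignoring `z` is about 1.36 in expectation (P(b | a) = 0.68 against P(b) = 0.5), while the lift inside each stratum of `z` is about 1. The association comes entirely from the unrecorded common cause.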

Further precautions and examples can be found in [6, pp. 37-38] and [11, pp. 20-22].



Thomas Prang
1998-06-07