Let's summarize some of the dangers in data-mining (adapted from [11]):
1. Associations in a database may be due, in whole or in part, to unrecorded
   common causes and therefore need not indicate any direct causal relationship.
2. Variable values may be the result of feedback mechanisms that are neither
   shown in the data nor represented by non-recursive models.
3. There may be an (unknown) preselection criterion that determines whether an
   entity appears in the examined database. For example, questionnaires are
   seldom filled out by a random sample of the population.
4. What a data-mining result really expresses needs to be evaluated carefully
   (compare the examples of the milk => bread rule and the acceptance of
   students).
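
The first danger can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions, not an example taken from [11]: the variables z, x and y, the Gaussian noise model and the sample size are all hypothetical. A hidden variable z drives both x and y, so x and y are clearly correlated even though neither influences the other. Once the contribution of z is removed, which is possible here only because the data is simulated and z is known, the association largely disappears.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=0):
    """Hypothetical data: z is a common cause of x and y; x and y never interact."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]   # hidden common cause
    x = [zi + rng.gauss(0, 1) for zi in z]    # x depends only on z plus noise
    y = [zi + rng.gauss(0, 1) for zi in z]    # y depends only on z plus noise
    return z, x, y


z, x, y = simulate()

# What a miner sees when z is not recorded in the database.
r_xy = pearson(x, y)

# What remains after subtracting z's contribution from both variables.
x_res = [xi - zi for xi, zi in zip(x, z)]
y_res = [yi - zi for yi, zi in zip(y, z)]
r_res = pearson(x_res, y_res)

print(f"correlation of x and y, z unrecorded: {r_xy:.2f}")
print(f"correlation after removing z:        {r_res:.2f}")
```

In a real database z would be missing, so only the first correlation could be computed, and nothing in the data itself would reveal that it does not reflect a direct causal link.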
Further precautions and examples can be found in [6, pp. 37-38] and
[11, pp. 20-22].
Thomas Prang
1998-06-07