"DISCOVERING RULES THAT GOVERN MONOTONE PHENOMENA"



by Vetle I. Torvik and Evangelos Triantaphyllou

Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques, (E. Triantaphyllou and G. Felici, Editors), Springer, Heidelberg, Germany, Chapter 4, pp. 149-192, (2005).


Abstract:
Unlocking the mystery of natural phenomena is a universal objective in scientific research. The rules governing a phenomenon can most often be learned by observing it under a sufficiently large number of conditions that are sufficiently high in resolution. The general knowledge discovery process is not always easy or efficient, and even if knowledge is produced it may be hard to understand, interpret, validate, remember, and use. Monotonicity is a pervasive property in nature: it applies when each predictor variable has a non-negative effect on the phenomenon under study. Due to the monotonicity property, being able to observe the phenomenon under specifically selected conditions may increase the accuracy and completeness of the knowledge at a faster rate than a passive observer who may not receive the pieces relevant to the puzzle soon enough. This scenario can be thought of as learning by successively submitting queries to an oracle which responds with a Boolean value (phenomenon is present or absent). In practice, the oracle may take the shape of a human expert, or it may be the outcome of performing tasks such as running experiments or searching large databases. Our main goal is to pinpoint the queries that minimize the total number of queries used to completely reconstruct all of the underlying rules defined on a given finite set of observable conditions V = {0,1}n. We summarize the optimal query selections in the simple form of selection criteria, which are near optimal and only take polynomial time (in the number of conditions) to compute. Extensive unbiased empirical results show that the proposed selection criterion approach is far superior to any of the existing methods. In fact, the average number of queries is reduced exponentially in the number of variables n and more than exponentially in the oracle’s error rate.

Keywords and Phrases:
Monotone Boolean Functions, Active Learning, Data Mining.



Download this paper as a PDF file. (size = 630 KB)




Visit Dr. Triantaphyllou's Homepage.