Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. (English) Zbl 1283.68283
Summary: We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: “what are the implicit statistical assumptions of feature selection criteria based on mutual information?” To answer this, we adopt a different strategy than is usual in the feature selection literature: instead of trying to define a criterion, we derive one, directly from a clearly specified objective function, the conditional likelihood of the training labels. While many hand-designed heuristic criteria try to optimise a definition of feature ‘relevancy’ and ‘redundancy’, our approach leads to a probabilistic framework which naturally incorporates these concepts. As a result we can unify the numerous criteria published over the last two decades, and show them to be low-order approximations to the exact (but intractable) optimisation problem. The primary contribution is to show that common heuristics for information-based feature selection (including Markov Blanket algorithms as a special case) are approximate iterative maximisers of the conditional likelihood. A large empirical study provides strong evidence to favour certain classes of criteria, in particular those that balance the relative size of the relevancy/redundancy terms. Overall we conclude that the JMI criterion provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples.
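
Illustrative sketch (not part of the summary): the summary names the JMI criterion without stating it. Assuming the form in which it is commonly given, J(X_k) = sum over already-selected X_j of I(X_k, X_j; Y), a greedy forward selection over discrete features might look like the following. The function names `mutual_information` and `jmi_select`, the plug-in (frequency count) estimator, and the choice to seed the selection with the single most informative feature are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # Sum over observed (x, y) pairs of p(x,y) * log(p(x,y) / (p(x) p(y))).
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def jmi_select(features, labels, k):
    """Greedily pick k feature indices using the (assumed) JMI score.

    features: list of columns, each a list of discrete values.
    labels:   list of discrete class labels, same length as each column.
    """
    remaining = list(range(len(features)))
    # Seed with the feature that has the highest marginal I(X;Y).
    first = max(remaining, key=lambda i: mutual_information(features[i], labels))
    selected = [first]
    remaining.remove(first)

    while remaining and len(selected) < k:
        def score(i):
            # Treat each (candidate, selected) pair as one joint variable
            # and sum its mutual information with the labels.
            return sum(
                mutual_information(list(zip(features[i], features[j])), labels)
                for j in selected
            )

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    y = [0, 1, 2, 3]
    x0 = [0, 0, 1, 1]   # high bit of y
    x1 = [0, 0, 1, 1]   # exact duplicate of x0 (redundant)
    x2 = [0, 1, 0, 1]   # low bit of y (complementary)
    print(jmi_select([x0, x1, x2], y, 2))
```

In this toy data all three features have the same marginal relevance, but the duplicate adds nothing once its twin is selected, so the pairwise joint term steers the second pick towards the complementary feature. This is one way to see the relevancy/redundancy balance the summary refers to.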

68T05 Learning and adaptive systems in artificial intelligence
62F07 Statistical ranking and selection procedures
62H30 Classification and discrimination; cluster analysis (statistical aspects)