Robust cluster analysis and variable selection. (English) Zbl 1341.62037

Monographs on Statistics and Applied Probability 137. Boca Raton, FL: CRC Press (ISBN 978-1-4398-5796-0/hbk; 978-1-4398-5797-7/ebook). xx, 371 p. (2015).
This monograph presents a deep and comprehensive overview of the theory and applications of probabilistic clustering and variable selection. Throughout, the data are assumed to be drawn from a mixture model consisting of unimodal subpopulations, which correspond to different classes. Within this statistical framework, the author provides a selection of robust methods that he found most useful in the analysis of real and simulated data. The monograph consists of six chapters, a list of figures, a list of tables, a glossary of notation and seven appendices, which supply the material required for understanding the six main chapters. The appendices are devoted to geometry, topology, analysis, measure theory, probability theory, statistics and optimization theory, respectively. In the sequel, the subject of each of the six chapters is briefly presented.
Chapter 1 (“Mixture and classification models and their likelihood estimators”) focuses on (parametric and nonparametric) finite mixture and classification models and on the asymptotic properties of their likelihood estimators. Chapter 2 (“Robustification by trimming”) begins with a discussion of outliers and of robustness measures for estimators. Trimming methods for maximum likelihood estimators of mixture and classification models are then introduced and their robustness is analyzed. Chapter 3 (“Algorithms”) deals with the problem of computing local likelihood maxima and steady solutions to cluster criteria. In this context, algorithms for computing the solutions promised in the first two chapters are given, for instance the EM algorithm, the EMT algorithm and the \(k\)-parameters algorithm. Chapter 4 (“Favorite solutions and cluster validation”) deals with estimating the number of groups and the number of outliers, and with identifying valid solutions. Chapter 5 (“Variable selection in clustering”) describes methods for enhancing mixture and cluster analysis by variable selection. Chapter 6 (“Applications”) applies the methods of the book to four data sets from different fields. Three of the four data sets (iris data from botany, Swiss bills from criminology and the leukemia gene expression data) are well known, while the fourth one (stone flakes from prehistoric archaeology) is less well known; as with the others, however, its true solution is essentially known. The purpose of this chapter is not to analyze the previously mentioned data sets but to assess the effectiveness of the methods.
In summary, this monograph endeavors to give an overview of the theory and applications of probabilistic clustering and variable selection, with special emphasis on problems caused by outliers and irrelevant variables. The author has written an interesting and highly valuable monograph, which is intended to serve the needs of researchers and data analysts. For data analysts, it offers a variety of clustering methods to choose from and gives much advice on applying them without having to understand their probabilistic fundamentals. For researchers, it gives an excellent overview of the mathematical foundations and the statistical principles of model-based clustering.


62-02 Research exposition (monographs, survey articles) pertaining to statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F12 Asymptotic properties of parametric estimators
62P10 Applications of statistics to biology and medical sciences; meta analysis
62P99 Applications of statistics