Probabilistic clustering via Pareto solutions and significance tests. (English) Zbl 1414.62243

Summary: The present paper proposes a new strategy for probabilistic (often called model-based) clustering. It is well known that local maxima of mixture likelihoods can be used to partition an underlying data set. However, local maxima are rarely unique. Therefore, it remains to select the reasonable solutions, and in particular the desired one. Credible partitions are usually recognized by separation (and cohesion) of their clusters. We use here the \(p\) values provided by the classical tests of Wilks, Hotelling, and Behrens-Fisher to single out those solutions that are well separated by location. It has been shown that reasonable solutions to a clustering problem are related to Pareto points in a plot of scale balance vs. model fit of all local maxima. We briefly review this theory and propose as solutions all well-fitting Pareto points in the set of local maxima separated by location in the above sense. We also design a new iterative, parameter-free cutting plane algorithm for the multivariate Behrens-Fisher problem.
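The selection rule described in the summary can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Solution` record, the convention that larger values of both `scale_balance` and `fit` are better, the single `max_p_value` per solution (standing in for the largest p value among the location tests between its clusters), the threshold `alpha`, and the toy numbers are all assumptions made for the example. The additional "well-fitting" check mentioned in the summary is omitted.

```python
from typing import List, NamedTuple


class Solution(NamedTuple):
    """One local maximum of the mixture likelihood (hypothetical record)."""
    name: str
    scale_balance: float  # assumed: larger means better balanced scales
    fit: float            # assumed: e.g. log-likelihood, larger is better
    max_p_value: float    # assumed: largest p value over the location tests


def pareto_points(solutions: List[Solution]) -> List[Solution]:
    """Return the solutions not dominated in (scale_balance, fit)."""
    front = []
    for s in solutions:
        dominated = any(
            t.scale_balance >= s.scale_balance
            and t.fit >= s.fit
            and (t.scale_balance > s.scale_balance or t.fit > s.fit)
            for t in solutions
        )
        if not dominated:
            front.append(s)
    return front


def select(solutions: List[Solution], alpha: float = 0.05) -> List[Solution]:
    """Keep solutions separated by location, then take their Pareto points."""
    separated = [s for s in solutions if s.max_p_value < alpha]
    return pareto_points(separated)


# Toy data, invented purely for illustration.
local_maxima = [
    Solution("A", scale_balance=0.90, fit=-120.0, max_p_value=0.001),
    Solution("B", scale_balance=0.50, fit=-100.0, max_p_value=0.002),
    Solution("C", scale_balance=0.40, fit=-130.0, max_p_value=0.0001),
    Solution("D", scale_balance=0.95, fit=-90.0, max_p_value=0.40),
]

selected = select(local_maxima)
print([s.name for s in selected])
```

In this toy set, D is discarded first because its clusters are not significantly separated, even though it would otherwise dominate every other point; C is then dominated by A, which leaves A and B as candidates.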


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62-07 Data analysis (statistics) (MSC2010)


[1] Aitchison, J.; Silvey, SD, Maximum-likelihood estimation procedures and associated tests of significance, J R Stat Soc Ser B, 22, 154-171, (1960) · Zbl 0096.34403
[2] Bailey, TA; Dubes, RC, Cluster validity profiles, Patt Rec, 15, 61-83, (1982)
[3] Behrens, WU, Ein Beitrag zur Fehlerberechnung bei wenigen Beobachtungen, Landwirtschaftliche Jahrbücher. Zeitschrift für wissenschaftliche Landwirtschaft und Archiv des Königlich Preussischen Landes-Oekonomie-Kollegiums, 68, 807-837, (1929). Original in Hathi Trust Digital Library
[4] Belloni, A.; Didier, G., On the Behrens-Fisher problem: a globally convergent algorithm and a finite-sample study of the Wald, LR and LM tests, Ann Stat, 36, 2377-2408, (2008) · Zbl 1274.62379
[5] Bock, H-H, On some significance tests in cluster analysis, J Classif, 2, 77-108, (1985) · Zbl 0587.62048
[6] Böhning D (2000) Computer-assisted analysis of mixtures and applications. Chapman & Hall/CRC, Boca Raton · Zbl 0951.62088
[7] Bonnans J-F, Gilbert JC, Lemaréchal C, Sagastizábal CA (2006) Numerical optimization. Theoretical and practical aspects, 2nd edn. Springer, Berlin · Zbl 1108.65060
[8] Campbell, NA; Mahon, RJ, A multivariate study of variation in two species of rock crab of the genus Leptograpsus, Austral J Zool, 22, 417-425, (1974)
[9] Cox DR, Hinkley DV (1974) Theoretical statistics. Chapman & Hall, London · Zbl 0334.62003
[10] Day, NE, Estimating the components of a mixture of normal distributions, Biometrika, 56, 463-474, (1969) · Zbl 0183.48106
[11] Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, New York · Zbl 0853.68150
[12] Fisher, RA, The comparison of samples with possibly unequal variances, Ann Eugenics, 9, 174-180, (1939) · JFM 65.0596.03
[13] Fisher, RA, The asymptotic approach to Behrens’ integral with further tables for the \(d\) test of significance, Ann Eugenics, 11, 141-172, (1941)
[14] Fraley, C.; Raftery, AE, MCLUST: software for model-based cluster analysis, J Classif, 16, 297-306, (1999) · Zbl 0951.91500
[15] Fritz, H.; García-Escudero, LA; Mayo-Iscar, A., A fast algorithm for robust constrained clustering, Comput Stat Data Anal, 61, 124-136, (2013) · Zbl 1349.62264
[16] Frühwirth-Schnatter S (2006) Finite mixture and markov switching models. Springer, Heidelberg · Zbl 1108.62002
[17] Gallegos, MT; Ritter, G., Trimmed ML-estimation of contaminated mixtures, Sankhya Ser A, 71, 164-220, (2009) · Zbl 1193.62021
[18] Gallegos, MT; Ritter, G., Trimming algorithms for clustering contaminated grouped data and their robustness, Adv Data Anal Classif, 3, 135-167, (2009) · Zbl 1284.62372
[19] Gallegos, MT; Ritter, G., Using combinatorial optimization in model-based trimmed clustering with cardinality constraints, Comput Stat Data Anal, 54, 637-654, (2010) · Zbl 1464.62075
[20] Gallegos, MT; Ritter, G., Strong consistency of \(k\)-parameters clustering, J Multivar Anal, 117, 14-31, (2013) · Zbl 1359.62239
[21] Hathaway, RJ, A constrained formulation of maximum-likelihood estimation for normal mixture distributions, Ann Stat, 13, 795-800, (1985) · Zbl 0576.62039
[22] Kiefer, J.; Wolfowitz, J., Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters, Ann Math Stat, 27, 887-906, (1956) · Zbl 0073.14701
[23] Kiefer, NM, Discrete parameter variation: efficient estimation of a switching regression model, Econometrica, 46, 427-434, (1978) · Zbl 0408.62058
[24] Lee, SX; McLachlan, GJ, On mixtures of skew normal and skew \(t\)-distributions, Adv Data Anal Classif, 7, 241-266, (2013) · Zbl 1273.62115
[25] Lee, SX; McLachlan, GJ, Finite mixtures of multivariate skew \(t\)-distributions: some recent and new results, Stat Comput, 24, 181-202, (2014) · Zbl 1325.62107
[26] Lindsay BG (1995) Mixture models: theory, geometry and applications. NSF-CBMS regional conference series in probability and statistics, vol 5. IMS and ASA, Hayward
[27] Mardia KV, Kent JT, Bibby JM (1997) Multivariate analysis, 6th edn. Academic Press, London · Zbl 0432.62029
[28] McLachlan GJ, Peel D (2000a) Finite mixture models. Wiley, New York · Zbl 0963.62061
[29] McLachlan GJ, Peel D (2000b) On computational aspects of clustering via mixtures of normal and \(t\)-components. In: Proceedings of the American Statistical Association. American Statistical Association, Alexandria
[30] Muirhead RJ (1982) Aspects of multivariate statistical theory. Wiley series in probability and mathematical statistics. Wiley, New York · Zbl 0556.62028
[31] Peters, BC; Walker, HF, An iterative procedure for obtaining maximum-likelihood estimates of the parameters for a mixture of normal distributions, SIAM J Appl Math, 35, 362-378, (1978) · Zbl 0443.65112
[32] Ritter G (2015) Robust cluster analysis and variable selection. Monographs in statistics and applied probability, vol 137. Chapman & Hall/CRC, Boca Raton · Zbl 1341.62037
[33] Rossant, C.; Kadir, S.; Goodman, DFM; Harris, KD, Spike sorting for large, dense electrode arrays, Nature Neurosci, 19, 624-641, (2016)
[34] Rousseeuw, PJ, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J Comput Appl Math, 20, 53-65, (1987) · Zbl 0636.62059
[35] Silvey SD (1970) Statistical inference. Penguin, Baltimore · Zbl 0207.49001
[36] Tukey JW (1977) Exploratory data analysis. Addison-Wesley, Reading · Zbl 0409.62003
[37] Wilks, SS, Certain generalizations in the analysis of variance, Biometrika, 24, 471-494, (1932) · JFM 58.1172.02
[38] Wilks, SS, The large-sample distribution of the likelihood ratio for testing composite hypotheses, Ann Math Stat, 9, 60-62, (1938) · Zbl 0018.32003
[39] Yakowitz, SJ; Spragins, JD, On the identifiability of finite mixtures, Ann Math Stat, 39, 209-214, (1968) · Zbl 0155.25703