A data driven equivariant approach to constrained Gaussian mixture modeling. (English) Zbl 1414.62269

Summary: Maximum likelihood estimation of Gaussian mixture models with different class-specific covariance matrices is known to be problematic. This is due to the unboundedness of the likelihood, together with the presence of spurious maximizers. Existing methods to bypass this obstacle are based on the fact that unboundedness is avoided if the eigenvalues of the covariance matrices are bounded away from zero. This can be done by imposing constraints on the covariance matrices, i.e. by incorporating a priori information on the covariance structure of the mixture components. The present work introduces a constrained approach in which the class-conditional covariance matrices are shrunk towards a pre-specified target matrix \(\boldsymbol{\varPsi}\). Data-driven choices of the matrix \(\boldsymbol{\varPsi}\), for use when a priori information is not available, and the optimal amount of shrinkage are investigated. Constraints based on a data-driven \(\boldsymbol{\varPsi}\) are then shown to be equivariant with respect to linear affine transformations, provided that the method used to select the target matrix is also equivariant. The effectiveness of the proposal is evaluated on the basis of a simulation study and an empirical example.
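
The idea of bounding covariance eigenvalues relative to a target matrix can be illustrated with a minimal sketch. It is not taken from the paper: the function name `clip_relative_eigenvalues`, the tuning constant `c`, and the specific bounds \(1/\sqrt{c}\) and \(\sqrt{c}\) on the eigenvalues of \(\boldsymbol{\Sigma}\boldsymbol{\varPsi}^{-1}\) are assumptions chosen for illustration, and clipping is only one simple way to enforce such bounds. The sketch also suggests why constraints of this kind can behave equivariantly: if the data are transformed by a nonsingular matrix \(A\) and the target is transformed in the same way, the eigenvalues of \(\boldsymbol{\Sigma}\boldsymbol{\varPsi}^{-1}\) do not change.

```python
import numpy as np


def clip_relative_eigenvalues(sigma, psi, c):
    """Clip the eigenvalues of sigma @ inv(psi) to [1/sqrt(c), sqrt(c)].

    Illustrative only: the bounds and the clipping step are assumptions,
    not the estimation procedure of the paper.
    """
    # Factor the target as psi = L @ L.T and whiten sigma with it.
    L = np.linalg.cholesky(psi)
    L_inv = np.linalg.inv(L)
    # w is symmetric and has the same eigenvalues as sigma @ inv(psi).
    w = L_inv @ sigma @ L_inv.T
    vals, vecs = np.linalg.eigh(w)
    # Keep the relative eigenvalues away from zero and from infinity.
    vals = np.clip(vals, 1.0 / np.sqrt(c), np.sqrt(c))
    w_clipped = (vecs * vals) @ vecs.T
    # Map back to the original coordinates.
    return L @ w_clipped @ L.T


# Hypothetical example: a nearly singular component covariance
# constrained relative to an identity target.
sigma = np.array([[1.0, 0.0], [0.0, 1e-8]])
psi = np.eye(2)
constrained = clip_relative_eigenvalues(sigma, psi, c=100.0)
```

In this example the smallest eigenvalue is raised from \(10^{-8}\) to \(1/\sqrt{100} = 0.1\), so the component can no longer collapse onto a single observation. Larger values of `c` loosen the constraint, while `c = 1` forces the covariance to equal the target.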


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62-07 Data analysis (statistics) (MSC2010)



