An effective strategy for initializing the EM algorithm in finite mixture models. (English) Zbl 1414.62256

Summary: Finite mixture models represent one of the most popular tools for modeling heterogeneous data. The traditional approach for parameter estimation is based on maximizing the likelihood function. Direct optimization is often troublesome due to the complex likelihood structure. The expectation-maximization algorithm proves to be an effective remedy that alleviates this issue. The solution obtained by this procedure is entirely driven by the choice of starting parameter values. This highlights the importance of an effective initialization strategy. Despite efforts undertaken in this area, there is no uniform winner found and practitioners tend to ignore the issue, often finding misleading or erroneous results. In this paper, we propose a simple yet effective tool for initializing the expectation-maximization algorithm in the mixture modeling setting. The idea is based on model averaging and proves to be efficient in detecting correct solutions even in those cases when competitors perform poorly. The utility of the proposed methodology is shown through comprehensive simulation study and applied to a well-known classification dataset with good results.


62H30 Classification and discrimination; cluster analysis (statistical aspects)
Full Text: DOI


[1] Azzalini, A.; Valle, DA, The multivariate skew-normal distribution, Biometrika, 83, 715-726, (1996) · Zbl 0885.62062
[2] Baudry, J-P; Raftery, A.; Celeux, G.; Lo, K.; Gottardo, R., Combining mixture components for clustering, J Comput Graph Stat, 19, 332-353, (2010)
[3] Biernacki, C., Initializing EM using the properties of its trajectories in Gaussian mixtures, Stat Comput, 14, 267-279, (2004)
[4] Biernacki, C.; Celeux, G.; Govaert, G., Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models, Comput Stat Data Anal, 413, 561-575, (2003) · Zbl 1429.62235
[5] Bouveyron, C.; Brunet, C., Model-based clustering of high-dimensional data: a review, Comput Stat Data Anal, 71, 52-78, (2013) · Zbl 1471.62032
[6] Campbell, NA; Mahon, RJ, A multivariate study of variation in two species of rock crab of genus Leptograsus, Austr J Zool, 22, 417-425, (1974)
[7] Celebi ME, Kingravi HA, Vela PA (2012) A comparative study of efficient initialization methods for the \(k\)-means clustering algorithm. Comput Res Reposit. arXiv:1209.1960
[8] Celeux, G.; Diebolt, J., The SEM algorithm: a probabilistic teacher algorithm derived from the EM algorithm for the mixture problem, Comput Stat, 2, 73-82, (1985)
[9] Chen WC, Maitra R (2015) EMCluster: EM Algorithm for Model-Based Clustering of Finite Mixture Gaussian Distribution, R Package. http://cran.r-project.org/package=EMCluster
[10] Dias, J.; Wedel, M., An empirical comparison of EM, SEM and MCMC performance for problematic Gaussian mixture likelihoods, Stat Comput, 14, 323-332, (2004)
[11] Forgy, E., Cluster analysis of multivariate data: efficiency vs. interpretability of classifications, Biometrics, 21, 768-780, (1965)
[12] Fraley, C., Algorithms for model-based gaussian hierarchical clustering, SIAM J Sci Comput, 20, 270-281, (1998) · Zbl 0911.62052
[13] Fraley, C.; Raftery, AE, How many clusters? Which cluster method? Answers via model-based cluster analysis, Comput J, 41, 578-588, (1998) · Zbl 0920.68038
[14] Fraley, C.; Raftery, AE, Model-based clustering, discriminant analysis, and density estimation, J Am Stat Assoc, 97, 611-631, (2002) · Zbl 1073.62545
[15] Fraley C, Raftery AE (2006) MCLUST version 3 for R: normal mixture modeling and model-based clustering. Tech. Rep. 504. University of Washington, Department of Statistics, Seattle
[16] Hennig C (2010) Methods for merging Gaussian mixture components. Adv Data Anal Class 4(1):3-34 · Zbl 1306.62141
[17] Hershey JR, Olsen PA (2007) Approximating the kullback leibler divergence between gaussian mixture models. In: IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP’07, pp IV-317-IV-320
[18] Hoeting, JA; Madigan, DM; Raftery, AE; Volinsky, CT, Bayesian model averaging: a tutorial, Stat Sci, 14, 382-417, (1999) · Zbl 1059.62525
[19] Hubert, L.; Arabie, P., Comparing partitions, J Classif, 2, 193-218, (1985) · Zbl 0587.62128
[20] Kaufman L, Rousseuw PJ (1990) Finding Groups in Data. Wiley, New York · Zbl 1345.62009
[21] Lebret R, Iovleff S, Langrognet F, Biernacki C, Celeux G, Govaert G (2015) Rmixmod: the R package of the model-based unsupervised, supervised, and semi-supervised classification mixmod library. J Stat Softw 67(6). doi:10.18637/jss.v067.i06
[22] MacQueen, J., Some methods for classification and analysis of multivariate observations, Proc Fifth Berkeley Symp, 1, 281-297, (1967) · Zbl 0214.46201
[23] Madigan, D.; Raftery, AE, Model selection and accounting for model uncertainty in graphical models using Occams window, J Am Stat Assoc, 89, 1535-1546, (1994) · Zbl 0814.62030
[24] Maitra, R., Initializing partition-optimization algorithms, IEEE/ACM Trans Comput Biol Bioinf, 6, 144-157, (2009)
[25] Maitra, R.; Melnykov, V., Simulating data to study performance of finite mixture modeling and clustering algorithms, J Comput Graph Stat, 19, 354-376, (2010)
[26] McLachlan G, Peel D (2000) Finite Mixture Models. Wiley, New York · Zbl 0963.62061
[27] Melnykov, V., Challenges in model-based clustering, WIREs Comput Stat, 5, 135-148, (2013)
[28] Melnykov, V., Merging mixture components for clustering through pairwise overlap, J Comput Graph Stat, 25, 66-90, (2016)
[29] Melnykov, V.; Chen, W-C; Maitra, R., MixSim: R package for simulating datasets with pre-specified clustering complexity, J Stat Softw, 51, 1-25, (2012)
[30] Melnykov, V.; Melnykov, I., Initializing the EM algorithm in Gaussian mixture models with an unknown number of components, Comput Stat Data Anal, 56, 1381-1395, (2012) · Zbl 1246.65025
[31] Melnykov V, Melnykov I, Michael S (2015a) Semi-supervised model-based clustering with positive and negative constraints. In: Advances in data analysis and classification, pp 1-23
[32] Melnykov V, Michael S, Melnykov I (2015b) Recent developments in model-based clustering with applications. In: Celebi ME (ed) Partitional clustering algorithms, vol 1. Springer, Berlin, pp 1-39 · Zbl 1347.62116
[33] Prates M, Lachos V, Cabral C (2013) Mixsmsn: fitting finite mixture of scale mixture of skew-normal distributions. J Stat Softw 54(12):1-20
[34] Sneath, P., The application of computers to taxonomy, J Gener Microbiol, 17, 201-226, (1957)
[35] Sorensen, T., A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons, Biologiske Skrifter, 5, 1-34, (1948)
[36] Stahl, D.; Sallis, H., Model-based cluster analysis, Wiley Interdiscipl Rev Comput Stat, 4, 341-358, (2012)
[37] Ward, JH, Hierarchical grouping to optimize an objective function, J Am Stat Assoc, 58, 236-244, (1963)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.