Comparison of the EM, CEM and SEM algorithms in the estimation of finite mixtures of linear mixed models: a simulation study. (English) Zbl 1505.62299

Summary: Finite mixture models are a widely known method for modelling data that arise from a heterogeneous population. Within the family of mixtures of regression models, mixtures of linear mixed models have been applied in different areas because, besides accounting for heterogeneity in the population, they also account for the correlation between observations from the same individual. One of the main issues in mixture models concerns the estimation of the parameters. Maximum likelihood estimation is one of the most widely used methods for estimating the parameters of mixture models. However, maximizing the log-likelihood function in mixture models is complex, producing in many cases infinite solutions, whereby the maximum likelihood estimator may not exist, at least globally. For this reason, it is common to resort to iterative methods, in particular the Expectation-Maximization (EM) algorithm. However, slow convergence and the selection of initial values are two of the biggest issues of the EM algorithm, which is why several modified versions of this algorithm have been developed over the years. In this article we compare the performance of the EM, Classification EM (CEM) and Stochastic EM (SEM) algorithms in estimating the parameters of mixtures of linear mixed models. To evaluate their performance, we carry out a simulation study and a real data application. The results show that the CEM algorithm is the least computationally demanding of the three, although all three algorithms provide similar maximum likelihood estimates of the parameters.

MSC:

62-08 Computational methods for problems pertaining to statistics
62F10 Point estimation
62P10 Applications of statistics to biology and medical sciences; meta analysis

References:

[1] Bai, X.; Chen, K.; Yao, W., Mixture of linear mixed models using multivariate t distribution, J Stat Comput Simul, 86, 4, 771-787 (2016) · Zbl 1510.62272 · doi:10.1080/00949655.2015.1036431
[2] Benaglia, T.; Chauveau, D.; Hunter, D.; Young, D., Mixtools: an R package for analyzing finite mixture models, J Stat Softw, 32, 6, 1-29 (2009) · doi:10.18637/jss.v032.i06
[3] Celeux, G., The SEM algorithm: a probabilistic teacher algorithm derived from the EM algorithm for the mixture problem, Comput Stat Q, 2, 73-82 (1985)
[4] Celeux, G.; Govaert, G., A classification EM algorithm for clustering and two stochastic versions, Comput Stat Data Anal, 14, 3, 315-332 (1992) · Zbl 0937.62605 · doi:10.1016/0167-9473(92)90042-E
[5] Celeux, G.; Govaert, G., Comparison of the mixture and the classification maximum likelihood in cluster analysis, J Stat Comput Simul, 47, 3-4, 127-146 (1993) · doi:10.1080/00949659308811525
[6] Celeux, G.; Chauveau, D.; Diebolt, J., Stochastic versions of the EM algorithm: an experimental study in the mixture case, J Stat Comput Simul, 55, 4, 287-314 (1996) · Zbl 0907.62024 · doi:10.1080/00949659608811772
[7] Celeux, G.; Martin, O.; Lavergne, C., Mixture of linear mixed models for clustering gene expression profiles from repeated microarray experiments, Stat Model, 5, 3, 243-267 (2005) · Zbl 1111.62103 · doi:10.1191/1471082X05st096oa
[8] Dempster, AP; Laird, NM; Rubin, DB, Maximum likelihood from incomplete data via the EM algorithm, J R Stat Soc Ser B (Methodol), 39, 1, 1-38 (1977) · Zbl 0364.62022
[9] Dias, JG; Wedel, M., An empirical comparison of EM, SEM and MCMC performance for problematic Gaussian mixture likelihoods, Stat Comput, 14, 4, 323-332 (2004) · doi:10.1023/B:STCO.0000039481.32211.5a
[10] Faria, S.; Soromenho, G., Fitting mixtures of linear regressions, J Stat Comput Simul, 80, 2, 201-225 (2010) · Zbl 1184.62118 · doi:10.1080/00949650802590261
[11] Faria, S.; Soromenho, G., Comparison of EM and SEM algorithms in Poisson regression models: a simulation study, Commun Stat Simul Comput, 41, 4, 497-509 (2012) · Zbl 1318.62111 · doi:10.1080/03610918.2011.594534
[12] Frühwirth-Schnatter, S., Finite mixture and Markov switching models (2006), Berlin: Springer, Berlin · Zbl 1108.62002
[13] Gaffney S, Smyth P (2003) Curve clustering with random effects regression mixtures. In: AISTATS
[14] Ganesalingam, S., Classification and mixture approaches to clustering via maximum likelihood, J R Stat Soc Ser C (Appl Stat), 38, 3, 455-466 (1989) · Zbl 0707.62121
[15] Goldstein, H., The design and analysis of longitudinal studies (1979), London: Academic Press, London · Zbl 0492.62092
[16] Grün B (2008) Fitting finite mixtures of linear mixed models with the EM algorithm. In: Brito P (ed) Compstat 2008—international conference on Computational Statistics. Springer, Berlin, pp 165-173
[17] Hastie, T.; Tibshirani, R.; Friedman, J.; Franklin, J., The elements of statistical learning: data mining, inference and prediction, Math Intell, 27, 2, 83-85 (2005)
[18] Liu, C.; Rubin, DB, The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence, Biometrika, 81, 4, 633-648 (1994) · Zbl 0812.62028 · doi:10.1093/biomet/81.4.633
[19] McLachlan, G.; Peel, D., Finite mixture models (2000), Hoboken: Wiley, Hoboken · Zbl 0963.62061 · doi:10.1002/0471721182
[20] Meng, X-L; Rubin, DB, Maximum likelihood estimation via the ECM algorithm: a general framework, Biometrika, 80, 2, 267-278 (1993) · Zbl 0778.62022 · doi:10.1093/biomet/80.2.267
[21] Meng, X-L; Van Dyk, D., The EM algorithm—an old folk-song sung to a fast new tune, J R Stat Soc Ser B (Stat Methodol), 59, 3, 511-567 (1997) · Zbl 1090.62518 · doi:10.1111/1467-9868.00082
[22] Quandt, RE; Ramsey, JB, Estimating mixtures of normal distributions and switching regressions, J Am Stat Assoc, 73, 364, 730-738 (1978) · Zbl 0401.62024 · doi:10.1080/01621459.1978.10480085
[23] R Development Core Team (2014) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
[24] Scharl, T.; Grün, B.; Leisch, F., Mixtures of regression models for time course gene expression data: evaluation of initialization and random effects, Bioinformatics, 26, 3, 370-377 (2009) · doi:10.1093/bioinformatics/btp686
[25] Verbeke, G.; Molenberghs, G., Linear mixed models for longitudinal data (2009), Berlin: Springer, Berlin · Zbl 1162.62070
[26] Verbeke, G.; Molenberghs, G., Linear mixed models in practice: a SAS-oriented approach (2012), Berlin: Springer, Berlin · Zbl 0882.62064
[27] Yau, KK; Lee, AH; Ng, AS, Finite mixture regression model with random effects: application to neonatal hospital length of stay, Comput Stat Data Anal, 41, 3-4, 359-366 (2003) · Zbl 1256.62065 · doi:10.1016/S0167-9473(02)00180-9
[28] Young, DS; Hunter, DR, Random effects regression mixtures for analyzing infant habituation, J Appl Stat, 42, 7, 1421-1441 (2015) · Zbl 1514.62962 · doi:10.1080/02664763.2014.1000272
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.