
An asynchronous distributed expectation maximization algorithm for massive data: the DEM algorithm. (English) Zbl 07499048

Summary: The family of expectation-maximization (EM) algorithms provides a general approach to fitting flexible models for large and complex data. The expectation (E) step of EM-type algorithms is time-consuming in massive data applications because it requires multiple passes through the full data. We address this problem by proposing an asynchronous and distributed generalization of EM called the distributed EM (DEM) algorithm. Using DEM, existing EM-type algorithms are easily extended to massive data settings by exploiting the divide-and-conquer technique and widely available computing power, such as grid computing. The DEM algorithm reserves two groups of computing processes, called workers and managers, for performing the E step and the maximization step (M step), respectively. The samples are randomly partitioned into a large number of disjoint subsets and are stored on the worker processes. The E step of the DEM algorithm is performed in parallel on all the workers, and every worker communicates its results to the managers at the end of each local E step. The managers perform the M step after they have received results from a \(\gamma\)-fraction of the workers, where \(\gamma\) is a fixed constant in \((0,1]\). The sequence of parameter estimates generated by the DEM algorithm retains the attractive properties of EM: convergence of the sequence of parameter estimates to a local mode and a linear global rate of convergence. Across diverse simulations focused on linear mixed-effects models, the DEM algorithm is significantly faster than competing EM-type algorithms while achieving similar accuracy. The DEM algorithm maintains its superior empirical performance on a movie ratings database consisting of 10 million ratings. Supplementary material for this article is available online.
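The worker/manager scheme described above can be sketched in a single-process simulation. The sketch below is illustrative only, not the authors' MPI implementation: the two-component unit-variance Gaussian mixture, the shard count `K`, the value of \(\gamma\), and the fixed iteration count are all assumptions chosen for a minimal example. Each "worker" computes E-step sufficient statistics on its shard; the "manager" performs the M step using fresh statistics from a \(\gamma\)-fraction of workers and stale statistics from the rest, mimicking the asynchronous update rule.

```python
import math
import random

random.seed(1)

# Synthetic data: two-component 1-D Gaussian mixture (true means -2 and 2,
# unit variances, equal weights). This toy model is an illustrative assumption.
data = [random.gauss(-2, 1) for _ in range(500)] + \
       [random.gauss(2, 1) for _ in range(500)]
random.shuffle(data)

K = 10        # number of worker subsets (shards) -- illustrative choice
gamma = 0.5   # manager updates after this fraction of workers report
shards = [data[i::K] for i in range(K)]

def e_step(shard, mu, pi):
    """Worker-side E step: soft counts and weighted sums (the sufficient
    statistics) for a two-component unit-variance Gaussian mixture."""
    n = [0.0, 0.0]
    s = [0.0, 0.0]
    for x in shard:
        w = [pi[j] * math.exp(-0.5 * (x - mu[j]) ** 2) for j in range(2)]
        z = sum(w)
        for j in range(2):
            n[j] += w[j] / z
            s[j] += (w[j] / z) * x
    return n, s

mu, pi = [-1.0, 1.0], [0.5, 0.5]               # initial parameter guesses
stats = [e_step(sh, mu, pi) for sh in shards]  # one stored result per worker

for _ in range(30):
    # Asynchrony stand-in: only a random gamma-fraction of workers refresh
    # their E-step results this round; the manager reuses the stale
    # statistics last reported by the remaining workers.
    for w in random.sample(range(K), max(1, int(gamma * K))):
        stats[w] = e_step(shards[w], mu, pi)
    # Manager M step: aggregate sufficient statistics across all K workers.
    n = [sum(st[0][j] for st in stats) for j in range(2)]
    s = [sum(st[1][j] for st in stats) for j in range(2)]
    mu = [s[j] / n[j] for j in range(2)]
    pi = [n[j] / (n[0] + n[1]) for j in range(2)]

# The component means should settle near the true values -2 and 2.
```

In the paper's actual setting the workers and managers are separate MPI processes communicating over a network; here the random subsampling of workers per round merely imitates which messages the manager has received when it triggers the M step.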

MSC:

62-XX Statistics

Software:

OpenMPI; Rmpi; lme4

References:

[1] Altinigneli, M. C.; Plant, C.; Böhm, C., Massively Parallel EM Using Graphics Processing Units, Proceedings of the 19th ACM SIGKDD Conference, 838-846 (2013)
[2] Bates, D.; Maechler, M.; Bolker, B.; Walker, S., lme4: Linear Mixed-Effects Models Using Eigen and S4, R Package Version 1.1-9 (2013)
[3] Cappé, O., Online EM Algorithm for Hidden Markov Models, Journal of Computational and Graphical Statistics, 20, 728-749 (2011)
[4] Cappé, O.; Moulines, E., Online EM Algorithm for Latent Data Models, Journal of the Royal Statistical Society, 71, 593-613 (2009) · Zbl 1250.62015
[5] Chen, W.-C.; Ostrouchov, G.; Pugmire, D.; Wehner, M., A Parallel EM Algorithm for Model-Based Clustering Applied to the Exploration of Large Spatio-Temporal Data, Technometrics, 55, 513-523 (2013)
[6] Dempster, A. P.; Laird, N. M.; Rubin, D. B., Maximum Likelihood From Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society, 39, 1-38 (1977) · Zbl 0364.62022
[7] Fajardo, V. A.; Liang, J., On the EM-Tau Algorithm: A New EM-Style Algorithm With Partial E-Steps, arXiv preprint arXiv:1711.07814 (2017)
[8] Gabriel, E.; Fagg, G. E.; Bosilca, G.; Angskun, T.; Dongarra, J. J.; Squyres, J. M.; Sahay, V.; Kambadur, P.; Barrett, B.; Lumsdaine, A.; Castain, R. H., Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, Proceedings, 11th European PVM/MPI Users’ Group Meeting, 97-104 (2004)
[9] Gu, D., Distributed EM Algorithm for Gaussian Mixtures in Sensor Networks, IEEE Transactions on Neural Networks, 19, 1154-1166 (2008)
[10] He, Y.; Liu, C., The Dynamic ECME Algorithm, Journal of the Royal Statistical Society, 74, 313-336 (2012) · Zbl 1411.62059
[11] Jamshidian, M.; Jennrich, R. I., Acceleration of the EM Algorithm by Using Quasi-Newton Methods, Journal of the Royal Statistical Society, 59, 569-587 (1997) · Zbl 0889.62042
[12] Kim, Y.; Choi, Y.-K.; Emery, S., Logistic Regression With Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages, The American Statistician, 67, 171-182 (2013) · Zbl 07649202
[13] Lange, K., A Gradient Algorithm Locally Equivalent to the EM Algorithm, Journal of the Royal Statistical Society, 57, 425-437 (1995) · Zbl 0813.62021
[14] Le Corff, S.; Fort, G.; Moulines, E., Online EM Algorithm to Solve the SLAM Problem, Statistical Signal Processing Workshop, 225-228 (2011)
[15] Lee, S. X.; Leemaqz, K. L.; McLachlan, G. J., A Simple Parallel EM Algorithm for Statistical Learning via Mixture Models, International Conference on Digital Image Computing: Techniques and Applications (DICTA), 1-8 (2016)
[16] Liu, C.; Rubin, D. B., The ECME Algorithm: A Simple Extension of EM and ECM With Faster Monotone Convergence, Biometrika, 81, 633-648 (1994) · Zbl 0812.62028
[17] Liu, C.; Rubin, D. B.; Wu, Y. N., Parameter Expansion to Accelerate EM: The PX-EM Algorithm, Biometrika, 85, 755-770 (1998) · Zbl 0921.62071
[18] Liu, D.; Liu, R. Y.; Xie, M., Multivariate Meta-Analysis of Heterogeneous Studies Using Only Summary Statistics: Efficiency and Robustness, Journal of the American Statistical Association, 110, 326-340 (2015) · Zbl 1373.62135
[19] Meng, X.-L., On the Rate of Convergence of the ECM Algorithm, The Annals of Statistics, 22, 326-339 (1994) · Zbl 0803.65146
[20] Meng, X.-L.; Rubin, D. B., Maximum Likelihood Estimation via the ECM Algorithm: A General Framework, Biometrika, 80, 267-278 (1993) · Zbl 0778.62022
[21] Meng, X.-L.; van Dyk, D., The EM Algorithm: An Old Folk-Song Sung to a Fast New Tune, Journal of the Royal Statistical Society, 59, 511-567 (1997) · Zbl 1090.62518
[22] Neal, R. M.; Hinton, G. E., A View of the EM Algorithm That Justifies Incremental, Sparse, and Other Variants, Learning in Graphical Models, 355-368 (1998), New York: Springer, New York · Zbl 0916.62019
[23] Neykov, N.; Filzmoser, P.; Dimova, R.; Neytchev, P., Robust Fitting of Mixtures Using the Trimmed Likelihood Estimator, Computational Statistics & Data Analysis, 52, 299-308 (2007) · Zbl 1328.62033
[24] Nowak, R., Distributed EM Algorithms for Density Estimation and Clustering in Sensor Networks, IEEE Transactions on Signal Processing, 51, 2245-2253 (2003)
[25] Perry, P. O., Fast Moment-Based Estimation for Hierarchical Models, Journal of the Royal Statistical Society, 79, 267-291 (2017) · Zbl 1414.62186
[26] R Core Team, R: A Language and Environment for Statistical Computing (2016), Vienna, Austria: R Foundation for Statistical Computing, Vienna, Austria
[27] Salakhutdinov, R.; Roweis, S., Adaptive Overrelaxed Bound Optimization Methods, ICML, 664-671 (2003)
[28] Suchard, M. A.; Wang, Q.; Chan, C.; Frelinger, J.; Cron, A. J.; West, M., Understanding GPU Programming for Statistical Computation: Studies in Massively Parallel Massive Mixtures, Journal of Computational and Graphical Statistics, 19, 419-438 (2010)
[29] Titterington, D. M., Recursive Parameter Estimation Using Incomplete Data, Journal of the Royal Statistical Society, 46, 257-267 (1984) · Zbl 0556.62061
[30] van Dyk, D. A., Fitting Mixed-Effects Models Using Efficient EM-Type Algorithms, Journal of Computational and Graphical Statistics, 9, 78-98 (2000)
[31] Varadhan, R.; Roland, C., Simple and Globally Convergent Methods for Accelerating the Convergence of any EM Algorithm, Scandinavian Journal of Statistics, 35, 335-353 (2008) · Zbl 1164.65006
[32] Weng, Y.; Xiao, W.; Xie, L., Diffusion-Based EM Algorithm for Distributed Estimation of Gaussian Mixtures in Wireless Sensor Networks, Sensors, 11, 6297-6316 (2011)
[33] Wu, C., On the Convergence Properties of the EM Algorithm, The Annals of Statistics, 11, 95-103 (1983) · Zbl 0517.62035
[34] Yu, H., Rmpi: Parallel Statistical Computing in R, R News, 2, 10-14 (2002)
[35] Yu, Y., Monotonically Overrelaxed EM Algorithms, Journal of Computational and Graphical Statistics, 21, 518-537 (2012)
[36] Zhou, H.; Lange, K.; Suchard, M., Graphics Processing Units and High-Dimensional Optimization, Statistical Science, 25, 311-324 (2010) · Zbl 1329.62028