Infinite mixtures of infinite factor analysers. (English) Zbl 1459.62118

Summary: Factor-analytic Gaussian mixtures are often employed as a model-based approach to clustering high-dimensional data. Typically, the numbers of clusters and latent factors must be fixed in advance of model fitting, and the pair that optimises some model selection criterion is then chosen. For computational reasons, allowing the number of factors to differ across clusters is rarely considered.
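The factor-analytic Gaussian mixture density underlying this model family can be sketched in standard notation (the symbols below are conventional, not taken from the paper itself): each of the \(G\) clusters has its own mean, loadings matrix, and diagonal uniquenesses, with the loadings matrix \(\boldsymbol{\Lambda}_g\) of dimension \(p \times q_g\) so that the number of factors \(q_g\) may differ across clusters.

```latex
% Mixture of factor analysers (standard notation, assumed rather than quoted):
% cluster g has weight pi_g, mean mu_g, p x q_g loadings Lambda_g,
% and diagonal uniquenesses Psi_g.
\[
  f(\mathbf{x}) \;=\; \sum_{g=1}^{G} \pi_g \,
  \mathcal{N}\!\left(\mathbf{x};\; \boldsymbol{\mu}_g,\;
  \boldsymbol{\Lambda}_g \boldsymbol{\Lambda}_g^{\top} + \boldsymbol{\Psi}_g\right)
\]
```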
Here the infinite mixture of infinite factor analysers (IMIFA) model is introduced. IMIFA places a Pitman-Yor process prior on the mixture weights, facilitating automatic inference of the number of clusters via the stick-breaking construction and a slice sampler. Automatic inference of the cluster-specific numbers of factors is achieved using multiplicative gamma process shrinkage priors and an adaptive Gibbs sampler. IMIFA is presented as the flagship of a family of factor-analytic mixtures.
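The two priors that drive IMIFA's automatic inference can be sketched as follows. This is a minimal illustration under standard parameterisations, not the authors' implementation (the IMIFA R package); function names and hyperparameter values here are assumptions for the example.

```python
import numpy as np

def pitman_yor_stick_breaking(alpha, d, n_sticks, rng):
    """Truncated stick-breaking weights for a Pitman-Yor(d, alpha) process:
    V_k ~ Beta(1 - d, alpha + k*d),  pi_k = V_k * prod_{j<k} (1 - V_j)."""
    k = np.arange(1, n_sticks + 1)
    v = rng.beta(1.0 - d, alpha + k * d)
    # Mass remaining before breaking the k-th stick.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

def mgp_loadings(p, n_factors, a1, a2, rng):
    """Loadings under a multiplicative gamma process shrinkage prior:
    delta_1 ~ Ga(a1, 1), delta_h ~ Ga(a2, 1) for h >= 2, tau_h = prod delta,
    lambda_{jh} ~ N(0, 1/(phi_{jh} * tau_h)), phi_{jh} ~ Ga(nu/2, nu/2)."""
    nu = 3.0  # local-shrinkage hyperparameter (illustrative choice)
    delta = np.concatenate((rng.gamma(a1, 1.0, 1),
                            rng.gamma(a2, 1.0, n_factors - 1)))
    tau = np.cumprod(delta)  # column precisions increase stochastically with h
    phi = rng.gamma(nu / 2, 2 / nu, (p, n_factors))  # rate nu/2 -> scale 2/nu
    return rng.normal(0.0, 1.0 / np.sqrt(phi * tau), (p, n_factors))

rng = np.random.default_rng(1)
w = pitman_yor_stick_breaking(alpha=0.5, d=0.25, n_sticks=50, rng=rng)
L = mgp_loadings(p=10, n_factors=8, a1=2.0, a2=3.0, rng=rng)
```

With `a2 > 1` the cumulative products `tau_h` grow stochastically, shrinking later loading columns toward zero; the adaptive Gibbs sampler exploits this to truncate the effectively infinite factorisation, while the slice sampler does the analogous truncation for the infinitely many mixture components.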
Applications to benchmark data, metabolomic spectral data, and a handwritten digit example illustrate the IMIFA model’s advantageous features. These include obviating the need for model selection criteria, reducing the computational burden associated with the search over the model space, improving clustering performance by allowing cluster-specific numbers of factors, and quantifying uncertainty.


62H30 Classification and discrimination; cluster analysis (statistical aspects)
60G20 Generalized stochastic processes
Full Text: DOI arXiv Euclid

