From here to infinity: sparse finite versus Dirichlet process mixtures in model-based clustering. (English) Zbl 1474.62225

Summary: In model-based clustering, mixture models are used to group data points into clusters. A useful concept, introduced for Gaussian mixtures by G. Malsiner-Walli et al. [Stat. Comput. 26, No. 1–2, 303–324 (2016; Zbl 1342.62109)], is that of sparse finite mixtures, where the prior on the weight distribution of a mixture with \(K\) components is chosen such that, a priori, the number of clusters in the data is random and is allowed to be smaller than \(K\) with high probability. The number of clusters is then inferred a posteriori from the data. The present paper makes the following contributions in the context of sparse finite mixture modelling. First, it illustrates that the concept of a sparse finite mixture is very generic and easily extends to clustering various types of non-Gaussian data, in particular discrete data and continuous multivariate data arising from non-Gaussian clusters. Second, sparse finite mixtures are compared to Dirichlet process mixtures with respect to their ability to identify the number of clusters. For both model classes, a random hyperprior is considered for the parameters determining the weight distribution. By suitably matching these priors, it is shown that the choice of this hyperprior influences the cluster solution far more than the choice between a sparse finite mixture and a Dirichlet process mixture.
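The key idea can be illustrated with a minimal Monte Carlo sketch (not the authors' code; component count, concentration values, and sample size are hypothetical choices): under a symmetric Dirichlet prior on the weights of a \(K\)-component mixture, a small concentration parameter makes the number of components actually occupied by the data, i.e. the number of clusters, much smaller than \(K\) a priori.

```python
import numpy as np

rng = np.random.default_rng(0)

def prior_num_clusters(K, e0, n, draws=2000):
    """Monte Carlo estimate of the prior distribution of the number of
    clusters: draw weights ~ Dirichlet(e0, ..., e0), allocate n
    observations to components, and count the distinct components used."""
    counts = np.zeros(draws, dtype=int)
    for d in range(draws):
        w = rng.dirichlet(np.full(K, e0))   # mixture weights
        z = rng.choice(K, size=n, p=w)      # component allocations
        counts[d] = np.unique(z).size       # occupied components
    return counts

# Sparse prior (e0 = 0.01) versus a non-sparse prior (e0 = 4) for K = 10:
sparse = prior_num_clusters(K=10, e0=0.01, n=100)
dense = prior_num_clusters(K=10, e0=4.0, n=100)
print(sparse.mean(), dense.mean())  # sparse mean is far below K = 10
```

With a small \(e_0\) the prior mass concentrates on weight vectors in which only a few components carry appreciable weight, so most of the \(K\) components remain empty, which is exactly the mechanism that lets the number of clusters be inferred a posteriori.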


62H30 Classification and discrimination; cluster analysis (statistical aspects)
60G12 General second-order stochastic processes


Zbl 1342.62109


poLCA; BayesLogit
Full Text: DOI arXiv


[1] Aitkin, M., A general maximum likelihood analysis of overdispersion in generalized linear models, Stat Comput, 6, 251-262, (1996)
[2] Azzalini, A., A class of distributions which includes the normal ones, Scand J Stat, 12, 171-178, (1985) · Zbl 0581.62014
[3] Azzalini, A., Further results on a class of distributions which includes the normal ones, Statistica, 46, 199-208, (1986) · Zbl 0606.62013
[4] Azzalini, A.; Capitanio, A., Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution, J R Stat Soc Ser B, 65, 367-389, (2003) · Zbl 1065.62094
[5] Azzalini, A.; Dalla Valle, A., The multivariate skew normal distribution, Biometrika, 83, 715-726, (1996) · Zbl 0885.62062
[6] Banfield, JD; Raftery, AE, Model-based Gaussian and non-Gaussian clustering, Biometrics, 49, 803-821, (1993) · Zbl 0794.62034
[7] Bennett, DA; Schneider, JA; Buchman, AS; Leon, CM; Bienias, JL; Wilson, RS, The rush memory and aging project: study design and baseline characteristics of the study cohort, Neuroepidemiology, 25, 163-175, (2005)
[8] Bensmail, H.; Celeux, G.; Raftery, AE; Robert, CP, Inference in model-based cluster analysis, Stat Comput, 7, 1-10, (1997)
[9] Biernacki, C.; Celeux, G.; Govaert, G., Assessing a mixture model for clustering with the integrated completed likelihood, IEEE Trans Pattern Anal Mach Intell, 22, 719-725, (2000)
[10] Celeux, G.; Forbes, F.; Robert, CP; Titterington, DM, Deviance information criteria for missing data models, Bayesian Anal, 1, 651-674, (2006) · Zbl 1331.62329
[11] Celeux, G.; Frühwirth-Schnatter, S.; Robert, CP; Frühwirth-Schnatter, S. (ed.); Celeux, G. (ed.); Robert, CP (ed.), Model selection for mixture models—perspectives and strategies, 121-160, (2018), Boca Raton
[12] Clogg, CC; Goodman, LA, Latent structure analysis of a set of multidimensional contingency tables, J Am Stat Assoc, 79, 762-771, (1984) · Zbl 0547.62037
[13] Dellaportas, P.; Papageorgiou, I., Multivariate mixtures of normals with unknown number of components, Stat Comput, 16, 57-68, (2006)
[14] Escobar, MD; West, M., Bayesian density estimation and inference using mixtures, J Am Stat Assoc, 90, 577-588, (1995) · Zbl 0826.62021
[15] Escobar, MD; West, M.; Dey, D. (ed.); Müller, P. (ed.); Sinha, D. (ed.), Computing nonparametric hierarchical models, 1-22, (1998), Berlin · Zbl 0918.62028
[16] Fall MD, Barat É (2014) Gibbs sampling methods for Pitman-Yor mixture models. Working paper https://hal.archives-ouvertes.fr/hal-00740770/file/Fall-Barat.pdf
[17] Ferguson, TS, A Bayesian analysis of some nonparametric problems, Ann Stat, 1, 209-230, (1973) · Zbl 0255.62037
[18] Ferguson, TS, Prior distributions on spaces of probability measures, Ann Stat, 2, 615-629, (1974) · Zbl 0286.62008
[19] Ferguson, TS; Rizvi, MH (ed.); Rustagi, JS (ed.), Bayesian density estimation by mixtures of normal distributions, 287-302, (1983), New York
[20] Frühwirth-Schnatter, S., Estimating marginal likelihoods for mixture and Markov switching models using bridge sampling techniques, Econom J, 7, 143-167, (2004) · Zbl 1053.62087
[21] Frühwirth-Schnatter S (2006) Finite mixture and Markov switching models. Springer, New York · Zbl 1108.62002
[22] Frühwirth-Schnatter, S.; Mengersen, K. (ed.); Robert, CP (ed.); Titterington, D. (ed.), Dealing with label switching under model uncertainty, 213-239, (2011), Chichester
[24] Frühwirth-Schnatter, S.; Pyne, S., Bayesian inference for finite mixtures of univariate and multivariate skew normal and skew-\(t\) distributions, Biostatistics, 11, 317-336, (2010)
[25] Frühwirth-Schnatter, S.; Wagner, H., Marginal likelihoods for non-Gaussian models using auxiliary mixture sampling, Comput Stat Data Anal, 52, 4608-4624, (2008) · Zbl 1452.62060
[26] Frühwirth-Schnatter, S.; Frühwirth, R.; Held, L.; Rue, H., Improved auxiliary mixture sampling for hierarchical models of non-Gaussian data, Stat Comput, 19, 479-492, (2009)
[27] Frühwirth-Schnatter S, Celeux G, Robert CP (eds) (2018) Handbook of mixture analysis. CRC Press, Boca Raton
[28] Goodman, LA, Exploratory latent structure analysis using both identifiable and unidentifiable models, Biometrika, 61, 215-231, (1974) · Zbl 0281.62057
[29] Green, PJ; Richardson, S., Modelling heterogeneity with and without the Dirichlet process, Scand J Stat, 28, 355-375, (2001) · Zbl 0973.62031
[30] Grün, B.; Frühwirth-Schnatter, S. (ed.); Celeux, G. (ed.); Robert, CP (ed.), Model-based clustering, 163-198, (2018), Boca Raton
[31] Hubert, L.; Arabie, P., Comparing partitions, J Classif, 2, 193-218, (1985) · Zbl 0587.62128
[32] Ishwaran, H.; James, LF, Gibbs sampling methods for stick-breaking priors, J Am Stat Assoc, 96, 161-173, (2001) · Zbl 1014.62006
[33] Kalli, M.; Griffin, JE; Walker, SG, Slice sampling mixture models, Stat Comput, 21, 93-105, (2011) · Zbl 1256.65006
[34] Keribin, C., Consistent estimation of the order of mixture models, Sankhyā A, 62, 49-66, (2000) · Zbl 1081.62516
[35] Lau, JW; Green, P., Bayesian model-based clustering procedures, J Comput Graph Stat, 16, 526-558, (2007)
[36] Lazarsfeld PF, Henry NW (1968) Latent structure analysis. Houghton Mifflin, New York · Zbl 0182.52201
[37] Lee, S.; McLachlan, GJ, Model-based clustering and classification with non-normal mixture distributions, Stat Methods Appl, 22, 427-454, (2013) · Zbl 1332.62209
[38] Linzer, DA; Lewis, JB, poLCA: an R package for polytomous variable latent class analysis, J Stat Softw, 42, 1-29, (2011)
[39] Malsiner Walli, G.; Frühwirth-Schnatter, S.; Grün, B., Model-based clustering based on sparse finite Gaussian mixtures, Stat Comput, 26, 303-324, (2016) · Zbl 1342.62109
[40] Malsiner Walli, G.; Frühwirth-Schnatter, S.; Grün, B., Identifying mixtures of mixtures using Bayesian estimation, J Comput Graph Stat, 26, 285-295, (2017) · Zbl 1342.62109
[41] Malsiner-Walli, G.; Pauger, D.; Wagner, H., Effect fusion using model-based clustering, Stat Model, 18, 175-196, (2018)
[42] McLachlan GJ, Peel D (2000) Finite mixture models. Wiley series in probability and statistics. Wiley, New York · Zbl 0963.62061
[43] Medvedovic, M.; Yeung, KY; Bumgarner, RE, Bayesian mixture model based clustering of replicated microarray data, Bioinformatics, 20, 1222-1232, (2004)
[44] Miller JW, Harrison MT (2013) A simple example of Dirichlet process mixture inconsistency for the number of components. In: Advances in neural information processing systems, pp 199-206
[45] Miller, JW; Harrison, MT, Mixture models with a prior on the number of components, J Am Stat Assoc, 113, 340-356, (2018) · Zbl 1398.62066
[46] Müller, P.; Mitra, R., Bayesian nonparametric inference—why and how, Bayesian Anal, 8, 269-360, (2013) · Zbl 1329.62171
[47] Nobile, A., On the posterior distribution of the number of components in a finite mixture, Ann Stat, 32, 2044-2073, (2004) · Zbl 1056.62037
[48] Papaspiliopoulos, O.; Roberts, G., Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models, Biometrika, 95, 169-186, (2008) · Zbl 1437.62576
[49] Polson, NG; Scott, JG; Windle, J., Bayesian inference for logistic models using Pólya-Gamma latent variables, J Am Stat Assoc, 108, 1339-1349, (2013) · Zbl 1283.62055
[50] Quintana, FA; Iglesias, PL, Bayesian clustering and product partition models, J R Stat Soc Ser B, 65, 557-574, (2003) · Zbl 1065.62115
[51] Richardson, S.; Green, PJ, On Bayesian analysis of mixtures with an unknown number of components, J R Stat Soc Ser B, 59, 731-792, (1997) · Zbl 0891.62020
[52] Rousseau, J.; Mengersen, K., Asymptotic behaviour of the posterior distribution in overfitted mixture models, J R Stat Soc Ser B, 73, 689-710, (2011) · Zbl 1228.62034
[53] Sethuraman, J., A constructive definition of Dirichlet priors, Stat Sin, 4, 639-650, (1994) · Zbl 0823.62007
[54] Stern, H.; Arcus, D.; Kagan, J.; Rubin, DB; Snidman, N., Statistical choices in infant temperament research, Behaviormetrika, 21, 1-17, (1994)
[55] van Havre Z, White N, Rousseau J, Mengersen K (2015) Overfitting Bayesian mixture models with an unknown number of components. PLoS ONE 10(7):e0131739, 1-27
[56] Viallefont, V.; Richardson, S.; Green, PJ, Bayesian analysis of Poisson mixtures, J Nonparametr Stat, 14, 181-202, (2002) · Zbl 1014.62035
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.