Mixture model averaging for clustering. (English) Zbl 1414.62283

Summary: In mixture model-based clustering applications, it is common to fit several models from a family and report clustering results from only the 'best' one. In such circumstances, selection of this best model is achieved using a model selection criterion, most often the Bayesian information criterion. Rather than throw away all but the best model, we average multiple models that are in some sense close to the best one, thereby producing a weighted average of clustering results. Two (weighted) averaging approaches are considered: averaging component membership probabilities and averaging models. In both cases, Occam's window is used to determine closeness to the best model and weights are computed within a Bayesian model averaging paradigm. In some cases, we need to merge components before averaging; we introduce a method for merging mixture components based on the adjusted Rand index. The effectiveness of our model-based clustering averaging approaches is illustrated using a family of Gaussian mixture models on real and simulated data.
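
The weighting step mentioned in the summary can be illustrated with a short Python sketch. It is not the authors' implementation, only one plausible reading under stated assumptions: BIC is on the "larger is better" scale (2 log-likelihood minus the penalty), prior model probabilities are equal so that exp(BIC/2) stands in for the marginal likelihood, the window ratio c = 20 is an illustrative choice, and the membership matrices already share the same number of components with aligned labels (the merging step is not shown). The function names and the toy numbers are hypothetical.

```python
import numpy as np


def occam_window_weights(bic, c=20.0):
    """Select models inside Occam's window and return their normalized weights.

    Assumption: larger BIC is better, and exp(BIC / 2) approximates each
    model's marginal likelihood up to a common constant.
    """
    bic = np.asarray(bic, dtype=float)
    # Evidence of each model relative to the best one; subtracting the
    # maximum before exponentiating avoids numerical overflow.
    rel = np.exp((bic - bic.max()) / 2.0)
    # Keep a model if the best model is at most c times more probable.
    keep = np.flatnonzero(rel >= 1.0 / c)
    weights = rel[keep] / rel[keep].sum()
    return keep, weights


def average_memberships(z_list, weights):
    """Weighted average of membership matrices, each of shape (n, G).

    Assumption: all matrices have the same G and consistent component labels.
    """
    z = np.stack(z_list)                      # shape (models, n, G)
    return np.tensordot(weights, z, axes=1)   # shape (n, G)


# Toy example: three fitted models, two observations, two components.
bic_values = [-1000.0, -1002.0, -1020.0]
z_all = [
    np.array([[0.9, 0.1], [0.2, 0.8]]),
    np.array([[0.7, 0.3], [0.4, 0.6]]),
    np.array([[0.5, 0.5], [0.5, 0.5]]),
]

keep, weights = occam_window_weights(bic_values)
avg = average_memberships([z_all[i] for i in keep], weights)
labels = avg.argmax(axis=1)  # hard assignment from the averaged probabilities
```

In this toy example the third model falls outside the window and is dropped, and the remaining two models are weighted by their relative evidence before the membership probabilities are averaged.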


62H30 Classification and discrimination; cluster analysis (statistical aspects)

