Mixture model averaging for clustering. (English) Zbl 1414.62283

Summary: In mixture model-based clustering applications, it is common to fit several models from a family and report clustering results from only the ‘best’ one. In such circumstances, selection of this best model is achieved using a model selection criterion, most often the Bayesian information criterion. Rather than throw away all but the best model, we average multiple models that are in some sense close to the best one, thereby producing a weighted average of clustering results. Two (weighted) averaging approaches are considered: averaging component membership probabilities and averaging models. In both cases, Occam’s window is used to determine closeness to the best model and weights are computed within a Bayesian model averaging paradigm. In some cases, we need to merge components before averaging; we introduce a method for merging mixture components based on the adjusted Rand index. The effectiveness of our model-based clustering averaging approaches is illustrated using a family of Gaussian mixture models on real and simulated data.
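
The first averaging approach in the summary can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the summary does not state: the BIC follows the larger-is-better convention (2 log-likelihood minus the penalty), so that exp(BIC/2) approximates the marginal likelihood; model priors are equal; the Occam's window ratio `c = 20` is an illustrative choice; and every retained model has the same number of components with labels already aligned. The function names `occams_window_weights` and `average_memberships` are hypothetical.

```python
import numpy as np

def occams_window_weights(bic, c=20.0):
    """Approximate BMA weights, zeroed outside Occam's window.

    Assumes larger-is-better BIC and equal model priors (assumptions,
    not taken from the summary).
    """
    bic = np.asarray(bic, dtype=float)
    # Log posterior model probability relative to the best model.
    log_post = 0.5 * (bic - bic.max())
    # Keep models whose posterior odds against the best model are at most c.
    keep = log_post >= -np.log(c)
    w = np.where(keep, np.exp(log_post), 0.0)
    return w / w.sum()

def average_memberships(z_list, weights):
    """Weighted average of (n x G) membership probability matrices.

    Assumes all matrices share the same shape and label ordering.
    """
    z = np.stack([np.asarray(zm, dtype=float) for zm in z_list])  # (M, n, G)
    z_avg = np.tensordot(weights, z, axes=1)                      # (n, G)
    return z_avg, z_avg.argmax(axis=1)

# Toy input: three fitted models, two observations, two components.
bic = [-1000.0, -1002.0, -1020.0]
z_list = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.8, 0.2], [0.7, 0.3]],
    [[0.5, 0.5], [0.5, 0.5]],
]
weights = occams_window_weights(bic)
z_avg, labels = average_memberships(z_list, weights)
print(weights, z_avg, labels)
```

In this toy input the third model falls outside the window and receives zero weight, so only the first two models contribute to the averaged memberships. When the retained models have different numbers of components, the summary indicates that components are merged first; that step is not sketched here.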

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
