Clustering and classification via cluster-weighted factor analyzers. (English) Zbl 1271.62137

Summary: In model-based clustering and classification, the cluster-weighted model is a convenient approach when the random vector of interest consists of a response variable \(Y\) and a vector \(X\) of \(p\) covariates. However, its applicability may be limited when \(p\) is large. To overcome this problem, this paper assumes a latent factor structure for \(X\) in each mixture component, under Gaussian assumptions. This leads to the cluster-weighted factor analyzers (CWFA) model. By imposing constraints on the variance of \(Y\) and the covariance matrix of \(X\), a novel family of sixteen CWFA models is introduced for model-based clustering and classification. The alternating expectation-conditional maximization algorithm for maximum likelihood estimation of the parameters of all models in the family is described. To initialize the algorithm, a 5-step hierarchical procedure is proposed; it exploits the nested structure of the models within the family and thus guarantees the natural ranking among the sixteen likelihoods. Analyses of artificial and real data show that these models have very good clustering and classification performance and that the algorithm is able to recover the parameters very well.
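To make the model structure concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the usual Gaussian form of a cluster-weighted model with a factor-analytic covariance for \(X\): in component \(g\), \(Y \mid X = x\) is normal with mean \(\beta_{0g} + \beta_{1g}'x\) and variance \(\sigma_g^2\), and \(X\) is normal with mean \(\mu_g\) and covariance \(\Lambda_g\Lambda_g' + \Psi_g\), where \(\Lambda_g\) is a \(p \times q\) loading matrix and \(\Psi_g\) is diagonal. These symbols, all function names, and the toy parameter values are assumptions introduced for illustration; none of them appears in the summary. The sketch evaluates the component-wise joint log-densities and turns them into posterior membership probabilities, which is the quantity used to assign observations to clusters.

```python
import numpy as np


def log_mvn(x, mean, cov):
    """Log-density of N(mean, cov) evaluated at each row of x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", d @ np.linalg.inv(cov), d)
    return -0.5 * (x.shape[1] * np.log(2 * np.pi) + logdet + maha)


def cwfa_log_joint(x, y, pi, beta0, beta1, sigma2, mu, Lambda, psi):
    """Return log(pi_g * p(y | x, g) * p(x | g)) as an (n, G) array."""
    n, G = x.shape[0], len(pi)
    out = np.empty((n, G))
    for g in range(G):
        # Factor-analytic covariance of X in component g (assumed form).
        cov_x = Lambda[g] @ Lambda[g].T + np.diag(psi[g])
        # Linear Gaussian regression of Y on X in component g.
        resid = y - (beta0[g] + x @ beta1[g])
        log_y = -0.5 * (np.log(2 * np.pi * sigma2[g]) + resid**2 / sigma2[g])
        out[:, g] = np.log(pi[g]) + log_y + log_mvn(x, mu[g], cov_x)
    return out


def posterior(log_joint):
    """Normalize component log-densities into membership probabilities."""
    m = log_joint.max(axis=1, keepdims=True)  # stabilizes the exponentials
    w = np.exp(log_joint - m)
    return w / w.sum(axis=1, keepdims=True)


# Toy setup: G = 2 components, p = 5 covariates, q = 2 latent factors.
# All values below are hypothetical and chosen only to exercise the code.
rng = np.random.default_rng(0)
G, p, q, n = 2, 5, 2, 6
pi = np.array([0.4, 0.6])
beta0 = np.array([0.0, 3.0])
beta1 = rng.normal(size=(G, p))
sigma2 = np.array([1.0, 0.5])
mu = np.stack([np.zeros(p), np.full(p, 4.0)])
Lambda = rng.normal(size=(G, p, q))
psi = np.full((G, p), 0.5)

x = rng.normal(size=(n, p))
y = rng.normal(size=n)

z = posterior(cwfa_log_joint(x, y, pi, beta0, beta1, sigma2, mu, Lambda, psi))
labels = z.argmax(axis=1)  # maximum a posteriori cluster assignment
```

Under this parameterization each component covariance needs only \(pq + p\) values instead of \(p(p+1)/2\), which illustrates why a latent factor structure helps when \(p\) is large. The sixteen-model family described in the summary would then correspond to constraining \(\sigma_g^2\), \(\Lambda_g\), and \(\Psi_g\) across components; the exact constraints are not stated in the summary, so they are not encoded here.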


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H25 Factor analysis and principal components; correspondence analysis


Flury; flexmix; PGMM; mclust; R

