A finite mixture approach to joint clustering of individuals and multivariate discrete outcomes. (English) Zbl 1465.62121

Summary: In this work, we modify finite mixtures of factor analysers to provide a method for simultaneous clustering of subjects and multivariate discrete outcomes. The joint clustering is performed through a suitable reparameterization of the outcome (column)-specific parameters. We develop an expectation-maximization-type algorithm for maximum likelihood parameter estimation where the maximization step is divided into orthogonal sub-blocks that refer to row and column-specific parameters, respectively. Model performance is evaluated via a simulation study with varying sample size, number of outcomes and row/column-specific clustering (partitions). We compare the performance of our model with the performance of standard model-based biclustering approaches. The proposed method is also demonstrated on a benchmark data set where a multivariate binary response is considered.
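
As a rough illustration of the kind of EM iteration the summary refers to, the sketch below fits a plain mixture of independent Bernoulli outcomes to the rows of a binary matrix and updates the parameters in two separate blocks within the M-step. It is not the authors' model: it clusters rows only, has no factor-analytic structure and no reparameterization of the column-specific parameters. The function name `em_bernoulli_mixture`, its arguments, and the toy data are assumptions made for this example.

```python
import math
import random


def em_bernoulli_mixture(X, n_clusters, n_iter=50, seed=0):
    """Illustrative EM for a mixture of independent Bernoullis (rows of X).

    X is a list of equal-length lists with entries 0 or 1.
    Returns (weights, probs, labels). Hypothetical helper, not from the paper.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    eps = 1e-6  # keeps probabilities strictly inside (0, 1)

    weights = [1.0 / n_clusters] * n_clusters
    # Random start for the success probabilities, away from 0 and 1.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(p)]
             for _ in range(n_clusters)]

    resp = []
    for _ in range(n_iter):
        # E-step: posterior membership probabilities, computed in log space.
        resp = []
        for row in X:
            logs = []
            for g in range(n_clusters):
                ll = math.log(weights[g])
                for j, x in enumerate(row):
                    q = probs[g][j]
                    ll += math.log(q) if x else math.log(1.0 - q)
                logs.append(ll)
            m = max(logs)
            unnorm = [math.exp(v - m) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])

        # M-step, block 1: mixing weights of the row clusters.
        sizes = [sum(r[g] for r in resp) for g in range(n_clusters)]
        clipped = [max(s / n, eps) for s in sizes]
        norm = sum(clipped)
        weights = [c / norm for c in clipped]

        # M-step, block 2: outcome-specific success probabilities.
        for g in range(n_clusters):
            for j in range(p):
                num = sum(resp[i][g] * X[i][j] for i in range(n))
                q = num / max(sizes[g], eps)
                probs[g][j] = min(max(q, eps), 1.0 - eps)

    # Hard assignment: most probable cluster for each row.
    labels = [max(range(n_clusters), key=lambda g: r[g]) for r in resp]
    return weights, probs, labels


if __name__ == "__main__":
    # Toy data: two groups of rows with opposite response patterns.
    X = [[1, 1, 1, 0, 0, 0]] * 10 + [[0, 0, 0, 1, 1, 1]] * 10
    weights, probs, labels = em_bernoulli_mixture(X, n_clusters=2)
    print(labels)
```

In this simplified setting the two M-step blocks have closed-form updates and do not interact, which is what makes it possible to update them separately. A model that also partitions the outcomes would need an additional column-side structure that is not shown here.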

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H12 Estimation in multivariate analysis
