
Model selection for Gaussian latent block clustering with the integrated classification likelihood. (English) Zbl 1416.62349

Summary: Block clustering aims to reveal homogeneous block structures in a data table. Among the different approaches to block clustering, we consider here a model-based method: the Gaussian latent block model for continuous data, which extends the Gaussian mixture model used for one-way clustering. For a given data table, several candidate models are usually examined, differing, for example, in the number of clusters, so model selection becomes a critical issue. To address it, we develop a criterion based on an approximation of the integrated classification likelihood for the Gaussian latent block model, and we propose a variant in the spirit of the Bayesian information criterion that follows the same pattern. We also propose a non-asymptotic exact criterion, thus circumventing the controversial definition of the asymptotic regime that arises from the dual nature of rows and columns in co-clustering. The experimental results show steady performance of these criteria for medium to large data tables.
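To illustrate the kind of asymptotic, BIC-style criterion summarized above, the following Python sketch scores a fixed row partition z and column partition w of an n x d table. It assumes the form log p(x, z, w; theta_hat) - (g-1)/2 log n - (m-1)/2 log d - (nu/2) log(nd), with nu = 2gm block parameters (one mean and one variance per block). Both this form and the function name icl_bic_gaussian_lbm are illustrative assumptions, not the paper's exact formulas. Parameters are set to their maximum likelihood estimates given the partitions, and no estimation algorithm for z and w is included.

```python
import math
from typing import List, Sequence


def icl_bic_gaussian_lbm(x: List[List[float]], z: Sequence[int],
                         w: Sequence[int], g: int, m: int) -> float:
    """Hypothetical ICL-BIC-style score for a Gaussian latent block model.

    x: n-by-d data table (list of rows); z: row labels in 0..g-1;
    w: column labels in 0..m-1. Assumes each block (k, l) has its own
    mean and variance, i.e. 2*g*m continuous block parameters.
    """
    n, d = len(x), len(x[0])
    row_sizes = [list(z).count(k) for k in range(g)]
    col_sizes = [list(w).count(l) for l in range(m)]
    if 0 in row_sizes or 0 in col_sizes:
        raise ValueError("every row and column cluster must be non-empty")

    # Mixing-proportion terms of the classification log-likelihood at the MLE.
    loglik = sum(nk * math.log(nk / n) for nk in row_sizes)
    loglik += sum(dl * math.log(dl / d) for dl in col_sizes)

    # Gather the cells belonging to each block (k, l).
    blocks = [[[] for _ in range(m)] for _ in range(g)]
    for i in range(n):
        for j in range(d):
            blocks[z[i]][w[j]].append(x[i][j])

    for k in range(g):
        for l in range(m):
            cells = blocks[k][l]
            size = len(cells)
            mean = sum(cells) / size
            var = sum((v - mean) ** 2 for v in cells) / size  # ML variance
            if var <= 0.0:
                raise ValueError(f"block ({k}, {l}) has zero variance")
            # Gaussian log-likelihood of the block evaluated at its MLE.
            loglik += -0.5 * size * (math.log(2 * math.pi * var) + 1.0)

    # Assumed BIC-style penalty: row and column proportions are penalized
    # with log n and log d, block parameters with log(n*d).
    penalty = 0.5 * (g - 1) * math.log(n) + 0.5 * (m - 1) * math.log(d)
    penalty += 0.5 * (2 * g * m) * math.log(n * d)
    return loglik - penalty
```

Higher scores favour a candidate (g, m, z, w). In practice, z and w would come from a model fitted separately for each candidate number of row and column clusters, and the scores would then be compared across candidates.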

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F15 Bayesian inference

Software:

BayesDA
