Mutual information, phi-squared and model-based co-clustering for contingency tables. (English) Zbl 1416.62309

Summary: Many of the datasets encountered in statistics are two-dimensional in nature and can be represented by a matrix. Classical clustering procedures seek to construct separately an optimal partition of rows or, sometimes, of columns. In contrast, co-clustering methods cluster the rows and the columns simultaneously and organize the data into homogeneous blocks (after suitable permutations). Methods of this kind have practical importance in a wide variety of applications such as document clustering, where data are typically organized in two-way contingency tables. Our goal is to offer coherent frameworks for understanding some existing criteria and algorithms for co-clustering contingency tables, and to propose new ones. We look at two different frameworks for the problem of co-clustering. The first involves minimizing an objective function based on measures of association and in particular on phi-squared and mutual information. The second uses a model-based co-clustering approach, and we consider two models: the block model and the latent block model. We establish connections between different approaches, criteria and algorithms, and we highlight a number of implicit assumptions in some commonly used algorithms. Our contribution is illustrated by numerical experiments on simulated and real-case datasets that show the relevance of the presented methods in the document clustering field.


62H17 Contingency tables
62H30 Classification and discrimination; cluster analysis (statistical aspects)
Full Text: DOI


[1] Ailem, M.; Role, F.; Nadif, M., Graph modularity maximization as an effective method for co-clustering text data, Knowl Based Syst, 109, 160-173, (2016)
[2] Arabie, P.; Hubert, LJ, The bond energy algorithm revisited, IEEE Trans Syst Man Cybern, 20, 268-274, (1990)
[3] Arabie P, Schleutermann S, Daws J, Hubert L (1988) Marketing applications of sequencing and partitioning of nonsymmetric and/or two-mode matrices. In: Data, expert knowledge and decisions. Springer, pp 215-224
[4] Baier D, Gaul W, Schader M (1997) Two-mode overlapping clustering with applications to simultaneous benefit segmentation and market structuring. In: Classification and knowledge organization. Springer, pp 557-566
[5] Benzecri JP (1973) L’analyse des données, tome 2: l’analyse des correspondances. Dunod, Paris · Zbl 0297.62039
[6] Bock, HH; Tomassone, R. (ed.), Simultaneous clustering of objects and variables, 187-203, (1979), Le Chesnay
[7] Bock HH (1992) A clustering technique for maximizing \(φ \)-divergence, noncentrality and discriminating power. In: Analyzing and modeling data and knowledge. Springer, pp 19-36
[8] Bock, H. H., Information and entropy in cluster analysis, 115-147, (1994), Dordrecht
[9] Bock, HH, Convexity-based clustering criteria: theory, algorithms, and applications in statistics, Stat Methods Appl, 12, 293-317, (2004) · Zbl 1058.62051
[10] Bryant, PG, On characterizing optimization-based clustering criteria, J Classif, 5, 81-84, (1988)
[11] Castillo, W.; Trejos, J.; Bock, HH (ed.), Two-mode partitioning: review of methods and application of tabu search, 43-51, (2002), Heidelberg · Zbl 1040.62053
[12] Celeux, G.; Govaert, G., A classification EM algorithm for clustering and two stochastic versions, Comput Stat Data Anal, 14, 315-332, (1992) · Zbl 0937.62605
[13] Cheng Y, Church GM (2000) Biclustering of expression data. In: ISMB2000, 8th international conference on intelligent systems for molecular biology, vol 8, pp 93-103
[14] Cho, H.; Dhillon, I., Coclustering of human cancer microarrays using minimum sum-squared residue coclustering, IEEE/ACM Trans Comput Biol Bioinform (TCBB), 5, 385-400, (2008)
[15] Cramer H (1946) Mathematical methods of statistics. Princeton University Press, Princeton · Zbl 0063.01014
[16] Deerwester, S.; Dumais, S.; Furnas, G.; Landauer, T.; Harshman, R., Indexing by latent semantic analysis, J Am Soc Inf Sci, 41, 391-407, (1990)
[17] Dempster, AP; Laird, NM; Rubin, DB, Maximum likelihood from incomplete data via the EM algorithm, J R Stat Soc Ser B, 39, 1-38, (1977) · Zbl 0364.62022
[18] Dhillon, IS, Co-clustering documents and words using bipartite spectral graph partitioning, 269-274, (2001), New York
[19] Dhillon, IS; Modha, DS, Concept decompositions for large sparse text data using clustering, Mach Learn, 42, 143-175, (2001) · Zbl 0970.68167
[20] Dhillon IS, Mallela S, Modha DS (2003) Information-theoretic co-clustering. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining (KDD-2003), pp 89-98
[21] Ding C, He X, Simon H (2005) On the equivalence of nonnegative matrix factorization and spectral clustering. In: SIAM data mining conference
[22] Ding C, Li T, Peng W, Park H (2006) Orthogonal nonnegative matrix tri-factorizations for clustering. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, p 135
[23] Duffy, DE; Quiroz, AJ, A permutation-based algorithm for block clustering, J Classif, 8, 65-91, (1991)
[24] Govaert, G., Algorithme de classification d’un tableau de contingence, 487-500, (1977), Versailles
[25] Govaert G (1983) Classification croisée. Thèse d’état, Université Paris 6, France
[26] Govaert, G., Simultaneous clustering of rows and columns, Control Cybern, 24, 437-458, (1995) · Zbl 0852.62055
[27] Govaert, G.; Nadif, M., Clustering with block mixture models, Pattern Recognit, 36, 463-473, (2003)
[28] Govaert, G.; Nadif, M., An EM algorithm for the block mixture model, IEEE Trans Pattern Anal Mach Intell, 27, 643-647, (2005)
[29] Govaert, G.; Nadif, M., Clustering of contingency table and mixture model, Eur J Oper Res, 183, 1055-1066, (2007) · Zbl 1138.62035
[30] Govaert, G.; Nadif, M., Block clustering with Bernoulli mixture models: comparison of different approaches, Comput Stat Data Anal, 52, 3233-3245, (2008) · Zbl 1452.62444
[31] Govaert, G.; Nadif, M., Latent block model for contingency table, Commun Stat Theory Methods, 39, 416-425, (2010) · Zbl 1187.62117
[32] Govaert G, Nadif M (2013) Co-clustering. Wiley, New York
[33] Greenacre, M., Clustering the rows and columns of a contingency table, J Classif, 5, 39-51, (1988) · Zbl 0652.62053
[34] Gupta, N.; Aggarwal, S., Mib: using mutual information for biclustering gene expression data, Pattern Recognit, 43, 2692-2697, (2010) · Zbl 1207.68281
[35] Hanczar, B.; Nadif, M., Using the bagging approach for biclustering of gene expression data, Neurocomputing, 74, 1595-1605, (2011)
[36] Hanczar, B.; Nadif, M., Ensemble methods for biclustering tasks, Pattern Recognit, 45, 3938-3949, (2012)
[37] Hanczar B, Nadif M (2013) Precision-recall space to correct external indices for biclustering. In: Proceedings of the 30th international conference on machine learning (ICML-13), pp 136-144
[38] Harris RR, Kanji GK (1983) On the use of minimum chi-square estimation. The Statistician, pp 379-394
[39] Hartigan, JA, Direct clustering of a data matrix, JASA, 67, 123-129, (1972)
[40] Hathaway, RJ, Another interpretation of the EM algorithm for mixture distributions, Stat Probab Lett, 4, 53-56, (1986) · Zbl 0585.62052
[41] Hofmann, T., Probabilistic latent semantic indexing, 50-57, (1999), New York
[42] Labiod L, Nadif M (2011a) Co-clustering for binary and categorical data with maximum modularity. In: 2011 IEEE 11th international conference on data mining, pp 1140-1145
[43] Labiod L, Nadif M (2011b) Co-clustering under nonnegative matrix tri-factorization. In: Neural information processing—18th international conference. ICONIP, pp 709-717
[44] Labiod, L.; Nadif, M., A unified framework for data visualization and coclustering, IEEE Trans Neural Netw Learn Syst, 26, 2194-2199, (2015)
[45] Li, L.; Guo, Y.; Wu, W.; Shi, Y.; Cheng, J.; Tao, S., A comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data, BioData Min, 5, 1, (2012)
[46] Long, B.; Zhang, Z.; Yu, P., Co-clustering by block value decomposition, 635-640, (2005), New York
[47] Madeira, SC; Oliveira, AL, Biclustering algorithms for biological data analysis: a survey, IEEE/ACM Trans Comput Biol Bioinform (TCBB), 1, 24-45, (2004)
[48] Marcotorchino, F., Block seriation problems: a unified approach, Appl Stoch Models Data Anal, 3, 73-91, (1987) · Zbl 0624.90048
[49] Neal RM, Hinton GE (1998) A view of the em algorithm that justifies incremental, sparse, and other variants. In: Learning in graphical models. Springer, pp 355-368 · Zbl 0916.62019
[50] Neyman, J., Contribution to the theory of chi-square test, 239-273, (1949), Berkeley
[51] Pearson, K., On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling, Lond Edinb Dublin Philos Mag J Sci, 50, 157-175, (1900) · JFM 31.0238.04
[52] Pötzelberger K, Strasser H (1997) Data compression by unsupervised classification
[53] Pötzelberger, K.; Strasser, H., Clustering and quantization by MSP-partitions, Stat Decis Int J Stoch Methods Models, 19, 331-372, (2001) · Zbl 1180.62093
[54] Rocci, R.; Vichi, M., Two-mode multi-partitioning, Comput Stat Data Anal, 52, 1984-2003, (2008) · Zbl 1452.62463
[55] Santamaría R, Quintales L, Therón R (2007) Methods to bicluster validation and comparison in microarray data. In: Intelligent data engineering and automated learning-IDEAL 2007. Springer, pp 780-789
[56] Strehl, A.; Ghosh, J., Cluster ensembles—a knowledge reuse framework for combining multiple partitions, J Mach Learn Res, 3, 583-617, (2003) · Zbl 1084.68759
[57] Tanay, A.; Sharan, R.; Shamir, R., Biclustering algorithms: a survey, Handb Comput Mol Biol, 9, 122-124, (2005)
[58] Trejos, J.; Castillo, W.; Decker, R. (ed.); Gaul, W. (ed.), Simulated annealing optimization for two-mode partitioning, 135-142, (2000), Heidelberg
[59] Van Mechelen I, Schepers J (2006) A unifying model for biclustering. In: Compstat 2006-proceedings in computational statistics. Springer, pp 81-88
[60] Mechelen, I.; Bock, HH; Boeck, P., Two-mode clustering methods: a structured overview, Stat Methods Med Res, 13, 363-394, (2004) · Zbl 1053.62078
[61] Vichi, M., Double k-means clustering for simultaneous classification of objects and variables, 43-52, (2001), Heidelberg
[62] Windham, MP, Parameter modification for clustering criteria, J Classif, 4, 191-214, (1987) · Zbl 0662.62066
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.