Maximal interaction two-mode clustering. (English) Zbl 1364.62159

Summary: Most classical approaches for two-mode clustering of a data matrix are designed to attain homogeneous row by column clusters (blocks, biclusters), that is, biclusters with a small variation of data values within the blocks. In contrast, this article deals with methods that look for a biclustering with a large interaction between row and column clusters. Thereby an aggregated, condensed representation of the existing interaction structure is obtained, together with corresponding row and column clusters, which both allow a parsimonious visualization and interpretation. In this paper we provide a statistical justification, in terms of a probabilistic model, for a two-mode interaction clustering criterion that has been proposed by H. H. Bock [in: Analyse de Données et Informatique. Cours de la Commission des Communautés Européennes à Fontainebleau, 19–30 Mars 1979. Le Chesnay, France: Institut National de Recherche en Informatique et en Automatique (INRIA). 187–203 (1980; Zbl 0454.62055)]. Furthermore, we show that maximization of this criterion is equivalent to minimizing the classical least-squares two-mode partitioning criterion for the double-centered version of the data matrix. The latter implies that the interaction clustering criterion can be optimized by applying classical two-mode partitioning algorithms. We illustrate the usefulness of our approach for the case of an empirical data set from personality psychology and we compare this method with other biclustering approaches where interactions play a role.
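
The equivalence stated in the summary can be checked numerically. The following is a minimal sketch and not the authors' implementation. It assumes that the interaction criterion has the form sum over blocks of (block size) × (mean of the double-centered data in the block)², which is not spelled out in the summary. The function names, the toy data, and the fixed partitions are hypothetical. Under that assumption, criterion plus least-squares loss equals the total sum of squares of the double-centered matrix for any partition, so maximizing one minimizes the other.

```python
import numpy as np

def double_center(X):
    """Subtract row means and column means, then add back the grand mean."""
    X = np.asarray(X, dtype=float)
    return X - X.mean(axis=1, keepdims=True) - X.mean(axis=0, keepdims=True) + X.mean()

def block_stats(D, row_labels, col_labels, P, Q):
    """Block sizes and block means of D for a given row/column partition."""
    R = np.eye(P)[row_labels]          # I x P row-cluster indicator matrix
    C = np.eye(Q)[col_labels]          # J x Q column-cluster indicator matrix
    sizes = np.outer(R.sum(axis=0), C.sum(axis=0))
    sums = R.T @ D @ C
    means = np.divide(sums, sizes, out=np.zeros_like(sums), where=sizes > 0)
    return sizes, means

def interaction_criterion(X, row_labels, col_labels, P, Q):
    """Assumed form of the criterion: size-weighted squared block interactions."""
    sizes, means = block_stats(double_center(X), row_labels, col_labels, P, Q)
    return float((sizes * means ** 2).sum())

def least_squares_loss(D, row_labels, col_labels, P, Q):
    """Classical two-mode partitioning loss: squared deviations from block means."""
    _, means = block_stats(D, row_labels, col_labels, P, Q)
    fitted = means[row_labels][:, col_labels]   # expand block means to an I x J matrix
    return float(((D - fitted) ** 2).sum())

# Hypothetical toy data and partitions, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))
row_labels = np.array([0, 0, 1, 1, 2, 2])
col_labels = np.array([0, 1, 0, 1, 1])

D = double_center(X)
criterion = interaction_criterion(X, row_labels, col_labels, 3, 2)
loss = least_squares_loss(D, row_labels, col_labels, 3, 2)

# The two quantities sum to a constant that does not depend on the partition.
print(np.isclose(criterion + loss, (D ** 2).sum()))
```

Because the sum is constant across partitions, any algorithm that minimizes the least-squares loss on the double-centered matrix (for example an alternating two-mode k-means) also maximizes the criterion in this assumed form.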

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62P15 Applications of statistics to psychology

Citations:

Zbl 0454.62055

References:

[1] BAIER, D., GAUL, W., and SCHADER, M. (1997), “Two-Mode Overlapping Clustering with Applications to Simultaneous Benefit Segmentation and Market Structuring”, in Classification and Knowledge Organization, eds. R. Klar and O. Opitz, Berlin: Springer, pp. 557-566. · Zbl 1452.62463
[2] BANFIELD, J., and RAFTERY, A. (1993), “Model-Based Gaussian and Non-Gaussian Clustering”, Biometrics, 49, 803-821. · Zbl 0794.62034 · doi:10.2307/2532201
[3] BOCK, H-H. (1968), “Statistische Modelle für die Einfache und Doppelte Klassifikation von Normalverteilten Beobachtungen [Statistical Models for the One-Way and Two-Way Classification of Normally Distributed Observations]”, Ph. D. thesis, Albert-Ludwigs-Universität zu Freiburg, Germany.
[4] BOCK, H-H. (1980), “Simultaneous Clustering of Objects and Variables”, in Analyse de Données et Informatique. Cours de la Commission des Communautés Européennes à Fontainebleau, 19-30 Mars 1979, eds. R. Tomassone, M. Amirchhay, and D. Néel, Le Chesnay, France: Institut National de Recherche en Informatique et en Automatique (INRIA), pp. 187-203. · Zbl 0442.00020
[5] BOCK, H-H. (1996), “Probabilistic Models in Cluster Analysis”, Computational Statistics and Data Analysis, 23, 5-28. · Zbl 0900.62324
[6] CARROLL, J., and ARABIE, P. (1980), “Multidimensional Scaling”, Annual Review of Psychology, 31, 607-649. · doi:10.1146/annurev.ps.31.020180.003135
[7] CASPI, A., and MOFFITT, T. (2006), “Gene-Environment Interactions in Psychiatry: Joining Forces with Neuroscience”, Nature Reviews Neuroscience, 7, 583-590. · doi:10.1038/nrn1925
[8] CASTILLO, W., and TREJOS, J. (2002), “Two-Mode Partitioning: Review of Methods and Application of Tabu Search”, in Classification, Clustering, and Related Topics. Recent Advances and Applications. Studies in Classification, Data Analysis, and Knowledge Organization, eds. K. Jajuga, A. Sokolowski, and H-H. Bock, Heidelberg, Germany: Springer-Verlag, pp. 43-51. · Zbl 1040.62053
[9] CEULEMANS, E., and KIERS, H. (2006), “Selecting Among Three-Mode Principal Component Models of Different Types and Complexities: A Numerical Convex Hull Based Method”, British Journal of Mathematical and Statistical Psychology, 59, 133-150. · doi:10.1348/000711005X64817
[10] CHENG, Y., and CHURCH, G. (2000), “Biclustering of Expression Data”, in Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, pp. 93-103.
[11] CHO, H., DHILLON, I., GUAN, A., and SRA, S. (2004), “Minimum Sum-Squared Residue Co-Clustering of Gene Expression Data”, in Proceedings of the 4th SIAM International Conference on Knowledge Discovery and Data Mining, pp. 124-125.
[12] CORSTEN, L., and DENIS, J. (1990), “Structuring Interaction in Two-Way Tables by Clustering”, Biometrics, 46, 207-215. · Zbl 0715.62115 · doi:10.2307/2531644
[13] FORKMAN, J., and PIEPHO, H.-P. (2014), “Parametric Bootstrap Methods for Testing Multiplicative Terms in GGE and AMMI Models”, Biometrics, 70, 639-647. · Zbl 1299.65014 · doi:10.1111/biom.12162
[14] GABRIEL, K. (1971), “The Biplot Graphic Display of Matrices with Application to Principal Component Analysis”, Biometrika, 58, 453-467. · Zbl 0228.62034 · doi:10.1093/biomet/58.3.453
[15] GAUCH, H. (2006), “Statistical Analysis of Yield Trials by AMMI and GGE”, Crop Science, 46, 1488-1500. · doi:10.2135/cropsci2005.07-0193
[16] GAUCH, H., PIEPHO, H.-P., and ANNICCHIARICO, P. (2008), “Statistical Analysis of Yield Trials by AMMI and GGE: Further Considerations”, Crop Science, 48, 866-889. · doi:10.2135/cropsci2007.09.0513
[17] GAUL, W., and SCHADER, M. (1996), “A New Algorithm for Two-Mode Clustering”, in Data Analysis and Information Systems. Studies in Classification, Data Analysis, and Knowledge Organization, eds. H-H. Bock and W. Polasek, Berlin, Germany: Springer, pp. 15-23. · Zbl 0896.62060
[18] GEISER, C., LITSON, K., BISHOP, J., KELLER, B., BURNS, G., SERVERA, M., and SHIFFMAN, S. (2015), “Analyzing Person, Situation and Person X Situation Interaction Effects: Latent State-Trait Models for the Combination of Random and Fixed Situations”, Psychological Methods, 20, 165-192. · doi:10.1037/met0000026
[19] GOLLOB, H. (1968), “A Statistical Model Which Combines Features of Factor Analytic and Analysis of Variance Techniques”, Psychometrika, 33, 73-115. · Zbl 0167.48601 · doi:10.1007/BF02289676
[20] GOVAERT, G., and NADIF, M. (2013), Co-Clustering, Chichester, UK: Wiley. · Zbl 1181.68234 · doi:10.1002/9781118649480
[21] GOWER, J., and HAND, D. (1996), Biplots, London, UK: Chapman & Hall. · Zbl 0867.62053
[22] HANSOHM, J. (2001), “Two-Mode Clustering with Genetic Algorithms”, in Classification, Automation, and New Media. Studies in Classification, Data Analysis, and Knowledge Organization, eds. W. Gaul and G. Ritter, Berlin, Germany: Springer, pp. 87-93. · Zbl 1157.62447
[23] HUBERT, L., and ARABIE, P. (1985), “Comparing Partitions”, Journal of Classification, 2, 193-218. · Zbl 0587.62128 · doi:10.1007/BF01908075
[24] HUNTER, D. (2005), “Gene-Environment Interactions in Human Diseases”, Nature Reviews Genetics, 6, 287-298. · doi:10.1038/nrg1578
[25] IOVLEFF, S., and SINGH BHATIA, P. (2015), “blockcluster: Coclustering Package for Binary, Categorical, Contingency and Continuous Data-Sets”, R package version 4.0.2, https://CRAN.R-project.org/package=blockcluster.
[26] KIERS, H. (2004), “Clustering All Three Modes of Three-Mode Data: Computational Possibilities and Problems”, in Proceedings in Computational Statistics, ed. J. Antoch, Heidelberg, Germany: Springer, pp. 303-313. · Zbl 1170.62340
[27] MADEIRA, S., and OLIVEIRA, A. (2004), “Biclustering Algorithms for Biological Data Analysis: A Survey”, IEEE Transactions on Computational Biology and Bioinformatics, 1, 24-45. · doi:10.1109/TCBB.2004.2
[28] MCLACHLAN, G. (1982), “The Classification and Mixture Maximum Likelihood Approaches to Cluster Analysis”, in Handbook of Statistics (Vol. 2), eds. P.R. Krishnaiah and L.N. Kanal, Amsterdam: North-Holland, pp. 199-208. · Zbl 0513.62064
[29] MISCHEL, W., and SHODA, Y. (1995), “A Cognitive-Affective System Theory of Personality: Reconceptualizing Situations, Dispositions, Dynamics, and Invariance in Personality Structure”, Psychological Review, 102, 246-268. · doi:10.1037/0033-295X.102.2.246
[30] MISCHEL, W., and SHODA, Y. (1998), “Reconciling Processing Dynamics and Personality Dispositions”, Annual Review of Psychology, 49, 229-258. · doi:10.1146/annurev.psych.49.1.229
[31] MOFFITT, T., CASPI, A., and RUTTER, M. (2006), “Measured Gene-Environment Interactions in Psychopathology: Concepts, Research Strategies, and Implications for Research, Intervention, and Public Understanding of Genetics”, Perspectives on Psychological Science, 1, 5-27. · doi:10.1111/j.1745-6916.2006.00002.x
[32] NATIONAL INSTITUTE OF ENVIRONMENTAL HEALTH SCIENCES (2016), “Gene-Environment Interaction”, retrieved November 1, 2016 from http://www.niehs.nih.gov/health/topics/science/gene-env/.
[33] PIEPHO, H.-P. (1997), “Analyzing Genotype-Environment Data by Mixed Models with Multiplicative Terms”, Biometrics, 53, 761-766. · Zbl 0885.62123 · doi:10.2307/2533976
[34] PIEPHO, H.-P. (1999), “Fitting a Regression Model for Genotype by Environment Data on Heading Dates in Grasses by Methods for Nonlinear Mixed Models”, Biometrics, 55, 1120-1128. · Zbl 1059.62687 · doi:10.1111/j.0006-341X.1999.01120.x
[35] QUINTIENS, G. (1999), “Een Interactionistische Benadering van Individuele Verschillen in Helpen en Laten Helpen [An Interactionist Approach to Individual Differences in Helping and Allowing to Help]”, unpublished master’s thesis, KULeuven, Belgium.
[36] ROCCI, R., and VICHI, M. (2008), “Two-Mode Multi-Partitioning”, Computational Statistics and Data Analysis, 52, 1984-2003. · Zbl 1452.62463 · doi:10.1016/j.csda.2007.06.025
[37] SCHEPERS, J., CEULEMANS, E., and VAN MECHELEN, I. (2008), “Selecting Among Multi-Mode Partitioning Models of Different Complexities: A Comparison of Four Model Selection Criteria”, Journal of Classification, 25, 67-85. · Zbl 1260.62048 · doi:10.1007/s00357-008-9005-9
[38] SCHEPERS, J., and HOFMANS, J. (2009), “TwoMP: A MATLAB Graphical User Interface for Two-Mode Partitioning”, Behavior Research Methods, 41, 507-514. · doi:10.3758/BRM.41.2.507
[39] SCHEPERS, J., VAN MECHELEN, I., and CEULEMANS, E. (2006), “Three-Mode Partitioning”, Computational Statistics and Data Analysis, 51, 1623-1642. · Zbl 1157.62447 · doi:10.1016/j.csda.2006.06.002
[40] SHAFII, B., and PRICE, W. (1998), “Analysis of Genotype-by-Environment Interaction Using the Additive Main Effects and Multiplicative Interaction Model and Stability Estimates”, Journal of Agricultural, Biological, and Environmental Statistics, 3, 335-345. · doi:10.2307/1400587
[41] SHODA, Y., WILSON, N., CHEN, J., GILMORE, A., and SMITH, R. (2013), “Cognitive-Affective Processing System Analysis of Intra-Individual Dynamics in Collaborative Therapeutic Assessment: Translating Basic Theory and Research into Clinical Applications”, Journal of Personality, 81, 554-568. · doi:10.1111/jopy.12015
[42] SHODA, Y., WILSON, N., WHITSETT, D., LEE-DUSSUD, J., and ZAYAS, V. (2015), “The Person as a Cognitive Affective Processing System: Quantitative Idiography as an Integral Component of Cumulative Science”, in APA Handbook of Personality and Social Psychology: Vol. 4. Personality Processes and Individual Differences, eds. M. Mikulincer and P. Shaver, Washington: American Psychological Association, pp. 491-513.
[43] STEINLEY, D. (2004), “Properties of the Hubert-Arabie Adjusted Rand Index”, Psychological Methods, 9, 386-396. · doi:10.1037/1082-989X.9.3.386
[44] TANAY, A., SHARAN, R., and SHAMIR, R. (2005), “Biclustering Algorithms: A Survey”, in Handbook of Computational Molecular Biology, ed. S. Aluru, Boca Raton: Chapman and Hall/CRC.
[45] VAN MECHELEN, I., BOCK, H-H., and DE BOECK, P. (2004), “Two-Mode Clustering Methods: A Structured Overview”, Statistical Methods in Medical Research, 13, 363-394. · Zbl 1053.62078
[46] VAN ROSMALEN, J., GROENEN, P., TREJOS, J., and CASTILLO, W. (2009), “Optimization Strategies for Two-Mode Partitioning”, Journal of Classification, 26, 155-181. · Zbl 1337.62145 · doi:10.1007/s00357-009-9031-2
[47] VICHI, M. (2001), “Double K-Means Clustering for Simultaneous Classification of Objects and Variables”, in Advances in Classification and Data Analysis, eds. S. Borra, R. Rocci, M. Vichi, and M. Schader, Berlin Heidelberg: Springer, pp. 43-52. · Zbl 0978.00035
[48] WILDERJANS, T., CEULEMANS, E., and MEERS, K. (2013), “CHull: A Generic Convex Hull Based Model Selection Method”, Behavior Research Methods, 45, 1-15.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or perfect matching.