×

Regularized bi-directional co-clustering. (English) Zbl 1475.62012

Summary: The simultaneous clustering of documents and words, known as co-clustering, has proved to be more effective than one-sided clustering in dealing with sparse high-dimensional datasets. By their nature, text data are also generally unbalanced and directional. Recently, the von Mises-Fisher (vMF) mixture model was proposed to handle unbalanced data while harnessing the directional nature of text. In this paper, we propose a general co-clustering framework based on a matrix formulation of vMF model-based co-clustering. This formulation leads to a flexible framework for text co-clustering that can easily incorporate both word-word semantic relationships and document-document similarities. By contrast with existing methods, which generally use an additive incorporation of similarities, we propose a bi-directional multiplicative regularization that better encapsulates the underlying text data structure. Extensive evaluations on various real-world text datasets demonstrate the superior performance of our proposed approach over baseline and competitive methods, both in terms of clustering results and co-cluster topic coherence.

MSC:

62-08 Computational methods for problems pertaining to statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Ahalt, SC; Krishnamurthy, AK; Chen, P.; Melton, DE, Competitive learning algorithms for vector quantization, Neural Netw., 3, 3, 277-290 (1990) · doi:10.1016/0893-6080(90)90071-R
[2] Ailem, M.; Role, F.; Nadif, M., Graph modularity maximization as an effective method for co-clustering text data, Knowl. Based Syst., 109, 160-173 (2016) · doi:10.1016/j.knosys.2016.07.002
[3] Ailem, M.; Role, F.; Nadif, M., Model-based co-clustering for the effective handling of sparse data, Pattern Recognit., 72, 108-122 (2017) · doi:10.1016/j.patcog.2017.06.005
[4] Ailem, M., Salah, A., Nadif, M.: Non-negative matrix factorization meets word embedding. In Proceedings of the 40th international acm sigir conference on research and development in information retrieval, pp. 1081-1084 (2017b)
[5] Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Selected Papers of Hirotugu Akaike, pp. 199-213. Springer (1998)
[6] Banerjee, A.; Dhillon, IS; Ghosh, J.; Sra, S., Clustering on the unit hypersphere using von Mises-Fisher distributions, J. Mach. Learn. Res., 6, 1345-1382 (2005) · Zbl 1190.62116
[7] Banerjee, A.; Ghosh, J., Frequency-sensitive competitive learning for scalable balanced clustering on high-dimensional hyperspheres, IEEE Trans. Neural Netw., 15, 3, 702-719 (2004) · doi:10.1109/TNN.2004.824416
[8] Bock, H.: Simultaneous clustering of objects and variables. In: Tomassone, R. (ed.) Analyse des Données et Informatique, pp. 187-203. INRIA, Le Chesnay (1979) · Zbl 0454.62055
[9] Bock, H.-H.: Co-clustering for object by variable data matrices. In: Advanced Studies in Behaviormetrics and Data Science, pp. 3-17. Springer (2020)
[10] Bozdogan, H., Akaike’s information criterion and recent developments in information complexity, J. Math. Psychol., 44, 1, 62-91 (2000) · Zbl 1047.62501 · doi:10.1006/jmps.1999.1277
[11] Cho, H.; Dhillon, IS, Coclustering of human cancer microarrays using minimum sum-squared residue Coclustering, IEEE/ACM Trans. Comput. Biol. Bioinform., 5, 3, 385-400 (2008) · doi:10.1109/TCBB.2007.70268
[12] Deodhar, M.; Ghosh, J., Scoal: a framework for simultaneous co-clustering and learning from complex data, ACM Trans. Knowl. Discov. Data, 4, 3, 1-31 (2010) · doi:10.1145/1839490.1839492
[13] DeSieno, D.: Adding a conscience to competitive learning. In: IEEE international conference on neural networks, vol. 1, pp. 117-124, San Diego, CA, USA. Institute of Electrical and Electronics Engineers New York, IEEE (1988)
[14] Dhillon, I.S., Mallela, S., Modha, D.S.: Information-theoretic co-clustering. In: The 9th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp. 89-98 (2003)
[15] Dhillon, IS; Modha, DS, Concept decompositions for large sparse text data using clustering, Mach. Learn., 42, 1-2, 143-175 (2001) · Zbl 0970.68167 · doi:10.1023/A:1007612920971
[16] Gopal, S., Yang, Y.: Von mises-fisher clustering models. In: International Conference on Machine Learning, pp. 154-162. PMLR (2014)
[17] Govaert, G.: Classification croisée. Thèse d’état, Université Paris 6, France (1983)
[18] Govaert, G.; Nadif, M., Clustering with block mixture models, Pattern Recognit., 36, 2, 463-473 (2003) · doi:10.1016/S0031-3203(02)00074-2
[19] Govaert, G.; Nadif, M., An EM algorithm for the block mixture model, IEEE Trans. Pattern Anal. Mach. Intell., 27, 4, 643-647 (2005) · doi:10.1109/TPAMI.2005.69
[20] Govaert, G.; Nadif, M., Block clustering with Bernoulli mixture models: comparison of different approaches, Comput. Stat. Data Anal., 52, 6, 3233-3245 (2008) · Zbl 1452.62444 · doi:10.1016/j.csda.2007.09.007
[21] Govaert, G.; Nadif, M., Co-clustering: Models (2013), New York: Algorithms and Applications. Wiley, New York · Zbl 0910.62021 · doi:10.1002/9781118649480
[22] Govaert, G.; Nadif, M., Mutual information, phi-squared and model-based co-clustering for contingency tables, Adv. Data Anal. Classif., 12, 3, 455-488 (2018) · Zbl 1416.62309 · doi:10.1007/s11634-016-0274-6
[23] Hanczar, B.; Nadif, M., Ensemble methods for biclustering tasks, Pattern Recognit., 45, 11, 3938-3949 (2012) · doi:10.1016/j.patcog.2012.04.010
[24] Hartigan, JA, Direct clustering of a data matrix, J. Am. Stat. Assoc., 67, 337, 123-129 (1972) · doi:10.1080/01621459.1972.10481214
[25] Hofmann, T., Puzicha, J.: Latent class models for collaborative filtering. In: IJCAI, vol. 99, pp. 688-693, Stockholm, Sweden. Morgan Kaufmann (1999)
[26] Hubert, L.; Arabie, P., Comparing partitions, J. Classif., 2, 1, 193-218 (1985) · doi:10.1007/BF01908075
[27] Keribin, C.; Brault, V.; Celeux, G.; Govaert, G., Estimation and selection for the latent block model on categorical data, Stat. Comput., 25, 6, 1201-1216 (2015) · Zbl 1331.62149 · doi:10.1007/s11222-014-9472-2
[28] Le, Q.; Mikolov, T., Distributed representations of sentences and documents, Int. Conf. Mach. Learn., 32, 1188-1196 (2014)
[29] Lee, DD; Seung, HS, Algorithms for non-negative matrix factorization, Adv. Neural Inf. Process. Syst., 23, 556-562 (2001)
[30] Madeira, SC; Oliveira, AL, Biclustering algorithms for biological data analysis: a survey, IEEE/ACM Trans. Comput. Biol. Bioinform., 1, 1, 24-45 (2004) · doi:10.1109/TCBB.2004.2
[31] Marcotorchino, F., Seriation problems: an overview, Appl. Stoch. Models Data Anal., 7, 2, 139-151 (1991) · doi:10.1002/asm.3150070204
[32] Mardia, KV; Jupp, PE, Directional Statistics (2009), New York: Wiley, New York · Zbl 0935.62065
[33] McLachlan, GJ; Peel, D., Finite Mixture Models (2004), New York: Wiley, New York · Zbl 0963.62061
[34] Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: 1st international conference on learning representations, Arizona, USA. ICLR (2013)
[35] Newman, D., Karimi, S., Cavedon, L.: External evaluation of topic models. In: Australasian document computing symposium, IEEE (2009)
[36] Rocci, R.; Vichi, M., Two-mode multi-partitioning, Comput. Stat. Data Anal., 52, 4, 1984-2003 (2008) · Zbl 1452.62463 · doi:10.1016/j.csda.2007.06.025
[37] Röder, M., Both, A., Hinneburg, A.: Exploring the space of topic coherence measures. In: Proceedings of the 8th ACM international conference on Web search and data mining, Shanghai, China, pp. 399-408 (2015)
[38] Role, F.; Morbieu, S.; Nadif, M., Coclust: a python package for co-clustering, J. Stat. Softw. Artic., 88, 7, 1-29 (2019)
[39] Role, F., Nadif, M.: Handling the impact of low frequency events on co-occurrence based measures of word similarity. In: Proceedings of the international conference on Knowledge Discovery and Information Retrieval (KDIR-2011). Scitepress, pp. 218-223 (2011)
[40] Salah, A.; Ailem, M.; Nadif, M., Word co-occurrence regularized non-negative matrix tri-factorization for text data co-clustering, AAAI Conf. Artif. Intell., 32, 3292-3299 (2018)
[41] Salah, A., Nadif, M.: Model-based von Mises-Fisher co-clustering with a conscience. In: Proceedings of the 2017 SIAM international conference on data mining. SIAM , pp. 246-254 (2017a)
[42] Salah, A.; Nadif, M., Social regularized von Mises-Fisher mixture model for item recommendation, Data Min. Knowl. Discov., 31, 5, 1218-1241 (2017) · Zbl 1411.68153 · doi:10.1007/s10618-017-0499-9
[43] Salah, A.; Nadif, M., Directional co-clustering, Adv. Data Anal. Classif., 13, 3, 591-620 (2019) · Zbl 1474.62244 · doi:10.1007/s11634-018-0323-4
[44] Schwarz, G., Estimating the dimension of a model, Ann. Stat., 6, 2, 461-464 (1978) · Zbl 0379.62005 · doi:10.1214/aos/1176344136
[45] Steinley, D., Properties of the Bubert-arable adjusted rand index, Psychol. Methods, 9, 3, 386 (2004) · doi:10.1037/1082-989X.9.3.386
[46] Strehl, A.; Ghosh, J., Cluster ensembles-a knowledge reuse framework for combining multiple partitions, J. Mach. Learn. Res., 3, 583-617 (2003) · Zbl 1084.68759
[47] Tanay, A.; Sharan, R.; Shamir, R., Biclustering algorithms: a survey, Handb. Comput. Mol. Biol., 9, 1-20, 122-124 (2005)
[48] Van Mechelen, I.; Bock, H-H; De Boeck, P., Two-mode clustering methods: astructuredoverview, Stat. Methods Med. Res., 13, 5, 363-394 (2004) · Zbl 1053.62078 · doi:10.1191/0962280204sm373ra
[49] Vichi, M., Double k-means clustering for simultaneous classification of objects and variables, Adv. Classif. Data Anal. (2001) · doi:10.1007/978-3-642-59471-7_6
[50] Wang, H., Nie, F., Huang, H., Makedon, F.: Fast nonnegative matrix tri-factorization for large-scale data co-clustering. In: 22nd international joint conference on artificial intelligence (2011)
[51] Zhong, S.; Ghosh, J., Generative model-based document clustering: a comparative study, Knowl. Inf. Syst., 8, 3, 374-384 (2005) · doi:10.1007/s10115-004-0194-1
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.