
Efficient regularized spectral data embedding. (English) Zbl 07363867

Summary: Data embedding (DE), or dimensionality reduction, techniques are particularly well suited to embedding high-dimensional data into a space that in most cases has just two dimensions. The resulting low-dimensional space, in which data samples (data points) are more easily visualized, is also often used for learning methods such as clustering. Sometimes, however, DE identifies dimensions that contribute little to the clustering structures it reveals. In this paper we consider data embedding regularized by clustering, and we propose a simultaneous learning approach for DE and clustering that reinforces the relationship between the two tasks. Our approach is based on a matrix decomposition technique that learns a spectral DE, a cluster membership matrix, and a rotation matrix that closely maps the continuous spectral embedding onto the discrete cluster memberships, in order to obtain a good clustering solution. We compare our approach with traditional clustering methods and perform numerical experiments on a collection of benchmark datasets to demonstrate its potential.
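The summary names the ingredients of the approach (a spectral embedding, a cluster membership matrix, and a rotation matrix aligning the two) without giving the updates, so the following minimal NumPy sketch is only an illustration under standard assumptions, not the authors' algorithm: it builds a spectral embedding from a Gaussian affinity matrix and then alternates a cluster-assignment step with a closed-form orthogonal Procrustes rotation. The function names, the affinity choice, and the parameters (sigma, n_iter) are all illustrative.

import numpy as np

def spectral_embedding(X, k, sigma=1.0):
    """Top-k eigenvectors of the symmetrically normalized affinity matrix."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    S = np.exp(-d2 / (2.0 * sigma ** 2))             # Gaussian affinity
    d_inv = 1.0 / np.sqrt(S.sum(axis=1))
    L = d_inv[:, None] * S * d_inv[None, :]          # D^{-1/2} S D^{-1/2}
    _, vecs = np.linalg.eigh(L)                      # eigenvalues in ascending order
    return vecs[:, -k:]                              # leading k eigenvectors

def rotate_and_cluster(B, k, n_iter=30, seed=0):
    """Alternate cluster assignment with an orthogonal Procrustes rotation
    that aligns the embedding B with a binary membership matrix G."""
    rng = np.random.default_rng(seed)
    n = B.shape[0]
    labels = rng.integers(k, size=n)                 # random initial partition
    for _ in range(n_iter):
        G = np.zeros((n, k))
        G[np.arange(n), labels] = 1.0                # cluster membership matrix
        U, _, Vt = np.linalg.svd(B.T @ G)            # Procrustes: min ||B Q - G||_F
        Q = U @ Vt                                   # optimal orthogonal rotation
        new_labels = np.argmax(B @ Q, axis=1)        # assign by largest rotated coordinate
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Toy usage: three well-separated Gaussian blobs in the plane.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in (0.0, 3.0, 6.0)])
B = spectral_embedding(X, k=3)
print(rotate_and_cluster(B, k=3))

In this sketch the rotation plays the role the summary describes: it maps the continuous spectral coordinates as closely as possible onto the discrete membership matrix, so that a row-wise argmax of the rotated embedding yields the partition.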

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

CoClust; darch
