
Learning representations from dendrograms. (English) Zbl 1524.68264

Summary: We propose unsupervised representation learning and feature extraction from dendrograms. The commonly used minimax distance measures correspond to building a dendrogram with the single-linkage criterion and then defining specific forms of a level function and a distance function over it. We therefore extend this method to arbitrary dendrograms and develop a generalized framework in which different distance measures and representations can be inferred from different types of dendrograms, level functions and distance functions. Via an appropriate embedding, we compute a vector-based representation of the inferred distances, so that many numerical machine learning algorithms can employ them. Then, to address the model selection problem, we study the aggregation of different dendrogram-based distances in solution space and in representation space, respectively, in the spirit of deep representations. In the first approach, for example for the clustering problem, we build a graph with positive and negative edge weights according to the consistency of the cluster labels of different objects across different solutions, in the context of ensemble methods, and then use an efficient variant of correlation clustering to produce the final clusters. In the second approach, we combine different distances and features sequentially, in the spirit of multi-layered architectures, to obtain the final features. Finally, we demonstrate the effectiveness of our approach via several numerical studies.
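To make the single-linkage special case concrete, the following minimal Python sketch (not the authors' implementation; it assumes only numpy and scipy) computes pairwise minimax distances as the largest edge on the unique path in a minimum spanning tree, and then applies classical multidimensional scaling to turn the inferred distances into a vector-based representation usable by standard numerical learning algorithms. The function names minimax_distances and mds_embedding are illustrative, not taken from the paper.

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def minimax_distances(X):
    """Pairwise minimax distances: for each pair (i, j), the minimum over
    all paths from i to j of the maximum edge weight along the path.
    This equals the largest edge on the unique i-j path in a minimum
    spanning tree of the complete Euclidean graph (single linkage)."""
    n = X.shape[0]
    D = squareform(pdist(X))              # dense Euclidean distance matrix
    mst = minimum_spanning_tree(D).toarray()
    mst = np.maximum(mst, mst.T)          # symmetrize the MST adjacency
    neighbors = [np.nonzero(mst[i])[0] for i in range(n)]
    M = np.zeros((n, n))
    for s in range(n):                    # propagate max edge over the tree
        stack, visited = [s], {s}
        while stack:
            u = stack.pop()
            for v in neighbors[u]:
                if v not in visited:
                    visited.add(v)
                    M[s, v] = max(M[s, u], mst[u, v])
                    stack.append(v)
    return M

def mds_embedding(M, dim=2):
    """Classical multidimensional scaling: an embedding whose Euclidean
    distances approximate the given distance matrix M."""
    n = M.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (M ** 2) @ J           # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]
    w = np.clip(w[idx], 0, None)          # guard against negative eigenvalues
    return V[:, idx] * np.sqrt(w)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 5))
    Y = mds_embedding(minimax_distances(X), dim=2)
    print(Y.shape)                        # (30, 2)

The paper's generalized framework replaces the single-linkage construction above with other dendrogram types, level functions and distance functions; this sketch only illustrates the baseline that the framework extends.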

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62A09 Graphical methods in statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
