×

Incremental spectral clustering by efficiently updating the eigen-system. (English) Zbl 1176.68186

Summary: In recent years, the spectral clustering method has gained attentions because of its superior performance. To the best of our knowledge, the existing spectral clustering algorithms cannot incrementally update the clustering results given a small change of the data set. However, the capability of incrementally updating is essential to some applications such as websphere or blogsphere. Unlike the traditional stream data, these applications require incremental algorithms to handle not only insertion/deletion of data points but also similarity changes between existing points.
In this paper, we extend the standard spectral clustering to such evolving data, by introducing the incidence vector/matrix to represent two kinds of dynamics in the same framework and by incrementally updating the eigen-system. Our incremental algorithm, initialized by a standard spectral clustering, continuously and efficiently updates the eigenvalue system and generates instant cluster labels, as the data set is evolving. The algorithm is applied to a blog data set. Compared with recomputation of the solution by the standard spectral clustering, it achieves similar accuracy but with much lower computational cost. It can discover not only the stable blog communities but also the evolution of the individual multi-topic blogs. The core technique of incrementally updating the eigenvalue system is a general algorithm and has a wide range of applications-as well as incremental spectral clustering-where dynamic graphs are involved. This demonstrates the wide applicability of our incremental algorithm.

MSC:

68T10 Pattern recognition, speech recognition
PDF BibTeX XML Cite
Full Text: DOI

References:

[1] Ng, A.Y.; Jordan, M.I.; Weiss, Y., On spectral clustering: analysis and an algorithm, (), 849-856
[2] Shi, J.; Malik, J., Normalized cuts and image segmentation, IEEE transactions on pattern analysis and machine intelligence, 22, 8, 888-905, (2000)
[3] X. Zhu, Semi-supervised learning literature survey (1530). URL \(\langle\)http://www.cs.wisc.edu/∼jerryzhu/pub/ssl_survey.pdf⟩.
[4] C. Gupta, R. Grossman, Genic: a single pass generalized incremental algorithm for clustering, in: Proceedings of the SIAM International Conference on Data Mining, 2004, pp. 137-153.
[5] S. Guha, N. Mishra, R. Motwani, L. O’Callaghan, Clustering data streams, in: Proceedings of the IEEE Annual Symposium on Foundations of Computer Science, 2000, p. 359.
[6] M. Charikar, C. Chekuri, T. Feder, R. Motwani, Incremental clustering and dynamic information retrieval, in: Proceedings of the 29th Annual ACM Symposium on Theory of Computing, 1997, pp. 626-635. · Zbl 0963.68062
[7] Bollobas, B., Modern graph theory, (1998), Springer New York · Zbl 0902.05016
[8] Golub, G.H.; Loan, C.F.V., Matrix computations, (1989), John Hopkins Press Baltimore, MD
[9] Fiedler, M., Algebraic connectivity of graphs, Czechoslovak mathematical journal, 23, 98, 298-305, (1973) · Zbl 0265.05119
[10] H. Ning, W. Xu, Y. Chi, Y. Gong, T. Huang, Incremental spectral clustering with application to monitoring of evolving blog communities, in: Proceedings of the SIAM International Conference on Data Mining, 2007, pp. 261-272.
[11] C. Ding, A tutorial on spectral clustering, in: Proceedings of the International Conference on Machine Learning, 2004.
[12] Y.C. Wei, C.K. Cheng, Towards efficient hierarchical designs by ratio cut partitioning, in: Proceedings of the International Conference on Computer Aided Design, 1989, pp. 298-301.
[13] Hagen, L.; Kahng, A.B., New spectral methods for ratio cut partitioning and clustering, IEEE transactions on computer – aided design, 11, 9, 1074-1085, (1992)
[14] C. Ding, X. He, H. Zha, M. Gu, H. Simon, A min – max cut algorithm for graph partitioning and data clustering, in: Proceedings of the IEEE International Conference on Data Mining, 2001, pp. 107-114.
[15] D. Verma, M. Meila, A comparison of spectral clustering algorithms, Technical Report UW-CSE-03-05-01, University of Washington, Seattle, WA, 2003.
[16] C. Valgren, T. Duckett, A. Lilienthal, Incremental spectral clustering and its application to topological mapping, in: Proceedings of the IEEE International Conference on Robotics and Automation, 2007, pp. 4283-4288.
[17] C.C. Aggarwal, J. Han, J. Wang, P.S. Yu, A framework for clustering evolving data streams, in: Proceedings of the 29th International Conference on Very Large Data Bases, VLDB Endowment, 2003, pp. 81-92.
[18] G.S. Manku, S. Rajagopalan, B.G. Lindsay, Approximate medians and other quantiles in one pass and with limited memory, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, 1998, pp. 426-435.
[19] G.S. Manku, S. Rajagopalan, B.G. Lindsay, Random sampling techniques for space efficient online computation of order statistics of large datasets, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, 1999, pp. 251-262.
[20] L. O’Callaghan, N. Mishra, A. Meyerson, S. Guha, R. Motwani, Streaming-data algorithms for high-quality clustering, in: Proceedings of the International Conference on Data Engineering, 2002, pp. 685-694.
[21] P. Desikan, N. Pathak, J. Srivastava, V. Kumar, Incremental page rank computation on evolving graphs, in: Proceedings of the International World Wide Web Conference, 2005, pp. 1094-1095.
[22] A.N. Langville, C.D. Meyer, Updating pagerank with iterative aggregation, in: Proceedings of the International World Wide Web Conference, 2004, pp. 392-393.
[23] D. Chakrabarti, R. Kumar, A. Tomkins, Evolutionary clustering, in: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006, pp. 554-560.
[24] Y. Chi, X. Song, D. Zhou, K. Hino, B. Tseng, Evolutionary spectral clustering by incorporating temporal smoothness, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007, pp. 153-162.
[25] Yu, S.X.; Shi, J., Segmentation given partial grouping constraints, IEEE transactions on pattern analysis and machine intelligence, 26, 2, 173-183, (2004)
[26] F.R.K. Chung, Spectral Graph Theory, in: CBMS Regional Conference Series in Mathematics, vol. 92, American Mathematical Society, Providence, RI, 1997. · Zbl 0867.05046
[27] Zhou, A.; Cao, F.; Qian, W.; Jin, C., Tracking clusters in evolving data streams over sliding windows, Knowledge and information systems, 15, 2, 181-214, (2008)
[28] H. Ning, M. Liu, H. Tang, T. Huang, A spectral clustering approach to speaker diarization, in: Proceedings of the International Conference on Spoken Language Processing, 2006.
[29] L. Zelnik-Manor, P. Perona, Self-tuning spectral clustering, in: Proceedings of the Advances in Neural Information Processing Systems, 2005, pp. 1601-1608.
[30] A. Ghosh, S. Boyd, Growing well-connected graphs, in: Proceedings of the 45th IEEE Conference on Decision and Control, 2006, pp. 6605-6611.
[31] Kim, Y.; Mesbahi, M., On maximizing the second smallest eigenvalue of a state dependent graph Laplacian, IEEE transactions on automatic control, 51, 1, 116-120, (2006) · Zbl 1366.05069
[32] Rheingold, H., The virtual community: homesteading on the electronic frontier, (2000), The MIT Press Cambridge, MA
[33] Y.-R. Lin, H. Sundaram, Y. Chi, J. Tatemura, B. Tseng, Discovery of blog communities based on mutual awareness, in: Proceedings of the 3rd Annual Workshop on the Weblogging Ecosystem, 2006.
[34] W. Xu, X. Liu, Y. Gong, Document clustering based on non-negative matrix factorization, in: Proceedings of the ACM SIGIR Special Interest Group on Information Retrieval, 2003, pp. 267-273.
[35] Lovasz, L.; Plummer, M., Matching theory, akademiai kiado, (1986), North-Holland Budapest
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.