
Self-expanded clustering algorithm based on density units with evaluation feedback section. (English) Zbl 05162989

Summary: This paper presents an effective clustering mode and a novel mode for evaluating clustering results. The clustering mode has two bounded integer parameters. The evaluation mode assesses clustering results and assigns each a mark; the higher the mark a clustering result receives, the higher its quality. By organizing the two modes in different ways, we obtain two clustering algorithms: SECDU (Self-Expanded Clustering algorithm based on Density Units) and SECDUF (Self-Expanded Clustering algorithm based on Density Units with evaluation Feedback section). SECDU enumerates all value pairs of the clustering mode's two parameters, processes the data set repeatedly, evaluates every clustering result with the evaluation mode, and outputs the result that receives the highest mark. By applying a hill-climbing strategy, SECDUF improves clustering efficiency greatly. Both algorithms adapt well to data sets with different distribution features and output high-quality clustering results. SECDUF tunes the parameters of the clustering mode automatically, requiring no manual intervention throughout the process, and achieves high clustering performance.
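The abstract does not reproduce the paper's pseudocode, so the following Python sketch only illustrates the relationship between the two algorithms as described above. The function names cluster_with_density_units (the density-unit clustering mode) and evaluate_clustering (the evaluation mode) are hypothetical placeholders: SECDU exhaustively enumerates every pair of the two integer parameters, while SECDUF hill-climbs on the evaluation mark instead.

```python
# Hedged sketch, not the authors' implementation: the clustering and
# evaluation modes are passed in as callables, since their definitions
# are not given in this abstract.

def secdu(data, cluster_with_density_units, evaluate_clustering,
          p1_range, p2_range):
    """Enumerate all (p1, p2) pairs and keep the result with the highest mark."""
    best_mark, best_result = float("-inf"), None
    for p1 in p1_range:
        for p2 in p2_range:
            result = cluster_with_density_units(data, p1, p2)
            mark = evaluate_clustering(data, result)
            if mark > best_mark:
                best_mark, best_result = mark, result
    return best_result, best_mark


def secduf(data, cluster_with_density_units, evaluate_clustering,
           p1_range, p2_range):
    """Hill-climb over the two integer parameters instead of enumerating them."""
    p1, p2 = p1_range[0], p2_range[0]  # assumed starting point
    best_result = cluster_with_density_units(data, p1, p2)
    best_mark = evaluate_clustering(data, best_result)
    improved = True
    while improved:
        improved = False
        # examine the four integer neighbours of the current parameter pair
        for dp1, dp2 in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            q1, q2 = p1 + dp1, p2 + dp2
            if q1 not in p1_range or q2 not in p2_range:
                continue
            result = cluster_with_density_units(data, q1, q2)
            mark = evaluate_clustering(data, result)
            if mark > best_mark:
                p1, p2, best_mark, best_result = q1, q2, mark, result
                improved = True
                break  # restart the neighbourhood scan from the new point
    return best_result, best_mark
```

Under these assumptions, SECDUF evaluates only a local neighbourhood of parameter pairs per step, which is the source of the efficiency gain over SECDU's full enumeration.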

MSC:

62-XX Statistics
68-XX Computer science
