Social network optimization for cluster ensemble selection. (English) Zbl 07350044

Summary: This paper studies the cluster ensemble selection problem for unsupervised learning. Given a large ensemble of clustering solutions, our goal is to select a subset of solutions to form a smaller yet better performing cluster ensemble than using all available solutions. The common way of aggregating the chosen solutions is accumulating the information of the selected results to a similarity matrix. This paper suggests transforming the similarity matrix to a modularity matrix and then applying a new consensus function which optimizes modularity measure in it. We represent the modularity maximization problem as a 0-1 quadratic program which can be exactly solved for small datasets. We also established a new greedy algorithm, namely sum linkage, to optimize the objective function specially for large scale datasets in a very short time. We show that the proposed consensus partition gets much closer to the actual cluster structure than the partitions obtained from the direct application of common cluster ensemble methods. The promising results compared with other most cited consensus functions show the excellent efficiency of the proposed method.


68-XX Computer science
Full Text: DOI


[1] Jain AK. Data clustering: 50 years beyond K-means.Pattern Recognit. Lett., 2010.31(8):651-666. doi:10.1016/j.patrec.2009.09.011. URLhttps://doi.org/10.1016/j.patrec.2009.09.011.
[2] Alizadeh H, Minaei-Bidgoli B, Parvin H. Optimizing Fuzzy Cluster Ensemble in String Representation. Int. J. Pattern Recognit. Artif. Intell., 2013.27(2). doi:10.1142/S0218001413500055. URLhttps: //doi.org/10.1142/S0218001413500055. · Zbl 1267.62076
[3] Strehl A, Ghosh J. Cluster Ensembles — A Knowledge Reuse Framework for Combining Multiple Partitions.J. Mach. Learn. Res., 2002.3:583-617. URLhttp://jmlr.org/papers/v3/strehl02a.html. · Zbl 1084.68759
[4] Alizadeh H, Minaei-Bidgoli B, Parvin H. Cluster ensemble selection based on a new cluster stability measure.Intell. Data Anal., 2014.18(3):389-408. doi:10.3233/IDA-140647. URLhttps://doi.org/ 10.3233/IDA-140647.
[5] Fern XZ, Lin W. Cluster Ensemble Selection.Statistical Analysis and Data Mining, 2008.1(3):128-141. doi:10.1002/sam.10008. URLhttps://doi.org/10.1002/sam.10008.
[6] Azimi J, Fern XZ. Adaptive Cluster Ensemble Selection. In: Boutilier C (ed.), IJCAI 2009, Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, California, USA, July 11-17, 2009. 2009 pp. 992-997. URLhttp://ijcai.org/Proceedings/09/Papers/168.pdf.
[7] Abbasi S, Nejatian S, Parvin H, Rezaie V, Bagherifard K. Clustering ensemble selection considering quality and diversity.Artif. Intell. Rev., 2019.52(2):1311-1340. doi:10.1007/s10462-018-9642-2. URL https://doi.org/10.1007/s10462-018-9642-2.
[8] Alizadeh H, Minaei-Bidgoli B, Parvin H. To improve the quality of cluster ensembles by selecting a subset of base clusters.J. Exp. Theor. Artif. Intell., 2014.26(1):127-150. doi:10.1080/0952813X.2013.813974. URLhttps://doi.org/10.1080/0952813X.2013.813974.
[9] Agarwal G, Kempe D. Modularity-Maximizing Graph Communities via Mathematical Programming. Physics of Condensed Matter, 2007.66. doi:10.1140/epjb/e2008-00425-1. · Zbl 1188.90262
[10] Newman MEJ. Communities, modules and large-scale structure in networks, 2012.
[11] Fortunato S. Community detection in graphs.CoRR, 2009.abs/0906.0612.0906.0612, URLhttp: //arxiv.org/abs/0906.0612.
[12] Fred ALN, Jain AK. Combining Multiple Clusterings Using Evidence Accumulation.IEEE Trans. Pattern Anal. Mach. Intell., 2005.27(6):835-850. doi:10.1109/TPAMI.2005.113. URLhttps://doi.org/10. 1109/TPAMI.2005.113.
[13] Caruana R, Niculescu-Mizil A, Crew G, Ksikes A. Ensemble selection from libraries of models. In: Brodley CE (ed.), Machine Learning, Proceedings of the Twenty-first International Conference (ICML 2004), Banff, Alberta, Canada, July 4-8, 2004, volume 69 ofACM International Conference Proceeding Series. ACM, 2004 doi:10.1145/1015330.1015432. URLhttps://doi.org/10.1145/1015330.1015432.
[14] Hadjitodorov ST, Kuncheva LI, Todorova LP. Moderate diversity for better cluster ensembles.Inf. Fusion, 2006.7(3):264-275. doi:10.1016/j.inffus.2005.01.008. URLhttps://doi.org/10.1016/j.inffus. 2005.01.008.
[15] Parvin H, Minaei-Bidgoli B. A clustering ensemble framework based on elite selection of weighted clusters.Adv. Data Anal. Classif., 2013.7(2):181-208.doi:10.1007/s11634-013-0130-x.URL https://doi.org/10.1007/s11634-013-0130-x. · Zbl 1267.62076
[16] Singh V, Mukherjee L, Peng J, Xu J. Ensemble clustering using semidefinite programming with applications.Mach. Learn., 2010.79(1-2):177-200. doi:10.1007/s10994-009-5158-y. URLhttps: //doi.org/10.1007/s10994-009-5158-y.
[17] Rao PR, da Costa JPP. A performance study of a consensus clustering algorithm and properties of partition graph. In: 2010 IEEE International Conference on Computational Intelligence and Computing Research. 2010 pp. 1-5.
[18] Gu´enoche A. Consensus of partitions : a constructive approach.Adv. Data Anal. Classif., 2011.5(3):215- 229. doi:10.1007/s11634-011-0087-6. URLhttps://doi.org/10.1007/s11634-011-0087-6.
[19] Christou IT. Coordination of Cluster Ensembles via Exact Methods.IEEE Trans. Pattern Anal. Mach. Intell., 2011.33(2):279-293. doi:10.1109/TPAMI.2010.85. URLhttps://doi.org/10.1109/TPAMI. 2010.85.
[20] Vega-Pons S, Ruiz-Shulcloper J. A Survey of Clustering Ensemble Algorithms.Int. J. Pattern Recognit. Artif. Intell., 2011.25(3):337-372. doi:10.1142/S0218001411008683. URLhttps://doi.org/10. 1142/S0218001411008683.
[21] Brandes U, Delling D, Gaertler M, G¨orke R, Hoefer M, Nikoloski Z, Wagner D. On Modularity Clustering. IEEE Trans. Knowl. Data Eng., 2008.20(2):172-188. doi:10.1109/TKDE.2007.190689. URLhttps: //doi.org/10.1109/TKDE.2007.190689.
[22] Zhang X, Wang R, Wang Y, Wang J, Qiu Y, Wang L, Chen L. Modularity optimization in community detection of complex networks.EPL (Europhysics Letters), 2009.87:38002. doi:10.1209/0295-5075/87/ 38002.
[23] Zhang X, Li Z, Wang R, Wang Y. A combinatorial model and algorithm for globally searching community structure in complex networks.J. Comb. Optim., 2012.23(4):425-442. doi:10.1007/s10878-010-9356-0. URLhttps://doi.org/10.1007/s10878-010-9356-0. · Zbl 1245.90013
[24] Lancichinetti A, Fortunato S. Consensus clustering in complex networks.CoRR, 2012.abs/1203.6093. 1203.6093, URLhttp://arxiv.org/abs/1203.6093.
[25] Gambette P, Gu´enoche A. Bootstrap clustering for graph partitioning.RAIRO - Operations Research, 2011.45(4):339-352. doi:10.1051/ro/2012001. URLhttps://doi.org/10.1051/ro/2012001.
[26] Hosseinzadeh R, Alizadeh H, Nazemi E. Community Detection Ensemble in Social Networks. In: 11th Iranian Conf. on Intelligent Systems (ICIS13), Tehran. 2013 pp. 27-35.
[27] Newman MEJ, Girvan M.Finding and evaluating community structure in networks.Phys. Rev. E, 2004.69(2):026113. doi:10.1103/PhysRevE.69.026113. URLhttp://link.aps.org/doi/10.1103/ PhysRevE.69.026113.
[28] Breckenridge JN. Replicating Cluster Analysis: Method, Consistency, and Validity.Multivariate Behavioral Research, 1989.24(2):147-161. doi:10.1207/s15327906mbr2402\1. PMID: 26755276,https:// doi.org/10.1207/s15327906mbr2402_1, URLhttps://doi.org/10.1207/s15327906mbr2402_1.
[29] Roth V, Lange T, Braun ML, Buhmann JM. A Resampling Approach to Cluster Validation. In: H¨ardle WK, R¨onz B (eds.), COMPSTAT 2002, Proceedings in Computational Statistics, 15th Symposium, Berlin, Germany, August 24-28, 2002. Springer, 2002 pp. 123-128. doi:10.1007/978-3-642-57489-4\13. URL https://doi.org/10.1007/978-3-642-57489-4_13. · Zbl 1439.62034
[30] Roth V, Lange T. Feature Selection in Clustering Problems. In: Thrun S, Saul LK, Sch¨olkopf B (eds.), Advances in Neural Information Processing Systems 16 [Neural Information Processing Systems, NIPS 2003, December 8-13, 2003, Vancouver and Whistler, British Columbia, Canada]. MIT Press, 2003 pp. 473-480. URLhttp://papers.nips.cc/paper/2486-feature-selection-in-clustering-problems.
[31] Pascual D, Pla F, S´anchez JS. Cluster validation using information stability measures.Pattern Recognit. Lett., 2010.31(6):454-461. doi:10.1016/j.patrec.2009.07.009. URLhttps://doi.org/10.1016/j. patrec.2009.07.009.
[32] Law MHC, Topchy AP, Jain AK. Multiobjective Data Clustering. In: 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), with CD-ROM, 27 June - 2 July 2004, Washington, DC, USA. IEEE Computer Society, 2004 pp. 424-430. doi:10.1109/CVPR.2004.170. URLhttp://doi.ieeecomputersociety.org/10.1109/CVPR.2004.170.
[33] Guimera R, Amaral LAN.Functional cartography of complex metabolic networks.Nature, 2005. 433(7028):895-900. URLhttp://dx.doi.org/10.1038/nature03288.
[34] Clauset A, Newman M, Moore C. Finding community structure in very large networks.Physical review. E, Statistical, nonlinear, and soft matter physics, 2004.70 6 Pt 2:066111.
[35] Newman ME. Modularity and community structure in networks.Proc Natl Acad Sci U S A, 2006. 103(23):8577-8582. doi:10.1073/pnas.0601602103. URLhttp://www.ncbi.nlm.nih.gov/sites/ entrez?cmd=retrieve&db=pubmed&list_uids=16723398&dopt=AbstractPlus.
[36] Ahn YY, Bagrow J, Jørgensen S. Link communities reveal multiscale complexity in networks.Nature, 2010.466(7307):761-764. doi:10.1038/nature09182.
[37] Li S, Chen Y, Du H, Feldman MW. A genetic algorithm with local search strategy for improved detection of community structure.Complexity, 2010.15(4):53-60. doi:10.1002/cplx.20300. URLhttps://doi. org/10.1002/cplx.20300.
[38] Newman MEJ.Fast algorithm for detecting community structure in networks.Phys. Rev. E, 2004.69:066133. doi:10.1103/PhysRevE.69.066133. URLhttps://link.aps.org/doi/10.1103/ PhysRevE.69.066133.
[39] Mojarad M, Parvin H, Nejatian S, Rezaie V. Consensus Function Based on Clusters Clustering and Iterative Fusion of Base Clusters.Int. J. Uncertain. Fuzziness Knowl. Based Syst., 2019.27(1):97-120. doi:10.1142/S0218488519500053. URLhttps://doi.org/10.1142/S0218488519500053.
[40] Tan P, Steinbach MS, Kumar V. Introduction to Data Mining. Addison-Wesley, 2005. ISBN 0-321-321367. URLhttp://www-users.cs.umn.edu/%7Ekumar/dmbook/.
[41] Mojarad M, Nejatian S, Parvin H, Mohammadpoor M. A fuzzy clustering ensemble based on cluster clustering and iterative Fusion of base clusters.Appl. Intell., 2019.49(7):2567-2581. doi:10.1007/ s10489-018-01397-x. URLhttps://doi.org/10.1007/s10489-018-01397-x.
[42] Iam-on N, Boongoen T, Garrett SM, Price CJ. A Link-Based Approach to the Cluster Ensemble Problem. IEEE Trans. Pattern Anal. Mach. Intell., 2011.33(12):2396-2409. doi:10.1109/TPAMI.2011.84. URL https://doi.org/10.1109/TPAMI.2011.84.
[43] Zhou Z, Tang W. Clusterer ensemble.Knowl. Based Syst., 2006.19(1):77-83. doi:10.1016/j.knosys.2005. 11.003. URLhttps://doi.org/10.1016/j.knosys.2005.11.003.
[44] Topchy AP, Jain AK, Punch WF. Clustering Ensembles: Models of Consensus and Weak Partitions. IEEE Trans. Pattern Anal. Mach. Intell., 2005.27(12):1866-1881. doi:10.1109/TPAMI.2005.237. URL https://doi.org/10.1109/TPAMI.2005.237.
[45] Nguyen N, Caruana R. Consensus Clusterings. In: Proceedings of the 7th IEEE International Conference on Data Mining (ICDM 2007), October 28-31, 2007, Omaha, Nebraska, USA. IEEE Computer Society, 2007 pp. 607-612. doi:10.1109/ICDM.2007.73. URLhttps://doi.org/10.1109/ICDM.2007.73.
[46] Minaei-Bidgoli B, Parvin H, Alinejad-Rokny H, Alizadeh H, Punch WF. Effects of resampling method and adaptation on clustering ensemble efficacy.Artif. Intell. Rev., 2014.41(1):27-48. doi:10.1007/ s10462-011-9295-x. URLhttps://doi.org/10.1007/s10462-011-9295-x.
[47] Dietterich TG.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.