×

Detecting overlapping communities in networks using spectral methods. (English) Zbl 1484.62073

Summary: Community detection has been well studied in network analysis, but the more realistic case of overlapping communities remains a challenge. Here we propose a general, flexible, and interpretable generative model for overlapping communities, which can be viewed as generalizing several previous models in different ways. We develop an efficient spectral algorithm for estimating the community memberships, which deals with the overlaps by employing the \(K\)-medians algorithm rather than the usual \(K\)-means for clustering in the spectral domain. We show that the algorithm is asymptotically consistent when the network is not too sparse and the overlaps between communities are not too large. Numerical experiments on both simulated networks and many real social networks demonstrate that our method performs well compared to a number of benchmark methods for overlapping community detection.

MSC:

62H22 Probabilistic graphical models
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F12 Asymptotic properties of parametric estimators
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, Mixed membership stochastic blockmodels, J. Mach. Learn. Res., 9 (2008), pp. 1981-2014. · Zbl 1225.68143
[2] A. A. Amini, A. Chen, P. J. Bickel, and E. Levina, Pseudo-likelihood methods for community detection in large sparse networks, Ann. Statist., 41 (2013), pp. 2097-2122. · Zbl 1277.62166
[3] A. Anandkumar, R. Ge, D. Hsu, and S. M. Kakade, A tensor approach to learning mixed membership community models, J. Machine Learn. Res., 15 (2014), pp. 2239-2312. · Zbl 1318.68136
[4] B. Ball, B. Karrer, and M. E. J. Newman, An efficient and principled method for detecting communities in networks, Phys. Rev. E, 34 (2011), 036103.
[5] P. J. Bickel and A. Chen, A nonparametric view of network models and Newman-Girvan and other modularities, Proc. Natl. Acad. Sci. USA, 106 (2009), pp. 21068-21073. · Zbl 1359.62411
[6] P. J. Bickel and P. Sarkar, Hypothesis testing for automated community detection in networks, J. R. Stat. Soc. Ser. B Stat. Methodol., 78 (2016), pp. 253-273. · Zbl 1411.62162
[7] H. Bolouri and E. H. Davidson, The gene regulatory network basis of the “community effect,” and analysis of a sea urchin embryo example, Developmental Biol., 340 (2010), pp. 170-178.
[8] T. T. Cai and X. Li, Robust and computationally feasible community detection in the presence of arbitrary outlier nodes, Ann. Statist., 43 (2015), pp. 1027-1059. · Zbl 1328.62381
[9] B. L. Chamberlain, Graph partitioning algorithms for distributing workloads of parallel computations, University of Washington Technical Report UW-CSE-98-10, 3, Seattle, WA, 1998.
[10] K. Chaudhuri, F. C. Graham, and A. Tsiatas, Spectral clustering of graphs with general degrees in the extended planted partition model, in Proceedings of the 25th Annual Conference on Learning Theory (COLT), Edinburgh, Scotland, 2012, 35.
[11] K. Chen and J. Lei, Network cross-validation for determining the number of communities in network data, J. Amer. Statist. Assoc., 113 (2018), pp. 241-251. · Zbl 1398.62159
[12] Y. Chen, X. Li, and J. Xu, Convexified modularity maximization for degree-corrected stochastic block models, Ann. Statist., 46 (2018), pp. 1573-1602. · Zbl 1410.62105
[13] D. S. Choi, P. J. Wolfe, and E. M. Airoldi, Stochastic blockmodels with a growing number of classes, Biometrika, 99 (2012), pp. 273-284. · Zbl 1318.62207
[14] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová, Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications, Phys. Rev. E, 84 (2011), 066106.
[15] S. Fortunato, Community detection in graphs, Phys. Rep., 486 (2010), pp. 75-174.
[16] C. Gao, Z. Ma, A. Y. Zhang, and H. H. Zhou, Community detection in degree-corrected block models, Ann. Statist., 46 (2018), pp. 2153-2185. · Zbl 1408.62116
[17] N. Gillis and S. Vavasis, Fast and robust recursive algorithms for separable nonnegative matrix factorization, IEEE Trans. Pattern Anal. Mach. Intell., 36 (2014), pp. 698-714. · Zbl 1316.15015
[18] A. Goldenberg, A. X. Zheng, S. E. Fienberg, and E. M. Airoldi, A survey of statistical network models, Found. Trends Mach. Learn., 2 (2010), pp. 129-233. · Zbl 1184.68030
[19] S. Gregory, Finding overlapping communities in networks by label propagation, New J. Phys., 12 (2010), 103018. · Zbl 1448.90094
[20] B. Hendrickson and T. G. Kolda, Graph partitioning models for parallel computing, Parallel Comput., 26 (2000), pp. 1519-1534. · Zbl 0948.68130
[21] P. W. Holland, K. B. Laskey, and S. Leinhardt, Stochastic blockmodels: First steps, Social Networks, 5 (1983), pp. 109-137.
[22] P. W. Holland and S. Leinhardt, An exponential family of probability distributions for directed graphs, J. Amer. Statist. Assoc., 76 (1981), pp. 33-50. · Zbl 0457.62090
[23] A. Hollocou, T. Bonald, and M. Lelarge, Modularity-based sparse soft graph clustering, in AISTATS 2019, Okinawa, Japan, 2019.
[24] S. B. Hopkins and D. Steurer, Efficient Bayesian estimation from few samples: Community detection and related problems, in Proceedings of the 58th Annual IEEE Symposium on Foundations of Computer Science (FOCS), IEEE, Washington, DC, 2017, pp. 379-390.
[25] J. Jin, Fast community detection by score, Ann. Statist., 43 (2015), pp. 57-89. · Zbl 1310.62076
[26] J. Jin and Z. T. Ke, A Sharp Lower Bound for Mixed-Membership Estimation, preprint, https://arxiv.org/abs/1709.05603, 2017.
[27] J. Jin, Z. T. Ke, and S. Luo, Estimating Network Memberships by Simplex Vertex Hunting, preprint, https://arxiv.org/abs/1708.07852, 2017.
[28] A. Joseph and B. Yu, Impact of regularization on spectral clustering, Ann. Statist., 44 (2016), pp. 1765-1791. · Zbl 1357.62229
[29] B. Karrer and M. E. Newman, Stochastic blockmodels and community structure in networks, Phys. Rev. E, 83 (2011), 016107.
[30] E. Kaufmann, T. Bonald, and M. Lelarge, A spectral algorithm with additive clustering for the recovery of overlapping communities in networks, in Proceedings of the International Conference on Algorithmic Learning Theory, Springer, New York, 2016, pp. 355-370. · Zbl 1398.68441
[31] A. Lancichinetti, S. Fortunato, and J. Kertész, Detecting the overlapping and hierarchical community structure in complex networks, New J. Phys., 11 (2009), 033015.
[32] A. Lancichinetti, F. Radicchi, J. J. Ramasco, and S. Fortunato, Finding statistically significant communities in networks, PloS One, 6 (2011), e18961.
[33] P. Latouche, E. Birmelé, and C. Ambroise, Overlapping Stochastic Block Models, preprint, https://arxiv.org/abs/0910.2098v1, 2009. · Zbl 1349.62276
[34] C. M. Le and E. Levina, Estimating the Number of Communities in Networks by Spectral Methods, preprint, https://arxiv.org/abs/1507.00827, 2015.
[35] C. M. Le, E. Levina, and R. Vershynin, Optimization via low-rank approximation for community detection in networks, Ann. Statist., 44 (2016), pp. 373-400. · Zbl 1331.62312
[36] J. Leskovec and J. J. Mcauley, Learning to discover social circles in ego networks, in Advances in Neural Information Processing Systems, Lake Tahoe, NV, 2012, pp. 539-547.
[37] T. Li, E. Levina, and J. Zhu, Network Cross-Validation by Edge Sampling, preprint, https://arxiv.org/abs/1612.04717, 2016. · Zbl 1441.62049
[38] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations, Behavioral Ecol. Sociobiol., 54 (2003), pp. 396-405.
[39] X. Mao, P. Sarkar, and D. Chakrabarti, Estimating Mixed Memberships with Sharp Eigenvector Deviations, preprint, https://arxiv.org/abs/1709.00407, 2017.
[40] X. Mao, P. Sarkar, and D. Chakrabarti, On mixed memberships and symmetric nonnegative matrix factorizations, in Proceedings of the 34th International Conference on Machine Learning, Vol. 70, JMLR, Sydney, Australia, 2017, pp. 2324-2333.
[41] X. Mao, P. Sarkar, and D. Chakrabarti, Overlapping clustering models, and one (class) SVM to bind them all, in Advances in Neural Information Processing Systems, Montreal, Canada, 2018, pp. 2126-2136.
[42] E. Mossel, J. Neeman, and A. Sly, Consistency Thresholds for Binary Symmetric Block Models, preprint, https://arxiv.org/abs/1407.1591v1, 2014. · Zbl 1321.05242
[43] M. E. Newman and M. Girvan, Finding and evaluating community structure in networks, Phys. Rev. E, 69 (2004), 026113.
[44] M. E. J. Newman, Spectral methods for network community detection and graph partitioning, Phys. Rev. E, 88 (2013), 042822.
[45] T. L. J. Ng and T. B. Murphy, Generalized random dot product graph, Statist. Probab. Lett., 148 (2019), pp. 143-149, https://doi.org/10.1016/j.spl.2019.01.011. · Zbl 1442.60017
[46] C. L. M. Nickel, Random Dot Product Graphs: A Model for Social Networks, Ph.D. thesis, Johns Hopkins University, Baltimore, MD, 2007.
[47] G. Palla, I. Derényi, I. Farkas, and T. Vicsek, Uncovering the overlapping community structure of complex networks in nature and society, Nature, 435 (2005), pp. 814-818.
[48] C. Pizzuti, Overlapped community detection in complex networks, in Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, ACM, New York, 2009, pp. 859-866.
[49] I. Psorakis, S. Roberts, M. Ebden, and B. Sheldon, Overlapping community detection using Bayesian non-negative matrix factorization, Phys. Rev. E, 83 (2011), 066114.
[50] T. Qin and K. Rohe, Regularized spectral clustering under the degree-corrected stochastic blockmodel, in Advances in Neural Information Processing Systems, Lake Tahoe, NV, 2013, pp. 3120-3128.
[51] M. D. Resnick, P. S. Bearman, R. W. Blum, K. E. Bauman, K. M. Harris, J. Jones, J. Tabor, T. Beuhring, R. E. Sieving, and M. Shew, Protecting adolescents from harm: Findings from the National Longitudinal Study on Adolescent Health, JAMA, 278 (1997), pp. 823-832.
[52] K. Rohe, S. Chatterjee, and B. Yu, Spectral clustering and the high-dimensional stochastic blockmodel, Ann. Statist., 39 (2011), pp. 1878-1915. · Zbl 1227.62042
[53] P. Rubin-Delanchy, C. E. Priebe, and M. Tang, Consistency of Adjacency Spectral Embedding for the Mixed Membership Stochastic Blockmodel, preprint, https://arxiv.org/abs/1705.04518, 2017.
[54] D. F. Saldana, Y. Yu, and Y. Feng, How Many Communities Are There?, preprint, https://arxiv.org/abs/1412.1684, 2014.
[55] P. Sarkar and P. J. Bickel, Role of Normalization in Spectral Clustering for Stochastic Blockmodels, preprint, https://arxiv.org/abs/1310.1495, 2013. · Zbl 1320.62150
[56] M. Tang and C. E. Priebe, Limit theorems for eigenvectors of the normalized Laplacian for random graphs, Ann. Statist., 46 (2018), pp. 2360-2415. · Zbl 1408.62120
[57] U. Von Luxburg, A tutorial on spectral clustering, Stat. Comput., 17 (2007), pp. 395-416.
[58] F. Wang, T. Li, X. Wang, S. Zhu, and C. Ding, Community discovery using nonnegative matrix factorization, Data Min. Knowl. Discov., 22 (2011), pp. 493-521. · Zbl 1235.68034
[59] X. Wen, W.-N. Chen, Y. Lin, T. Gu, H. Zhang, Y. Li, Y. Yin, and J. Zhang, A maximal clique based multiobjective evolutionary algorithm for overlapping community detection, IEEE Trans. Evolution. Comput., 21 (2016), pp. 363-377.
[60] J. J. Whang, D. F. Gleich, and I. S. Dhillon, Overlapping community detection using neighborhood-inflated seed expansion, IEEE Trans. Knowl. Data Engrg., 28 (2016), pp. 1272-1284.
[61] J. Xie, S. Kelley, and B. K. Szymanski, Overlapping community detection in networks: The state-of-the-art and comparative study, ACM Comput. Surveys, 45 (2013), 43. · Zbl 1288.68191
[62] S. J. Young and E. R. Scheinerman, Random dot product graph models for social networks, in Algorithms and Models for the Web-Graph, Springer, New York, 2007, pp. 138-149. · Zbl 1136.05322
[63] W. W. Zachary, An information flow model for conflict and fission in small groups, J. Anthropolog. Res., 33 (1977), pp. 452-473.
[64] A. Zhang, Protein Interaction Networks: Computational Analysis, Cambridge University Press, Cambridge, UK, 2009. · Zbl 1320.92021
[65] A. Y. Zhang and H. H. Zhou, Minimax rates of community detection in stochastic block models, Ann. Statist., 44 (2016), pp. 2252-2280. · Zbl 1355.60125
[66] Y. Zhao, E. Levina, and J. Zhu, Community extraction for social networks, Proc. Natl. Acad. Sci. USA, 108 (2011), pp. 7321-7326.
[67] Y. Zhao, E. Levina, and J. Zhu, Consistency of community detection in networks under degree-corrected stochastic block models, Ann. Statist., 40 (2012), pp. 2266-2292. · Zbl 1257.62095
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.