×

Find your place: simple distributed algorithms for community detection. (English) Zbl 1451.68344

Summary: Given an underlying graph, we consider the following dynamics: Initially, each node locally chooses a value in \(\{-1,1\} \), uniformly at random and independently of other nodes. Then, in each consecutive round, every node updates its local value to the average of the values held by its neighbors, at the same time applying an elementary, local clustering rule that only depends on the current and the previous values held by the node. We prove that the process resulting from this dynamics produces a clustering that exactly or approximately (depending on the graph) reflects the underlying cut in logarithmic time, under various graph models that exhibit a sparse balanced cut, including the stochastic block model. We also prove that a natural extension of this dynamics performs community detection on a regularized version of the stochastic block model with multiple communities. Rather surprisingly, our results provide rigorous evidence for the ability of an extremely simple and natural dynamics to perform community detection, a computational problem which is nontrivial even in a centralized setting.

MSC:

68W15 Distributed algorithms
68R10 Graph theory (including graph drawing) in computer science
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] E. Abbe, Community detection and stochastic block models: Recent developments, J. Mach. Learn. Res., 18 (2017), 177, http://jmlr.org/papers/v18/16-480.html. · Zbl 1403.62110
[2] E. Abbe, A. S. Bandeira, and G. Hall, Exact recovery in the stochastic block model, IEEE Trans. Inform. Theory, 62 (2014), pp. 471-487. · Zbl 1359.94047
[3] E. Abbe and C. Sandon, Detection in the Stochastic Block Model with Multiple Clusters: Proof of the Achievability Conjectures, Acyclic BP, and the Information-Computation Gap, preprint, https://arxiv.org/abs/1512.09080, 2015.
[4] Y. Afek, N. Alon, O. Barad, E. Hornstein, N. Barkai, and Z. Bar-Joseph, A biological solution to a fundamental distributed computing problem, Science, 331 (2011), pp. 183-185. · Zbl 1226.92001
[5] A. Amit and N. Linial, Random graph coverings I: General theory and graph connectivity, Combinatorica, 22 (2002), pp. 1-18. · Zbl 0996.05105
[6] D. Angluin, J. Aspnes, and D. Eisenstat, A simple population protocol for fast robust approximate majority, Distributed Comput., 21 (2008), pp. 87-102. · Zbl 1267.68055
[7] T. C. Aysal, M. E. Yildiz, A. D. Sarwate, and A. Scaglione, Broadcast gossip algorithms for consensus, IEEE Trans. Signal Process., 57 (2009), pp. 2748-2761. · Zbl 1392.68094
[8] M. J. Barber and J. W. Clark, Detecting network communities by propagating labels under constraints, Phys. Rev. E, 80 (2009), 026129.
[9] L. Becchetti, A. E. Clementi, P. Manurangsi, E. Natale, F. Pasquale, P. Raghavendra, and L. Trevisan, Average whenever you meet: Opportunistic protocols for community detection, in Proceedings of the 26th Annual European Symposium on Algorithms, ESA 2018, 2018, 7, https://doi.org/10.4230/LIPIcs.ESA.2018.7. · Zbl 1522.68224
[10] L. Becchetti, A. E. Clementi, E. Natale, F. Pasquale, and L. Trevisan, Find your place: Simple distributed algorithms for community detection, in Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, SIAM, Philadelphia, 2017, pp. 940-959, https://doi.org/10.1137/1.9781611974782.59. · Zbl 1410.68278
[11] L. Becchetti, E. Cruciani, F. Pasquale, and S. Rizzo, Step-by-step community detection in volume-regular graphs, in Proceedings of the 30th International Symposium on Algorithms and Computation, ISAAC 2019, 2019, 20. · Zbl 07650253
[12] F. Bénézit, P. Thiran, and M. Vetterli, Interval consensus: From quantized gossip to voting, in Proceedings of the 34th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2009, IEEE, Washington, DC, 2009, pp. 3661-3664.
[13] A. C. Berry, The accuracy of the Gaussian approximation to the sum of independent variates, Trans. Amer. Math. Soc., 49 (1941), pp. 122-136. · Zbl 0025.34603
[14] R. B. Boppana, Eigenvalues and graph bisection: An average-case analysis, in Proceedings of the 28th Annual IEEE Symposium on Foundations of Computer Science, FOCS 1987, IEEE, Washington, DC, 1987, pp. 280-285.
[15] C. Bordenave, Personal communication, 2015.
[16] C. Bordenave, A New Proof of Friedman’s Second Eigenvalue Theorem and Its Extension to Random Lifts, preprint, https://arxiv.org/abs/1502.04482, 2015.
[17] C. Bordenave, M. Lelarge, and L. Massoulié, Non-backtracking spectrum of random graphs: Community detection and non-regular Ramanujan graphs, in Proceedings of the 56th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2015, IEEE, Washington, DC, 2015, pp. 1347-1357.
[18] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, Randomized gossip algorithms, IEEE/ACM Trans. Networking, 14 (2006), pp. 2508-2530, https://doi.org/10.1109/TIT.2006.874516 (accessed 2015-10-18). · Zbl 1283.94005
[19] G. Brito, I. Dumitriu, S. Ganguly, C. Hoffman, and L. V. Tran, Recovery and rigidity in a regular stochastic block model, in Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, SIAM, Philadelphia, 2015, pp. 1589-1601, https://doi.org/10.1137/1.9781611974331.ch108. · Zbl 1410.68283
[20] L. Cardelli and A. Csikász-Nagy, The cell cycle switch computes approximate majority, Sci. Rep., 2 (2012), 656.
[21] J. Cheeger, A lower bound for the smallest eigenvalue of the Laplacian, in Proceedings of the Princeton Conference in Honor of Professor S. Bochner, Princeton University Press, Princeton, NJ, 1969, pp. 195-199. · Zbl 0212.44903
[22] F. R. Chung, Laplacians of graphs and Cheeger’s inequalities, in Combinatorics, Paul Erdös is Eighty, Vol. 2, János Bolyai Math. Soc., Budapest, 1996, pp. 157-172, http://www.ams.org/mathscinet-getitem?mr=1395858 (accessed 2015-09-26). · Zbl 0864.05070
[23] A. Coja-Oghlan, Spectral Techniques, Semidefinite Programs, and Random Graphs, Habilitation thesis, Humboldt University, Berlin, 2005, http://www.math.uni-frankfurt.de/ acoghlan/habil.pdf. · Zbl 1297.05213
[24] A. Coja-Oghlan, Graph partitioning via adaptive spectral techniques, Combin. Probab. Comput., 19 (2010), pp. 227-284. · Zbl 1209.05178
[25] E. Cruciani, E. Natale, A. Nusser, and G. Scornavacca, Phase transition of the 2-choices dynamics on core-periphery networks, in Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS 2018, 2018, pp. 777-785, http://dl.acm.org/citation.cfm?id=3237499.
[26] E. Cruciani, E. Natale, and G. Scornavacca, On the metastability of quadratic majority dynamics on clustered graphs and its biological implications, Bull. Eur. Assoc. Theor. Comput. Sci. EATCS, 125 (2018), pp. 185-190, http://eatcs.org/beatcs/index.php/beatcs/article/view/535.
[27] C. Davis and W. M. Kahan, The rotation of eigenvectors by a perturbation. III, SIAM J. Numer. Anal., 7 (1970), pp. 1-46, https://doi.org/10.1137/0707001. · Zbl 0198.47201
[28] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová, Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications, Phys. Rev. E, 84 (2011), 066106.
[29] M. H. DeGroot, Reaching a consensus, J. Amer. Statist. Assoc., 69 (1974), pp. 118-121. · Zbl 0282.92011
[30] D. Doty, Timing in chemical reaction networks, in Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, SIAM, Philadelphia, 2014, pp. 772-784, https://doi.org/10.1137/1.9781611973402.57. · Zbl 1421.68048
[31] M. E. Dyer and A. M. Frieze, The solution of some random NP-hard problems in polynomial expected time, J. Algorithms, 10 (1989), pp. 451-489. · Zbl 0689.68049
[32] D. Easley and J. Kleinberg, Networks, Crowds, and Markets: Reasoning about a Highly Connected World, Cambridge University Press, Cambridge, UK, 2010. · Zbl 1205.91007
[33] N. R. Franks, S. C. Pratt, E. B. Mallon, N. F. Britton, and D. J. Sumpter, Information flow, opinion polling and collective intelligence in house-hunting social insects, Phil. Trans. Roy. Soc. London B Biol. Sci., 357 (2002), pp. 1567-1583.
[34] N. E. Friedkin and E. C. Johnsen, Social influence and opinions, J. Math. Sociol., 15 (1990), pp. 193-206. · Zbl 0712.92025
[35] J. Friedman and D.-E. Kohler, The Relativized Second Eigenvalue Conjecture of Alon, preprint, https://arxiv.org/abs/1403.3462, 2014.
[36] Y. Hassin and D. Peleg, Distributed probabilistic polling and applications to proportionate agreement, Inform. Comput., 171 (2001), pp. 248-268. · Zbl 1005.68018
[37] P. W. Holland, K. B. Laskey, and S. Leinhardt, Stochastic blockmodels: First steps, Social Networks, 5 (1983), pp. 109-137.
[38] S. Hoory, N. Linial, and A. Wigderson, Expander graphs and their applications, Bull. Amer. Math. Soc., 43 (2006), pp. 439-561. · Zbl 1147.68608
[39] O. Gurel-Gurevich, Expectation of Square Root of Binomial R.V., MathOverflow, http://mathoverflow.net/q/121424 (version: 2013-02-10).
[40] M. Jerrum and A. Sinclair, Approximating the permanent, SIAM J. Comput., 18 (1989), pp. 1149-1178, https://doi.org/10.1137/0218077. · Zbl 0723.05107
[41] M. Jerrum and G. B. Sorkin, The Metropolis algorithm for graph bisection, Discrete Appl. Math., 82 (1998), pp. 155-175, https://doi.org/10.1016/S0166-218X(97)00133-9. · Zbl 0932.60079
[42] D. Kempe, A. Dobra, and J. Gehrke, Gossip-based computation of aggregate information, in Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science, FOCS 2003, IEEE, Washington, DC, 2003, pp. 482-491.
[43] D. Kempe and F. McSherry, A decentralized algorithm for spectral analysis, in Proceedings of the 36th Annual ACM Symposium on Theory of Computing, STOC 2004, ACM, New York, 2004, pp. 561-568. · Zbl 1192.68848
[44] J. M. Kleinberg, Authoritative sources in a hyperlinked environment, J. ACM, 46 (1999), pp. 604-632. · Zbl 1065.68660
[45] K. Kothapalli, S. V. Pemmaraju, and V. Sardeshmukh, On the analysis of a label propagation algorithm for community detection, in Proceedings of the 14th International Conference on Distributed Computing and Networking, ICDCN 2013, Mumbai, India, 2013, pp. 255-269. · Zbl 1351.68303
[46] F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, and P. Zhang, Spectral redemption in clustering sparse networks, Proc. Natl. Acad. Sci. USA, 110 (2013), pp. 20935-20940. · Zbl 1359.62252
[47] C. Lanczos, An Iteration Method for the Solution of the Eigenvalue Problem of Linear Differential and Integral Operators, United States Government Press Office, Los Angeles, CA, 1950. · Zbl 0045.39702
[48] C. M. Le, E. Levina, and R. Vershynin, Concentration and regularization of random graphs, Random Struct. Algorithms, 51 (2017), pp. 538-561, https://doi.org/10.1002/rsa.20713. · Zbl 1373.05179
[49] N. Linial, Locality in distributed graph algorithms, SIAM J. Comput., 21 (1992), pp. 193-201, https://doi.org/10.1137/0221015. · Zbl 0787.05058
[50] X. Liu and T. Murata, Advanced modularity-specialized label propagation algorithm for detecting communities in networks, Phys. A Stat. Mech. Appl., 389 (2010), pp. 1493-1500.
[51] D. J. MacKay, Information Theory, Inference and Learning Algorithms, Cambridge University Press, Cambridge, UK, 2003. · Zbl 1055.94001
[52] G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, Pregel: A system for large-scale graph processing, in Proceedings of the ACM SIGMOD International Conference on Management of Data SIGMOD 2010, ACM, New York, 2010, pp. 135-146, https://doi.org/10.1145/1807167.1807184.
[53] F. Mallmann-Trenn, C. Musco, and C. Musco, Eigenvector computation and community detection in asynchronous gossip models, in Proceedings of the 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, Prague, Czech Republic, 2018, 159, https://doi.org/10.4230/LIPIcs.ICALP.2018.159. · Zbl 1499.68384
[54] L. Massoulie, Community detection thresholds and the weak Ramanujan property, in Proceedings of the 46th Annual ACM Symposium on Theory of Computing, STOC 2014, ACM, New York, 2014, pp. 694-703. · Zbl 1315.68210
[55] F. McSherry, Spectral partitioning of random graphs, in Proceedings of the 42nd Annual IEEE Symposium on Foundations of Computer Science, FOCS 2001, IEEE, Washington, DC, 2001, pp. 529-537.
[56] Y. Métivier, J. M. Robson, N. Saheb-Djahromi, and A. Zemmari, An optimal bit complexity randomized distributed MIS algorithm, Distributed Comput., 23 (2011), pp. 331-340. · Zbl 1231.68277
[57] J. M. Mooij and H. J. Kappen, Sufficient conditions for convergence of the sum-product algorithm, IEEE Trans. Inform. Theory, 53 (2007), pp. 4422-4437. · Zbl 1314.94112
[58] E. Mossel, J. Neeman, and A. Sly, Reconstruction and estimation in the planted partition model, Probab. Theory Related Fields, 162 (2014), pp. 431-461, https://doi.org/10.1007/s00440-014-0576-6. · Zbl 1320.05113
[59] E. Mossel, J. Neeman, and A. Sly, Belief propagation, robust reconstruction and optimal recovery of block models, Ann. Appl. Probab., 26 (2016), pp. 2211-2256, https://doi.org/10.1214/15-AAP1145. · Zbl 1350.05154
[60] E. Mossel, J. Neeman, and A. Sly, A proof of the block model threshold conjecture, Combinatorica, 38 (2018), pp. 665-708. · Zbl 1424.05272
[61] E. Mossel, J. Neeman, and O. Tamuz, Majority dynamics and aggregation of information in social networks, Autonomous Agents Multi-Agent Syst., 28 (2014), pp. 408-429.
[62] E. Oja, Simplified neuron model as a principal component analyzer, J. Math. Biol., 15 (1982), pp. 267-273, https://doi.org/10.1007/BF00275687. · Zbl 0488.92012
[63] A. Olshevsky and J. N. Tsitsiklis, Convergence speed in distributed consensus and averaging, SIAM J. Control Optim., 48 (2009), pp. 33-55, https://doi.org/10.1137/060678324. · Zbl 1182.93008
[64] R. Peng, H. Sun, and L. Zanetti, Partitioning well-clustered graphs: Spectral clustering works!, SIAM J. Comput., 46 (2017), pp. 710-743, https://doi.org/10.1137/15M1047209. · Zbl 1370.05204
[65] U. N. Raghavan, R. Albert, and S. Kumara, Near linear time algorithm to detect community structures in large-scale networks, Phys. Rev. E, 76 (2007), 036106.
[66] S. Salihoglu, J. Shin, V. Khanna, B. Q. Truong, and J. Widom, Graft: A debugging tool for Apache Giraph, in Proceedings of the ACM SIGMOD International Conference on Management of Data SIGMOD 2015, ACM, New York, 2015, pp. 1403-1408, https://doi.org/10.1145/2723372.2735353.
[67] D. Shah, Gossip Algorithms, Now Publishers, Norwell, MA, 2009. · Zbl 1185.68072
[68] N. Shimizu and T. Shiraga, Phase transitions of best-of-two and best-of-three on stochastic block models, in Proceedings of the 33rd International Symposium on Distributed Computing, DISC 2019, Leibniz International Proceedings in Informatics (LIPIcs) 146, Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Wadern, Germany, 2019, pp. 32:1-32:17, https://doi.org/10.4230/LIPIcs.DISC.2019.32. · Zbl 1515.68077
[69] P. Stewart and J.-G. Sun, Matrix Perturbation Theory, Academic Press, New York, 1990. · Zbl 0706.65013
[70] H. Sun and L. Zanetti, Distributed Graph Clustering and Sparsification, preprint, https://arxiv.org/abs/1711.01262, 2017.
[71] J. Suomela, Survey of local algorithms, ACM Comput. Surveys, 45 (2013), 24, https://doi.org/10.1145/2431211.2431223. · Zbl 1293.68306
[72] J. P. Tian and D. Kannan, Lumpability and commutativity of Markov processes, Stochastic Anal. Appl., 24 (2006), pp. 685-702. · Zbl 1114.60059
[73] J. N. Tsitsiklis, Problems in Decentralized Decision Making and Computation, Tech. report, DTIC Document, 1984.
[74] Y. Weiss, Correctness of local probability propagation in graphical models with loops, Neural Comput., 12 (2000), pp. 1-41.
[75] L. Xiao, S. Boyd, and S.-J. Kim, Distributed average consensus with least-mean-square deviation, J. Parallel Distributed Comput., 67 (2007), pp. 33-46. · Zbl 1109.68019
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.