×

Reconstruction and estimation in the planted partition model. (English) Zbl 1320.05113

Summary: The planted partition model (also known as the stochastic blockmodel) is a classical cluster-exhibiting random graph model that has been extensively studied in statistics, physics, and computer science. In its simplest form, the planted partition model is a model for random graphs on \(n\) nodes with two equal-sized clusters, with an between-class edge probability of \(q\) and a within-class edge probability of \(p\). Although most of the literature on this model has focused on the case of increasing degrees (ie. \(pn, qn \to \infty\) as \(n \to \infty\)), the sparse case \(p, q = O(1/n)\) is interesting both from a mathematical and an applied point of view. A striking conjecture of A. Decelle et al. [“Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications”, Phys. Rev. E 84, 066106 (2011)] based on deep, non-rigorous ideas from statistical physics gave a precise prediction for the algorithmic threshold of clustering in the sparse planted partition model. In particular, if \(p=a/n\) and \(q=b/n\), then Decelle et al. [loc. cit.] conjectured that it is possible to cluster in a way correlated with the true partition if \((a-b)^2 > 2(a+b)\), and impossible if \((a-b)^2 < 2(a+b)\). By comparison, the best-known rigorous result is that of A. Coja-Oghlan [Comb. Probab. Comput. 19, No. 2, 227–284 (2010; Zbl 1209.05178)], who showed that clustering is possible if \((a - b)^2 > C (a + b)\) for some sufficiently large \(C\).
We prove half of their prediction, showing that it is indeed impossible to cluster if \((a - b)^2 < 2(a + b)\). Furthermore we show that it is impossible even to estimate the model parameters from the graph when \((a - b)^2 < 2(a + b)\); on the other hand, we provide a simple and efficient algorithm for estimating \(a\) and \(b\) when \((a - b)^2 > 2(a + b)\). Following Decelle et al. [loc. cit.], our work establishes a rigorous connection between the clustering problem, spin-glass models on the Bethe lattice and the so called reconstruction problem. This connection points to fascinating applications and open problems.

MSC:

05C80 Random graphs (graph-theoretic aspects)
60J85 Applications of branching processes
90B15 Stochastic network models in operations research
91D30 Social networks; opinion dynamics

Citations:

Zbl 1209.05178

Software:

STRUCTURE
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] Bickel, P.J., Chen, A.: A nonparametric view of network models and Newman-Girvan and other modularities. Proc. Natl. Acad. Sci. 106(50), 21068-21073 (2009) · Zbl 1359.62411 · doi:10.1073/pnas.0907096106
[2] Bleher, P.M., Ruiz, J., Zagrebnov, V.A.: On the purity of the limiting Gibbs state for the Ising model on the Bethe lattice. J. Stat. Phys. 79(1), 473-482 (1995) · Zbl 1081.82515 · doi:10.1007/BF02179399
[3] Bollobás, B.: Random Graphs, 2nd edn. Cambridge University Press, Cambridge (2001) · Zbl 0979.05003 · doi:10.1017/CBO9780511814068
[4] Bollobás, B., Janson, S., Riordan, O.: The phase transition in inhomogeneous random graphs. Random Struct. Alg. 31(1), 3-122 (2007) · Zbl 1123.05083
[5] Boppana, R.B.: Eigenvalues and graph bisection: an average-case analysis. In: 28th Annual Symposium on Foundations of Computer Science, pp. 280-285. IEEE (1987) · Zbl 0338.05120
[6] Bui, T.N., Chaudhuri, S., Leighton, F.T., Sipser, M.: Graph bisection algorithms with good average case behavior. Combinatorica 7(2), 171-191 (1987) · doi:10.1007/BF02579448
[7] Coja-Oghlan, A.: Graph partitioning via adaptive spectral techniques. Combinat. Prob. Comput. 19(02), 227-284 (2010) · Zbl 1209.05178 · doi:10.1017/S0963548309990514
[8] Condon, A., Karp, R.M.: Algorithms for graph partitioning on the planted partition model. Random Struct. Alg. 18(2), 116-140 (2001) · Zbl 0972.68129 · doi:10.1002/1098-2418(200103)18:2<116::AID-RSA1001>3.0.CO;2-2
[9] Decelle, A., Krzakala, F., Moore, C., Zdeborová, L.: Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E 84, 066106 (Dec 2011) · Zbl 1209.05225
[10] Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodological), 39(1), 1-38 (1977) · Zbl 0364.62022
[11] Dyer, M.E., Frieze, A.M.: The solution of some random NP-hard problems in polynomial expected time. J. Alg. 10(4), 451-489 (1989) · Zbl 0689.68049 · doi:10.1016/0196-6774(89)90001-1
[12] Evans, W., Kenyon, C., Peres, Y., Schulman, L.J.: Broadcasting on trees and the Ising model. Ann. Appl. Prob. 10(2), 410-433 (2000) · Zbl 1052.60076 · doi:10.1214/aoap/1019487349
[13] Friedman, J.: A proof of Alon’s second eigenvalue conjecture and related problems. Mem. Am. Math. Soc. 195(910), viii+100 (2008) · Zbl 1177.05070
[14] Garey, M.R., Johnson, D.S., Stockmeyer, L.: Some simplified NP-complete graph problems. Theor. Computer Sci. 1(3), 237-267 (1976) · Zbl 0338.05120 · doi:10.1016/0304-3975(76)90059-1
[15] Häggström, O., Mossel, E.: Nearest-neighbor walks with low predictability profile and percolation in \[2+\epsilon 2\]+ϵ dimensions. Ann. Prob. 26(3), 1212-1231 (1998) · Zbl 0937.60071
[16] Hartuv, E., Shamir, R.: A clustering algorithm based on graph connectivity. Inf. Process. Lett. 76(4), 175-181 (2000) · Zbl 0996.68525 · doi:10.1016/S0020-0190(00)00142-3
[17] Hodges, J.L., Le Cam, L.: The Poisson approximation to the Poisson binomial distribution. Ann. Math. Stat. 31(3), 737-740 (1960) · Zbl 0100.14301 · doi:10.1214/aoms/1177705799
[18] Holland, P.W., Laskey, K.B., Leinhardt, S.: Stochastic blockmodels: first steps. Soc. Netw. 5(2), 109-137 (1983) · doi:10.1016/0378-8733(83)90021-7
[19] Janson, S.: Random regular graphs: asymptotic distributions and contiguity. Combinat. Prob. Comput. 4(04), 369-405 (1995) · Zbl 0846.05076 · doi:10.1017/S0963548300001735
[20] Janson, S.: Asymptotic equivalence and contiguity of some random graphs. Random Struct. Alg. 36(1), 26-45 (2010) · Zbl 1209.05225 · doi:10.1002/rsa.20297
[21] Jerrum, M., Sorkin, G.B.: The Metropolis algorithm for graph bisection. Discrete Appl. Math. 82(1-3), 155-175 (1998) · Zbl 0932.60079 · doi:10.1016/S0166-218X(97)00133-9
[22] Johnson, S.C.: Hierarchical clustering schemes. Psychometrika 32(3), 241-254 (1967) · Zbl 1367.62191 · doi:10.1007/BF02289588
[23] Kesten, H., Stigum, B.P.: A limit theorem for multidimensional Galton-Watson processes. Ann. Math. Stat. 37(5), 1211-1223 (1966) · Zbl 0203.17401 · doi:10.1214/aoms/1177699266
[24] Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.W.: Statistical properties of community structure in large social and information networks. In: Proceeding of the 17th International Conference on World Wide Web, pp. 695-704. ACM (2008)
[25] McSherry, F.: Spectral partitioning of random graphs. In: 42nd IEEE Symposium on Foundations of Computer Science, pp. 529-537. IEEE (2001) · Zbl 1209.05178
[26] Mossel, E.: Survey—information flow on trees. DIMACS Ser. Discrete Math. Theor. Computer Sci. 63, 155-170 (2004) · Zbl 1066.94006
[27] Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004) · Zbl 1032.91716
[28] Newman, M.E.J., Watts, D.J., Strogatz, S.H.: Random graph models of social networks. Proc. Natl. Acad. Sci. USA 99(Suppl 1), 2566 (2002) · Zbl 1114.91362 · doi:10.1073/pnas.012582999
[29] Pritchard, J.K., Stephens, M., Donnelly, P.: Inference of population structure using multilocus genotype data. Genetics 155(2), 945-959 (2000)
[30] Robinson, R.W., Wormald, N.C.: Almost all cubic graphs are Hamiltonian. Random Struct. Alg. 3(2), 117-125 (1992) · Zbl 0755.05075 · doi:10.1002/rsa.3240030202
[31] Robinson, R.W., Wormald, N.C.: Almost all regular graphs are Hamiltonian. Random Struct. Alg. 5(2), 363-374 (1994) · Zbl 0795.05088 · doi:10.1002/rsa.3240050209
[32] Rohe, K., Chatterjee, S., Yu, B.: Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Stat. 39(4), 1878-1915 (2011) · Zbl 1227.62042 · doi:10.1214/11-AOS887
[33] Shi, J., Malik, J.: Normalized cuts and image segmentation. Pattern Anal. Mach. Intell. IEEE Trans. 22(8), 888-905 (2000) · doi:10.1109/34.868688
[34] Sly, A.: Reconstruction for the Potts model. Ann. Prob. 39(4), 1365-1406 (2011) · Zbl 1234.60095 · doi:10.1214/10-AOP584
[35] Snijders, T.A.B., Nowicki, K.: Estimation and prediction for stochastic blockmodels for graphs with latent block structure. J. Classif. 14(1), 75-100 (1997) · Zbl 0896.62063 · doi:10.1007/s003579900004
[36] Sonka, M., Hlavac, V., Boyle, R.: Image processing: analysis and machine vision, 4th edn. Cengage Learning, Stamford (2015)
[37] Strogatz, S.H.: Exploring complex networks. Nature 410(6825), 268-276 (2001) · Zbl 1370.90052 · doi:10.1038/35065725
[38] Wormald, N.C.: Models of random regular graphs. In: Lamb, J.D., Preece, D.A. (eds.) Surveys in Combinatorics 1999. London Mathematical Society Lecture Note Series, vol. 267. Cambridge University Press, Cambridge (1999) · Zbl 0935.05080
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.