×

Overlapping stochastic block models with application to the French political blogosphere. (English) Zbl 1220.62083

Summary: Complex systems in nature and in society are often represented as networks, describing the rich set of interactions between objects of interest. Many deterministic and probabilistic clustering methods have been developed to analyze such structures. Given a network, almost all of them partition the vertices into disjoint clusters, according to their connection profile. However, recent studies have shown that these techniques were too restrictive and that most of the existing networks contained overlapping clusters. To tackle this issue, we present in this paper the overlapping stochastic block model. Our approach allows the vertices to belong to multiple clusters, and, to some extent, generalizes the well-known stochastic block model [K. Nowicki and T. A. B. Snijders, J. Am. Stat. Assoc. 96, No. 455, 1077–1087 (2001; Zbl 1072.62542)]. We show that the model is generically identifiable within classes of equivalence and we propose an approximate inference procedure, based on global and local variational techniques. Using toy data sets as well as the French Political Blogosphere network and the transcriptional network of Saccharomyces cerevisiae, we compare our work with other approaches.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
90B18 Communication networks in operations research
05C80 Random graphs (graph-theoretic aspects)

Citations:

Zbl 1072.62542
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] Airoldi, E., Blei, D., Xing, E. and Fienberg, S. (2006). Mixed membership stochastic block models for relational data with application to protein-protein interactions. In Proceedings of the International Biometrics Society Annual Meeting . Montréal, Québec, Canada.
[2] Airoldi, E., Blei, D., Fienberg, S. and Xing, E. (2007). Mixed membership analysis of high-throughput interaction studies: Relational data. Available at ArXiv e-prints.
[3] Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2008). Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9 1981-2014. · Zbl 1225.68143
[4] Allman, E. S., Matias, C. and Rhodes, J. A. (2009). Identifiability of parameters in latent structure models with many observed variables. Ann. Statist. 37 3099-3132. · Zbl 1191.62003 · doi:10.1214/09-AOS689
[5] Bickel, P. and Chen, A. (2009). A non parametric view of network models and Newman-Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106 21068-21073. · Zbl 1359.62411
[6] Blei, D., Ng, A. and Jordan, M. (2003). Latent Dirichlet allocation. J. Mach. Learn. Res. 3 993-1022. · Zbl 1112.68379 · doi:10.1162/jmlr.2003.3.4-5.993
[7] Boer, P., Huisman, M., Snijders, T., Steglich, C., Wichers, L. and Zeggelink, E. (2006). StOCNET: An open software system for the advanced statistical analysis of social networks, Version 1.7.
[8] Broyden, C., Fletcher, R., Goldfarb, D. and Shanno, D. F. (1970). BFGS method. J. Inst. Math. Appl. 6 76-90.
[9] Byrd, R. H., Lu, P., Nocedal, J. and Zhu, C. (1995). A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput. 16 1190-1208. · Zbl 0836.65080 · doi:10.1137/0916069
[10] Chang, J. (2010). The lda package Version, 1.2.
[11] Daudin, J.-J., Picard, F. and Robin, S. (2008). A mixture model for random graphs. Statist. Comput. 18 173-183. · doi:10.1007/s11222-007-9046-7
[12] Estrada, E. and Rodríguez-Velázquez, J. A. (2005). Spectral measures of bipartivity in complex networks. Phys. Rev. E (3) 72 046105.
[13] Everett, M. and Borgatti, S. (1998). Analyzing clique overlap. Connections 21 49-61.
[14] Fienberg, S. and Wasserman, S. (1981). Categorical data analysis of single sociometric relations. Soc. Methodol. 12 156-192.
[15] Frank, O. and Harary, F. (1982). Cluster inference by using transitivity indices in empirical graphs. J. Amer. Statist. Assoc. 77 835-840. JSTOR: · Zbl 0505.62043 · doi:10.2307/2287315
[16] Fu, Q. and Banerjee, A. (2008). Multiplicative mixture models for overlapping clustering. In Proceedings of the IEEE International Conference on Data Mining 791-796. Pisa, Italy.
[17] Girvan, M. and Newman, M. E. J. (2002). Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99 7821-7826. · Zbl 1032.91716 · doi:10.1073/pnas.122653799
[18] Goldenberg, A., Zheng, A., Fienberg, S. and Airoldi, E. (2010). A survey of statistical network models. Found. Trends Mach. Learn. 2 129-233. · Zbl 1184.68030 · doi:10.1561/2200000005
[19] Griffiths, T. and Ghahramani, Z. (2005). Infinite latent feature models and the Indian buffet process. Adv. Neural Inform. Process. Syst. 18 475-482.
[20] Handcock, M. S., Raftery, A. E. and Tantrum, J. M. (2007). Model-based clustering for social networks. J. Roy. Statist. Soc. Ser. A 170 301-354. · Zbl 05273954 · doi:10.1111/j.1467-985X.2007.00471.x
[21] Heller, K. and Ghahramani, Z. (2007). A nonparametric Bayesian approach to modeling overlapping clusters. In Proceedings of the 11th International Conference on AI and Statistics . San Juan, Puerto Rico.
[22] Heller, K., Williamson, S. and Ghahramani, Z. (2008). Statistical models for partial membership. In Proceedings of the 25th International Conference on Machine Learning 392-399. Helsinki, Finland.
[23] Hoff, P. D., Raftery, A. E. and Handcock, M. S. (2002). Latent space approaches to social network analysis. J. Amer. Statist. Assoc. 97 1090-1098. JSTOR: · Zbl 1041.62098 · doi:10.1198/016214502388618906
[24] Hofman, J. and Wiggins, C. (2008). A Bayesian approach to network modularity. Phys. Rev. Lett. 100 258701.
[25] Holland, P. W., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks 5 109-137. · doi:10.1016/0378-8733(83)90021-7
[26] Jeffery, C. (1999). Moonlighting proteins. Trends Biochem. Sci. 24 8-11.
[27] Krivitsky, P. and Handcock, M. (2009). The latentnet package, Version 2.1-1.
[28] Krivitsky, P., Handcock, M., Raftery, A. and Hoff, P. (2009). Representing degree distributions, clustering, and homophily in social networks with latent cluster random effects models. Social Networks 31 204-213.
[29] Lacroix, V., Fernandes, C. and Sagot, M.-F. (2006). Motif search in graphs: Application to metabolic networks. Trans. Comput. Biol. Bioinform. 3 360-368.
[30] Latouche, P., Birmelé, E. and Ambroise, C. (2009). Advances in Data Analysis, Data Handling, and Business Intelligence , Bayesian Methods for Graph Clustering 229-239. Springer, Berlin, Heidelberg.
[31] Latouche, P., Birmelé, E. and Ambroise, C. (2010). Supplement A to “Overlapping stochastic block models with application to the French blogosphere network.” DOI: . · Zbl 1220.62083
[32] Martin, D., Brun, C., Remy, E., Mouren, P., Thieffry, D. and Jacq, B. (2004). GOToolBox: Functional analysis of gene datasets based on Gene Ontology. Genome Biol. 5 .
[33] Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, D., Chklovskii, D. and Alon, U. (2002). Network motifs: Simple building blocks of complex networks. Science 298 824-827.
[34] Moreno, J. (1934). Who Shall Survive?: A New Approach to the Problem of Human Interrelations . Nervous and Mental Disease Publishing, Washington, DC.
[35] Newman, M. and Leicht, E. (2007). Mixture models and exploratory analysis in networks. Proc. Natl. Acad. Sci. USA 104 9564-9569. · Zbl 1155.91026 · doi:10.1073/pnas.0610537104
[36] Nowicki, K. and Snijders, T. A. B. (2001). Estimation and prediction for stochastic blockstructures. J. Amer. Statist. Assoc. 96 1077-1087. JSTOR: · Zbl 1072.62542 · doi:10.1198/016214501753208735
[37] Palla, G., Derenyi, I., Farkas, I. and Vicsek, T. (2005). Uncovering the overlapping community structure of complex networks in nature and society. Nature 435 814-818.
[38] Palla, G., Derenyi, I., Farkas, I. and Vicsek, T. (2006). CFinder the community/cluster finding program, Version 2.0.1.
[39] Snijders, T. A. B. and Nowicki, K. (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block sturcture. J. Classification 14 75-100. · Zbl 0896.62063 · doi:10.1007/s003579900004
[40] White, H., Boorman, S. and Breiger, R. (1976). Social structure from multiple networks. I. Blockmodels of roles and positions. Amer. J. Soc. 81 730-780.
[41] Zanghi, H., Ambroise, C. and Miele, V. (2008). Fast online graph clustering via Erdös-Renyi mixture. Pattern Recognition 41 3592-3599. · Zbl 1151.68623 · doi:10.1016/j.patcog.2008.06.019
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.