The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 5, Number 1 (2011), 309-336.
Overlapping stochastic block models with application to the French political blogosphere
Pierre Latouche, Etienne Birmelé, and Christophe Ambroise
Full-text: Open access
Abstract
Complex systems in nature and in society are often represented as networks, describing the rich set of interactions between objects of interest. Many deterministic and probabilistic clustering methods have been developed to analyze such structures. Given a network, almost all of them partition the vertices into disjoint clusters, according to their connection profile. However, recent studies have shown that these techniques were too restrictive and that most of the existing networks contained overlapping clusters. To tackle this issue, we present in this paper the Overlapping Stochastic Block Model. Our approach allows the vertices to belong to multiple clusters, and, to some extent, generalizes the well-known Stochastic Block Model [Nowicki and Snijders (2001)]. We show that the model is generically identifiable within classes of equivalence and we propose an approximate inference procedure, based on global and local variational techniques. Using toy data sets as well as the French Political Blogosphere network and the transcriptional network of Saccharomyces cerevisiae, we compare our work with other approaches.
Article information
Source
Ann. Appl. Stat. Volume 5, Number 1 (2011), 309-336.
Dates
First available: 21 March 2011
Permanent link to this document
http://projecteuclid.org/euclid.aoas/1300715192
Digital Object Identifier
doi:10.1214/10-AOAS382
Zentralblatt MATH identifier
05906808
Mathematical Reviews number (MathSciNet)
MR2810399
Citation
Latouche, Pierre; Birmelé, Etienne; Ambroise, Christophe. Overlapping stochastic block models with application to the French political blogosphere. The Annals of Applied Statistics 5 (2011), no. 1, 309--336. doi:10.1214/10-AOAS382. http://projecteuclid.org/euclid.aoas/1300715192.
References
- Airoldi, E., Blei, D., Xing, E. and Fienberg, S. (2006). Mixed membership stochastic block models for relational data with application to protein–protein interactions. In Proceedings of the International Biometrics Society Annual Meeting. Montréal, Québec, Canada.
- Airoldi, E., Blei, D., Fienberg, S. and Xing, E. (2007). Mixed membership analysis of high-throughput interaction studies: Relational data. Available at ArXiv e-prints.
- Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2008). Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9 1981–2014.
- Allman, E. S., Matias, C. and Rhodes, J. A. (2009). Identifiability of parameters in latent structure models with many observed variables. Ann. Statist. 37 3099–3132.Mathematical Reviews (MathSciNet): MR2549554
Zentralblatt MATH: 1191.62003
Digital Object Identifier: doi: 10.1214/09-AOS689
Project Euclid: euclid.aos/1250515381 - Bickel, P. and Chen, A. (2009). A non parametric view of network models and Newman–Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106 21068–21073.
- Blei, D., Ng, A. and Jordan, M. (2003). Latent Dirichlet allocation. J. Mach. Learn. Res. 3 993–1022.
- Boer, P., Huisman, M., Snijders, T., Steglich, C., Wichers, L. and Zeggelink, E. (2006). StOCNET: An open software system for the advanced statistical analysis of social networks, Version 1.7.
- Broyden, C., Fletcher, R., Goldfarb, D. and Shanno, D. F. (1970). BFGS method. J. Inst. Math. Appl. 6 76–90.
- Byrd, R. H., Lu, P., Nocedal, J. and Zhu, C. (1995). A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput. 16 1190–1208.Mathematical Reviews (MathSciNet): MR1346301
Zentralblatt MATH: 0836.65080
Digital Object Identifier: doi: 10.1137/0916069 - Chang, J. (2010). The lda package Version, 1.2.
- Daudin, J.-J., Picard, F. and Robin, S. (2008). A mixture model for random graphs. Statist. Comput. 18 173–183.Mathematical Reviews (MathSciNet): MR2390817
Digital Object Identifier: doi: 10.1007/s11222-007-9046-7 - Estrada, E. and Rodríguez-Velázquez, J. A. (2005). Spectral measures of bipartivity in complex networks. Phys. Rev. E (3) 72 046105.
- Everett, M. and Borgatti, S. (1998). Analyzing clique overlap. Connections 21 49–61.
- Fienberg, S. and Wasserman, S. (1981). Categorical data analysis of single sociometric relations. Soc. Methodol. 12 156–192.
- Frank, O. and Harary, F. (1982). Cluster inference by using transitivity indices in empirical graphs. J. Amer. Statist. Assoc. 77 835–840.Mathematical Reviews (MathSciNet): MR686407
Zentralblatt MATH: 0505.62043
Digital Object Identifier: doi: 10.2307/2287315
JSTOR: links.jstor.org - Fu, Q. and Banerjee, A. (2008). Multiplicative mixture models for overlapping clustering. In Proceedings of the IEEE International Conference on Data Mining 791–796. Pisa, Italy.
- Girvan, M. and Newman, M. E. J. (2002). Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99 7821–7826.Mathematical Reviews (MathSciNet): MR1908073
Zentralblatt MATH: 1032.91716
Digital Object Identifier: doi: 10.1073/pnas.122653799 - Goldenberg, A., Zheng, A., Fienberg, S. and Airoldi, E. (2010). A survey of statistical network models. Found. Trends Mach. Learn. 2 129–233.
- Griffiths, T. and Ghahramani, Z. (2005). Infinite latent feature models and the Indian buffet process. Adv. Neural Inform. Process. Syst. 18 475–482.
- Handcock, M. S., Raftery, A. E. and Tantrum, J. M. (2007). Model-based clustering for social networks. J. Roy. Statist. Soc. Ser. A 170 301–354.Mathematical Reviews (MathSciNet): MR2364300
Digital Object Identifier: doi: 10.1111/j.1467-985X.2007.00471.x - Heller, K. and Ghahramani, Z. (2007). A nonparametric Bayesian approach to modeling overlapping clusters. In Proceedings of the 11th International Conference on AI and Statistics. San Juan, Puerto Rico.
- Heller, K., Williamson, S. and Ghahramani, Z. (2008). Statistical models for partial membership. In Proceedings of the 25th International Conference on Machine Learning 392–399. Helsinki, Finland.
- Hoff, P. D., Raftery, A. E. and Handcock, M. S. (2002). Latent space approaches to social network analysis. J. Amer. Statist. Assoc. 97 1090–1098.Mathematical Reviews (MathSciNet): MR1951262
Zentralblatt MATH: 1041.62098
Digital Object Identifier: doi: 10.1198/016214502388618906
JSTOR: links.jstor.org - Hofman, J. and Wiggins, C. (2008). A Bayesian approach to network modularity. Phys. Rev. Lett. 100 258701.
- Holland, P. W., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks 5 109–137.Mathematical Reviews (MathSciNet): MR718088
Digital Object Identifier: doi: 10.1016/0378-8733(83)90021-7 - Jeffery, C. (1999). Moonlighting proteins. Trends Biochem. Sci. 24 8–11.
- Krivitsky, P. and Handcock, M. (2009). The latentnet package, Version 2.1-1.
- Krivitsky, P., Handcock, M., Raftery, A. and Hoff, P. (2009). Representing degree distributions, clustering, and homophily in social networks with latent cluster random effects models. Social Networks 31 204–213.
- Lacroix, V., Fernandes, C. and Sagot, M.-F. (2006). Motif search in graphs: Application to metabolic networks. Trans. Comput. Biol. Bioinform. 3 360–368.
- Latouche, P., Birmelé, E. and Ambroise, C. (2009). Advances in Data Analysis, Data Handling, and Business Intelligence, Bayesian Methods for Graph Clustering 229–239. Springer, Berlin, Heidelberg.
- Latouche, P., Birmelé, E. and Ambroise, C. (2010). Supplement A to “Overlapping stochastic block models with application to the French blogosphere network.” DOI: 10.12.14/10-AOAS382SUPP.
- Martin, D., Brun, C., Remy, E., Mouren, P., Thieffry, D. and Jacq, B. (2004). GOToolBox: Functional analysis of gene datasets based on Gene Ontology. Genome Biol. 5.
- Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, D., Chklovskii, D. and Alon, U. (2002). Network motifs: Simple building blocks of complex networks. Science 298 824–827.
- Moreno, J. (1934). Who Shall Survive?: A New Approach to the Problem of Human Interrelations. Nervous and Mental Disease Publishing, Washington, DC.
- Newman, M. and Leicht, E. (2007). Mixture models and exploratory analysis in networks. Proc. Natl. Acad. Sci. USA 104 9564–9569.
- Nowicki, K. and Snijders, T. A. B. (2001). Estimation and prediction for stochastic blockstructures. J. Amer. Statist. Assoc. 96 1077–1087.Mathematical Reviews (MathSciNet): MR1947255
Zentralblatt MATH: 1072.62542
Digital Object Identifier: doi: 10.1198/016214501753208735
JSTOR: links.jstor.org - Palla, G., Derenyi, I., Farkas, I. and Vicsek, T. (2005). Uncovering the overlapping community structure of complex networks in nature and society. Nature 435 814–818.
- Palla, G., Derenyi, I., Farkas, I. and Vicsek, T. (2006). CFinder the community/cluster finding program, Version 2.0.1.
- Snijders, T. A. B. and Nowicki, K. (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block sturcture. J. Classification 14 75–100.Mathematical Reviews (MathSciNet): MR1449742
Zentralblatt MATH: 0896.62063
Digital Object Identifier: doi: 10.1007/s003579900004 - White, H., Boorman, S. and Breiger, R. (1976). Social structure from multiple networks. I. Blockmodels of roles and positions. Amer. J. Soc. 81 730–780.
- Zanghi, H., Ambroise, C. and Miele, V. (2008). Fast online graph clustering via Erdös–Renyi mixture. Pattern Recognition 41 3592–3599.
Supplemental materials
- Supplementary material: Appendix. Describe how global and local variational techniques can be used to obtain a tractable lower bound. Introduce the optimization equations for the inference procedure.

- You have access to this content.
- You have partial access to this content.
- You do not have access to this content.
More like this
- Model selection in overlapping stochastic block models
Latouche, Pierre, Birmelé, Etienne, and Ambroise, Christophe, Electronic Journal of Statistics, 2014 - Pseudo-likelihood methods for community detection in large sparse networks
Amini, Arash A., Chen, Aiyou, Bickel, Peter J., and Levina, Elizaveta, The Annals of Statistics, 2013 - Bayesian variable selection and data
integration for biological regulatory networks
Jensen, Shane T., Chen, Guang, and Stoeckert, Jr., Christian J., The Annals of Applied Statistics, 2007
- Model selection in overlapping stochastic block models
Latouche, Pierre, Birmelé, Etienne, and Ambroise, Christophe, Electronic Journal of Statistics, 2014 - Pseudo-likelihood methods for community detection in large sparse networks
Amini, Arash A., Chen, Aiyou, Bickel, Peter J., and Levina, Elizaveta, The Annals of Statistics, 2013 - Bayesian variable selection and data
integration for biological regulatory networks
Jensen, Shane T., Chen, Guang, and Stoeckert, Jr., Christian J., The Annals of Applied Statistics, 2007 - Bayesian Models and Gibbs Sampling Strategies for Local Graph Alignment and Motif Identification in Stochastic Biological Networks
Jiang, Rui, Chen, Ting, and Sun, Fengzhu, Communications in Information & Systems, 2009 - The random subgraph model for the analysis of an ecclesiastical network in Merovingian Gaul
Jernite, Yacine, Latouche, Pierre, Bouveyron, Charles, Rivera, Patrick, Jegou, Laurent, and Lamassé, Stéphane, The Annals of Applied Statistics, 2014 - Spectral clustering and the high-dimensional stochastic blockmodel
Rohe, Karl, Chatterjee, Sourav, and Yu, Bin, The Annals of Statistics, 2011 - Community Structures in Classical Network Models
Li, Angsheng and Peng, Pan, Internet Mathematics, 2011 - Asymptotic blocking probabilities in loss networks with subexponential demands
Lu, Yingdong and Radovanović, Ana, Journal of Applied Probability, 2007 - Consistency of community detection in networks under degree-corrected stochastic block models
Zhao, Yunpeng, Levina, Elizaveta, and Zhu, Ji, The Annals of Statistics, 2012 - Product partition models with correlated parameters
Monteiro, Joao V. D., Assuncao, Renato M., and Loschi, Rosangela H., Bayesian Analysis, 2011