Electronic Journal of Statistics
- Electron. J. Statist.
- Volume 8, Number 1 (2014), 762-794.
Model selection in overlapping stochastic block models
Pierre Latouche, Etienne Birmelé, and Christophe Ambroise
Full-text: Open access
Abstract
Networks are a commonly used mathematical model to describe the rich set of interactions between objects of interest. Many clustering methods have been developed in order to partition such structures, among which several rely on underlying probabilistic models, typically mixture models. The relevant hidden structure may however show overlapping groups in several applications. The Overlapping Stochastic Block Model (Latouche, Birmelé and Ambroise (2011)) has been developed to take this phenomenon into account. Nevertheless, the problem of the choice of the number of classes in the inference step is still open. To tackle this issue, we consider the proposed model in a Bayesian framework and develop a new criterion based on a non asymptotic approximation of the marginal log-likelihood. We describe how the criterion can be computed through a variational Bayes EM algorithm, and demonstrate its efficiency by running it on both simulated and real data.
Article information
Source
Electron. J. Statist. Volume 8, Number 1 (2014), 762-794.
Dates
First available: 9 June 2014
Permanent link to this document
http://projecteuclid.org/euclid.ejs/1402320466
Digital Object Identifier
doi:10.1214/14-EJS903
Citation
Latouche, Pierre; Birmelé, Etienne; Ambroise, Christophe. Model selection in overlapping stochastic block models. Electronic Journal of Statistics 8 (2014), no. 1, 762--794. doi:10.1214/14-EJS903. http://projecteuclid.org/euclid.ejs/1402320466.
References
- Airoldi, E., Blei, D., Fienberg, S., and Xing, E., Mixed membership analysis of high-throughput interaction studies: relational data., ArXiv e-prints, 2007.
- Airoldi, E., Blei, D., Xing, E., and Fienberg, S., Mixed membership stochastic block models for relational data with application to protein-protein interactions. In, Proceedings of the International Biometrics Society Annual Meeting, 2006.
- Airoldi, E.M., Blei, D.M., Fienberg, S.E., and Xing, E.P., Mixed membership stochastic blockmodels., Journal of Machine Learning Research, 9 :1981–2014, 2008.
- Albert, R. and Barabási, A.L., Statistical mechanics of complex networks., Modern Physics, 74:47–97, 2002.Mathematical Reviews (MathSciNet): MR1895096
Digital Object Identifier: doi: 10.1103/RevModPhys.74.47 - Ball, B., Karrer, B., and Newman, M.E.J., An efficient and principled method for detecting communities in networks., Phys. Rev. E, 84 (036103), 2011.
- Barabási, A.L. and Oltvai, Z.N., Network biology: understanding the cell’s functional organization., Nature Rev. Genet., 5:101–113, 2004.
- Beal, M.J. and Ghahramani, Z., The variational Bayesian em algorithm for incomplete data: with application to scoring graphical model structures. In J.M. Bernardo, M.J. Bayarri, J.O. Berger, A.P. Dawid, D. Heckerman, A.F.M. Smith, and M. West, editors, Bayesian Statistics 7: Proceedings of the 7th Valencia International Meeting, page 453, 2002.Mathematical Reviews (MathSciNet): MR2003189
- Bickel, P.J. and Chen, A., A non parametric view of network models and newman-girvan and other modularities. In, Proceedings of the National Academy of Sciences, volume 106, pages 21068–21073, 2009.
- Biernacki, C., Celeux, G., and Govaert, G., Exact and monte carlo calculations of integrated likelihoods for the latent class model., Journal of Statistical Planning and Inference, 140 :2991–3002, 2010.Mathematical Reviews (MathSciNet): MR2659830
Digital Object Identifier: doi: 10.1016/j.jspi.2010.03.042 - Bishop, C.M., Pattern Recognition and Machine Learning. Springer-Verlag, 2006.Mathematical Reviews (MathSciNet): MR2247587
- Bishop, C.M. and Svensén, M., Bayesian hierarchical mixtures of experts. In, Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence, pages 57–64. U. Kjaerulff and C. Meek, 2003.
- Blei, D., Ng, A.Y., and Jordan, M.I., Latent dirichlet allocation., Journal of Machine Learning Research, 3:993 –1022, 2003.
- Boer, P., Huisman, M., Snijders, T.A.B., Steglich, C.E.G., Wichers, L.H.Y., and Zeggelink, E.P.H., StOCNET: An Open Software System for the Advanced Statistical Analysis of Social Networks, 2006.
- Daudin, J., Picard, F., and Robin, S., A mixture model for random graphs., Statistics and Computing, 18:1–36, 2008.Mathematical Reviews (MathSciNet): MR2390817
Digital Object Identifier: doi: 10.1007/s11222-007-9046-7 - Dempster, A.P., Laird, N.M., and Rubin, D.B., Maximum likelihood for incomplete data via the em algorithm., Journal of the Royal Statistical Society, B39:1–38, 1977.Mathematical Reviews (MathSciNet): MR501537
- Estrada, E. and Rodriguez-Velazquez, J.A., Spectral measures of bipartivity in complex networks., Physical Review E, 72 :046105, 2005.Mathematical Reviews (MathSciNet): MR2202758
Digital Object Identifier: doi: 10.1103/PhysRevE.72.046105 - Fienberg, S.E. and Wasserman, S., Categorical data analysis of single sociometric relations., Sociological Methodology, 12:156–192, 1981.
- Frank, O. and Harary, F., Cluster inference by using transitivity indices in empirical graphs., Journal of the American Statistical Association, 77:835–840, 1982.Mathematical Reviews (MathSciNet): MR686407
Digital Object Identifier: doi: 10.1080/01621459.1982.10477895 - Gazal, S., Daudin, J.-J., and Robin, S., Accuracy of variational estimates for random graph mixture models., Journal of Statistical Computation and Simulation, 2011.Mathematical Reviews (MathSciNet): MR2929296
Digital Object Identifier: doi: 10.1080/00949655.2011.560117 - Girvan, M. and Newman, M.E.J., Community structure in social and biological networks. In, Proceedings of the National Academy of Sciences, volume 99, pages 7821–7826, 2002.
- Griffiths, T. and Ghahramani, Z., Infinite latent feature models and the indian buffet process. In, Neural Information Processing Systems, volume 18, pages 475–482, 2005.
- Handcock, M.S., Raftery, A.E., and Tantrum, J.M., Model-based clustering for social networks., Journal of the Royal Statistical Society, 170:1–22, 2007.Mathematical Reviews (MathSciNet): MR2364300
Digital Object Identifier: doi: 10.1111/j.1467-985X.2007.00471.x - Heller, K. and Ghahramani, Z., A nonparametric Bayesian approach to modeling overlapping clusters. In, In Proceedings of The 11th International Conference On AI And Statistics, 2007.
- Heller, K., Williamson, S., and Ghahramani, Z., Statistical models for partial membership. In, Proceedings of the 25th International Conference on Machine Learning (ICML), pages 392–399, 2008.
- Hofman, J.M. and Wiggins, C.H., A Bayesian approach to network modularity., Physical Review Letters, 100 :258701, 2008.
- Holland, P., Laskey, K.B., and Leinhardt, S., Stochastic blockmodels: some first steps., Social Networks, 5:109–137, 1983.Mathematical Reviews (MathSciNet): MR718088
Digital Object Identifier: doi: 10.1016/0378-8733(83)90021-7 - Jaakkola, T.S. and Jordan, M.I., Bayesian parameter estimation via variational methods., Statistics and Computing, 10:25–37, 2000.
- Jeffery, C.J., Moonlighting proteins., Trends in Biochemical Sciences, 24:8–11, 1999.
- Krivitsky, P.N. and Handcock, M.S., The latentnet package, 2009.
- Latouche, P., Birmelé, E., and Ambroise, C., Bayesian methods for graph clustering, pages 229–239. Springer, 2009.
- Latouche, P., Birmelé, E., and Ambroise, C., Overlapping stochastic block models with application to the french political blogosphere., Annals of Applied Statistics, 5(1):309–336, 2011.Mathematical Reviews (MathSciNet): MR2810399
Digital Object Identifier: doi: 10.1214/10-AOAS382
Project Euclid: euclid.aoas/1300715192 - Latouche, P., Birmelé, E., and Ambroise, C., Variational Bayes inference and complexity control for stochastic block models., Statistical Modelling, 12(1):93–115, 2012.Mathematical Reviews (MathSciNet): MR2953099
Digital Object Identifier: doi: 10.1177/1471082X1001200105 - Mariadassou, M., Robin, S., and Vacher, C., Uncovering latent structure in valued graphs: a variational approach., Annals of Applied Statistics, 4(2), 2010.Mathematical Reviews (MathSciNet): MR2758646
Digital Object Identifier: doi: 10.1214/10-AOAS361
Project Euclid: euclid.aoas/1280842137 - McLachlan, G. and Krishnan, T., The EM algorithm and extensions. New York: John Wiley, 1997.Mathematical Reviews (MathSciNet): MR1417721
- Newman, M.E.J., Modularity and community structure in networks. In, aaa, volume 103, pages 8577–8582, 2006.
- Nowicki, K. and Snijders, T.A.B., Estimation and prediction for stochastic blockstructures., Journal of the American Statistical Association, 96 :1077–1087, 2001.Mathematical Reviews (MathSciNet): MR1947255
Digital Object Identifier: doi: 10.1198/016214501753208735 - Palla, G., Barabási, A.L., and Vicsek, T., Quantifying social group evolution., Nature, 446:664–667, 2007.
- Palla, G., Derenyi, I., Farkas, I., and Vicsek, T., Uncovering the overlapping community structure of complex networks in nature and society., Nature, 435:814–818, 2005.
- Palla, G., Derenyi, I., Farkas, I., and Vicsek, T., CFinder, the community cluster finding program, 2006.
- Snijders, T.A.B. and Nowicki, K., Estimation and prediction for stochastic block-structures for graphs with latent block structure., Journal of Classification, 14:75–100, 1997.
- Wang, Y.J. and Wong, G.Y., Stochastic blockmodels for directed graphs., Journal of the American Statistical Association, 82:8–19, 1987.Mathematical Reviews (MathSciNet): MR883333
Digital Object Identifier: doi: 10.1080/01621459.1987.10478385 - Yang, J. and Lescovec, J., Overlapping community detection at scale: a nonnegative matrix factorization approach. In, ACM International Conference on Web Search and Data Mining (WSDM), 2013.
- Zanghi, H., Ambroise, C., and Miele, V., Fast online graph clustering via erdös renyi mixture., Pattern Recognition, 41(12) :3592–3599, 2008.
The Institute of Mathematical Statistics and the Bernoulli Society

- You have access to this content.
- You have partial access to this content.
- You do not have access to this content.
More like this
- Overlapping stochastic block models with application to the French political blogosphere
Latouche, Pierre, Birmelé, Etienne, and Ambroise, Christophe, The Annals of Applied Statistics, 2011 - Introduction: How to Deal with Uncertainty in Population Forecasting?
Lutz1, Wolfgang and Goldstein, Joshua R., International Statistical Review, 2004 - Cluster and Feature Modeling from Combinatorial Stochastic Processes
Broderick, Tamara, Jordan, Michael I., and Pitman, Jim, Statistical Science, 2013
- Overlapping stochastic block models with application to the French political blogosphere
Latouche, Pierre, Birmelé, Etienne, and Ambroise, Christophe, The Annals of Applied Statistics, 2011 - Introduction: How to Deal with Uncertainty in Population Forecasting?
Lutz1, Wolfgang and Goldstein, Joshua R., International Statistical Review, 2004 - Cluster and Feature Modeling from Combinatorial Stochastic Processes
Broderick, Tamara, Jordan, Michael I., and Pitman, Jim, Statistical Science, 2013 - The random subgraph model for the analysis of an ecclesiastical network in Merovingian Gaul
Jernite, Yacine, Latouche, Pierre, Bouveyron, Charles, Rivera, Patrick, Jegou, Laurent, and Lamassé, Stéphane, The Annals of Applied Statistics, 2014 - Bayesian clustering of replicated time-course gene expression data with weak signals
Fu, Audrey Qiuyan, Russell, Steven, Bray, Sarah J., and Tavaré, Simon, The Annals of Applied Statistics, 2013 - A Gaussian Mixture Model to Detect Clusters Embedded in Feature Subspace
Li, Yuanhong, Dong, Ming, and Hua, Jing, Communications in Information & Systems, 2007 - Spatial modeling of the 3D morphology of hybrid polymer-ZnO
solar cells, based on electron tomography data
Stenzel, O., Schmidt, V., Hassfeld, H., Thiedmann, R., Koster, L. J. A., Oosterhout, S. D., van Bavel, S. S., Wienk, M. M., Loos, J., and Janssen, R. A. J., The Annals of Applied Statistics, 2011 - Bayesian Models and Gibbs Sampling Strategies for Local Graph Alignment and Motif Identification in Stochastic Biological Networks
Jiang, Rui, Chen, Ting, and Sun, Fengzhu, Communications in Information & Systems, 2009 - Consistency of community detection in networks under degree-corrected stochastic block models
Zhao, Yunpeng, Levina, Elizaveta, and Zhu, Ji, The Annals of Statistics, 2012 - Inferring sparse Gaussian graphical models with latent structure
Ambroise, Christophe, Chiquet, Julien, and Matias, Catherine, Electronic Journal of Statistics, 2009