The Annals of Statistics
- Ann. Statist.
- Volume 45, Number 2 (2017), 500-528.
Likelihood-based model selection for stochastic block models
Y. X. Rachel Wang and Peter J. Bickel
Full-text: Access denied (no subscription detected) We're sorry, but we are unable to provide you with the full text of this article because we are not able to identify you as a subscriber. If you have a personal subscription to this journal, then please login. If you are already logged in, then you may need to update your profile to register your subscription. Read more about accessing full-text
Abstract
The stochastic block model (SBM) provides a popular framework for modeling community structures in networks. However, more attention has been devoted to problems concerning estimating the latent node labels and the model parameters than the issue of choosing the number of blocks. We consider an approach based on the log likelihood ratio statistic and analyze its asymptotic properties under model misspecification. We show the limiting distribution of the statistic in the case of underfitting is normal and obtain its convergence rate in the case of overfitting. These conclusions remain valid when the average degree grows at a polylog rate. The results enable us to derive the correct order of the penalty term for model complexity and arrive at a likelihood-based model selection criterion that is asymptotically consistent. Our analysis can also be extended to a degree-corrected block model (DCSBM). In practice, the likelihood function can be estimated using more computationally efficient variational methods or consistent label estimation algorithms, allowing the criterion to be applied to large networks.
Article information
Source
Ann. Statist. Volume 45, Number 2 (2017), 500-528.
Dates
Received: October 2015
Revised: February 2016
First available in Project Euclid: 16 May 2017
Permanent link to this document
http://projecteuclid.org/euclid.aos/1494921948
Digital Object Identifier
doi:10.1214/16-AOS1457
Subjects
Primary: 62F05: Asymptotic properties of tests
Keywords
Stochastic block models model misspecification network communities likelihood ratio statistic
Citation
Wang, Y. X. Rachel; Bickel, Peter J. Likelihood-based model selection for stochastic block models. Ann. Statist. 45 (2017), no. 2, 500--528. doi:10.1214/16-AOS1457. http://projecteuclid.org/euclid.aos/1494921948.
References
- Adamic, L. A. and Glance, N. (2005). The political blogosphere and the 2004 US election: Divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery 36–43. ACM, New York.
- Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2008). Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9 1981–2014.Zentralblatt MATH: 1225.68143
- Amini, A. A., Chen, A., Bickel, P. J. and Levina, E. (2013). Pseudo-likelihood methods for community detection in large sparse networks. Ann. Statist. 41 2097–2122.
- Ball, B., Karrer, B. and Newman, M. E. J. (2011). An efficient and principled method for detecting communities in networks. Phys. Rev. E (3) 84 036103.
- Bickel, P. and Chen, A. (2009). A nonparametric view of network models and Newman–Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106 21068–73.
- Bickel, P. J. and Sarkar, P. (2016). Hypothesis testing for automated community detection in networks. J. R. Stat. Soc. Ser. B. Stat. Methodol. 78 253–273.
- Bickel, P., Choi, D., Chang, X. and Zhang, H. (2013). Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. Ann. Statist. 41 1922–1943.
- Celisse, A., Daudin, J.-J. and Pierre, L. (2012). Consistency of maximum-likelihood and variational estimators in the stochastic block model. Electron. J. Stat. 6 1847–1899.
- Chen, K. and Lei, J. (2014). Network cross-validation for determining the number of communities in network data. Preprint. Available at arXiv:1411.1715.arXiv: arXiv:1411.1715
- Choi, D. S., Wolfe, P. J. and Airoldi, E. M. (2012). Stochastic blockmodels with a growing number of classes. Biometrika 99 273–284.
- Daudin, J.-J., Picard, F. and Robin, S. (2008). A mixture model for random graphs. Stat. Comput. 18 173–183.
- Decelle, A., Krzakala, F., Moore, C. and Zdeborová, L. (2011). Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E (3) 84 066106.
- Fishkind, D. E., Sussman, D. L., Tang, M., Vogelstein, J. T. and Priebe, C. E. (2013). Consistent adjacency-spectral partitioning for the stochastic block model when the model parameters are unknown. SIAM J. Matrix Anal. Appl. 34 23–39.
- Holland, P. W., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks 5 109–137.
- Joseph, A. and Yu, B. (2013). Impact of regularization on spectral clustering. Preprint. Available at arXiv:1312.1733.arXiv: arXiv:1312.1733
- Karrer, B. and Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks. Phys. Rev. E (3) 83 016107, 10.
- Latouche, P., Birmelé, E. and Ambroise, C. (2012). Variational Bayesian inference and complexity control for stochastic block models. Stat. Model. 12 93–115.
- Le, C. M. and Levina, E. (2015). Estimating the number of communities in networks by spectral methods. Preprint. Available at arXiv:1507.00827.arXiv: arXiv:1507.00827
- Lei, J. (2014). A goodness-of-fit test for stochastic block models. Preprint. Available at arXiv:1412.4857.arXiv: arXiv:1412.4857
- Leskovec, J. and Mcauley, J. J. (2012). Learning to discover social circles in ego networks. In Adv. Neural Inf. Process. Syst. 539–547. Lake Tahoe.
- Newman, M. E. J. (2006a). Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 103 8577–8582.
- Newman, M. E. J. (2006b). Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E (3) 74 036104, 19.
- Peixoto, T. P. (2013). Parsimonious module inference in large networks. Phys. Rev. Lett. 110 148701.
- Rohe, K., Chatterjee, S. and Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist. 39 1878–1915.Mathematical Reviews (MathSciNet): MR2893856
Zentralblatt MATH: 1227.62042
Digital Object Identifier: doi:10.1214/11-AOS887
Project Euclid: euclid.aos/1314190618 - Saldana, D. F., Yu, Y. and Feng, Y. (2014). How many communities are there? Preprint. Available at arXiv:1412.1684.arXiv: arXiv:1412.1684
- Wang, Y. X. R. and Bickel, P. J. (2016). Supplement to “Likelihood-based model selection for stochastic block models.” DOI:10.1214/16-AOS1457SUPP.
- Wolfe, P. J. and Olhede, S. C. (2013). Nonparametric graphon estimation. Preprint. Available at arXiv:1309.5936.arXiv: arXiv:1309.5936
- Zhao, Y., Levina, E. and Zhu, J. (2011). Community extraction for social networks. Proc. Natl. Acad. Sci. USA 108 7321–7326.
- Zhao, Y., Levina, E. and Zhu, J. (2012). Consistency of community detection in networks under degree-corrected stochastic block models. Ann. Statist. 40 2266–2292.Mathematical Reviews (MathSciNet): MR3059083
Zentralblatt MATH: 1257.62095
Digital Object Identifier: doi:10.1214/12-AOS1036
Project Euclid: euclid.aos/1358951382
Supplemental materials
- Supplement to “Likelihood-based model selection for stochastic block models”. A proof sketch of how the main results in the paper can be extended to the DCSBM described in Section 2.5 is provided in the supplement.Digital Object Identifier: doi:10.1214/16-AOS1457SUPPSupplemental files available for subscribers.

- You have access to this content.
- You have partial access to this content.
- You do not have access to this content.
More like this
- Consistency of community detection in networks under degree-corrected stochastic block models
Zhao, Yunpeng, Levina, Elizaveta, and Zhu, Ji, The Annals of Statistics, 2012 - Bayesian degree-corrected stochastic blockmodels for community detection
Peng, Lijun and Carvalho, Luis, Electronic Journal of Statistics, 2016 - Further asymptotic properties of the generalized information criterion
Xu, ChangJiang and McLeod, A. Ian, Electronic Journal of Statistics, 2012
- Consistency of community detection in networks under degree-corrected stochastic block models
Zhao, Yunpeng, Levina, Elizaveta, and Zhu, Ji, The Annals of Statistics, 2012 - Bayesian degree-corrected stochastic blockmodels for community detection
Peng, Lijun and Carvalho, Luis, Electronic Journal of Statistics, 2016 - Further asymptotic properties of the generalized information criterion
Xu, ChangJiang and McLeod, A. Ian, Electronic Journal of Statistics, 2012 - Consistency of maximum-likelihood and variational estimators in the stochastic block model
Celisse, Alain, Daudin, Jean-Jacques, and Pierre, Laurent, Electronic Journal of Statistics, 2012 - Oracle inequalities for network models and sparse graphon estimation
Klopp, Olga, Tsybakov, Alexandre B., and Verzelen, Nicolas, The Annals of Statistics, 2017 - Mixed domain asymptotics for a stochastic process model with time trend and measurement error
Chang, Chih-Hao, Huang, Hsin-Cheng, and Ing, Ching-Kang, Bernoulli, 2017 - Impact of regularization on spectral clustering
Joseph, Antony and Yu, Bin, The Annals of Statistics, 2016 - Minimax rates of community detection in stochastic block models
Zhang, Anderson Y. and Zhou, Harrison H., The Annals of Statistics, 2016 - Classification and estimation in the Stochastic Blockmodel based on the empirical degrees
Channarond, Antoine, Daudin, Jean-Jacques, and Robin, Stéphane, Electronic Journal of Statistics, 2012 - Correction to the proof of consistency of community detection
Bickel, Peter J., Chen, Aiyou, Zhao, Yunpeng, Levina, Elizaveta, and Zhu, Ji, The Annals of Statistics, 2015