Electronic Journal of Statistics
- Electron. J. Statist.
- Volume 10, Number 2 (2016), 3807-3870.
Consistent community detection in multi-relational data through restricted multi-layer stochastic blockmodel
Full-text: Open access
Abstract
In recent years there has been an increased interest in statistical analysis of data with multiple types of relations among a set of entities. Such multi-relational data can be represented as multi-layer graphs where the set of vertices represents the entities and multiple types of edges represent the different relations among them. For community detection in multi-layer graphs, we consider two random graph models, the multi-layer stochastic blockmodel (MLSBM) and a model with a restricted parameter space, the restricted multi-layer stochastic blockmodel (RMLSBM). We derive consistency results for community assignments from the maximum likelihood estimators (MLEs) in both models where MLSBM is assumed to be the true model, and either the number of nodes or the number of types of edges or both grow. We compare MLEs in the two models with other baseline approaches, such as separate modeling of layers, aggregating the layers and majority voting. In simulations RMLSBM is shown to have advantage over MLSBM when either the growth rate of the number of communities is high or the growth rate of the average degree of the component graphs in the multi-graph is low. We also derive minimax rates of error and thresholds for achieving consistency of community detection in both models, which are then used to compare the multi-layer models with a baseline model, the aggregate stochastic block model. The simulation studies and real data applications confirm the superior performance of the multi-layer approaches in comparison to the baseline procedures.
Article information
Source
Electron. J. Statist. Volume 10, Number 2 (2016), 3807-3870.
Dates
Received: March 2016
First available in Project Euclid: 6 December 2016
Permanent link to this document
http://projecteuclid.org/euclid.ejs/1480993455
Digital Object Identifier
doi:10.1214/16-EJS1211
Zentralblatt MATH identifier
06673462
Subjects
Primary: 62G20: Asymptotic properties
Keywords
Community detection consistency minimax rates multi-layer networks consistency thresholds stochastic blockmodel
Citation
Paul, Subhadeep; Chen, Yuguo. Consistent community detection in multi-relational data through restricted multi-layer stochastic blockmodel. Electron. J. Statist. 10 (2016), no. 2, 3807--3870. doi:10.1214/16-EJS1211. http://projecteuclid.org/euclid.ejs/1480993455.
References
- [1] Abbe, E. and Sandon, C. (2015). Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery. In, 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS) 670–688. IEEE.
- [2] Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2008). Mixed membership stochastic blockmodels., J. Mach. Learn. Res. 9 1981–2014.
- [3] Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and Newman–Girvan and other modularities., Proceedings of the National Academy of Sciences 106 21068–21073.
- [4] Bickel, P. J., Choi, D., Chang, X. and Zhang, H. (2013). Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels., Ann. Statist 41 1922–1943.Mathematical Reviews (MathSciNet): MR3127853
Zentralblatt MATH: 1292.62042
Digital Object Identifier: doi:10.1214/13-AOS1124
Project Euclid: euclid.aos/1382547508 - [5] Binkiewicz, N. M. (2015). Contextualized Network Analysis: Theory and Methods for Networks with Node Covariates PhD thesis, University of, Wisconsin-Madison.Mathematical Reviews (MathSciNet): MR3407490
- [6] Budimir, I., Dragomir, S. S. and Pecaric, J. (2001). Further reverse results for Jensen’s discrete inequality and applications in information theory., J. Inequal. Pure Appl. Math. 2 5.
- [7] Celisse, A., Daudin, J. J. and Pierre, L. (2012). Consistency of maximum-likelihood and variational estimators in the stochastic block model., Electronic Journal of Statistics 6 1847–1899.Mathematical Reviews (MathSciNet): MR2988467
Zentralblatt MATH: 1295.62028
Digital Object Identifier: doi:10.1214/12-EJS729
Project Euclid: euclid.ejs/1349355605 - [8] Choi, D. S., Wolfe, P. J. and Airoldi, E. M. (2012). Stochastic blockmodels with a growing number of classes., Biometrika 99 273–284.Mathematical Reviews (MathSciNet): MR2931253
Zentralblatt MATH: 1318.62207
Digital Object Identifier: doi:10.1093/biomet/asr053 - [9] Chung, F. and Lu, L. (2006)., Complex Graphs and Networks. American Mathematical Society.
- [10] Daudin, J. J., Picard, F. and Robin, S. (2008). A mixture model for random graphs., Stat Comput 18 173–183.Mathematical Reviews (MathSciNet): MR2390817
Digital Object Identifier: doi:10.1007/s11222-007-9046-7 - [11] Dong, X., Frossard, P., Vandergheynst, P. and Nefedov, N. (2012). Clustering with multi-layer graphs: A spectral perspective., IEEE Transactions on Signal Processing 60 5820–5831.Mathematical Reviews (MathSciNet): MR2990287
Digital Object Identifier: doi:10.1109/TSP.2012.2212886 - [12] Gao, C., Lu, Y. and Zhou, H. H. (2015). Rate-optimal graphon estimation., The Annals of Statistics 43 2624–2652.Mathematical Reviews (MathSciNet): MR3405606
Zentralblatt MATH: 1332.60050
Digital Object Identifier: doi:10.1214/15-AOS1354
Project Euclid: euclid.aos/1444222087 - [13] Gao, C., Ma, Z., Zhang, A. Y. and Zhou, H. H. (2015). Achieving Optimal Misclassification Proportion in Stochastic Block Model., arXiv preprint arXiv:1505.03772.
- [14] Goldenberg, A., Zheng, A. X., Fienberg, S. E. and Airoldi, E. M. (2010). A survey of statistical network models., Foundations and Trends in Machine Learning 2 129–233.
- [15] Greene, D. and Cunningham, P. (2013). Producing a unified graph representation from multiple social network views., ACM Web Science 2 129–233.
- [16] Hajek, B., Wu, Y. and Xu, J. (2016). Achieving exact cluster recovery threshold via semidefinite programming., IEEE Transactions on Information Theory 62 2788–2797.Mathematical Reviews (MathSciNet): MR3493879
Digital Object Identifier: doi:10.1109/TIT.2016.2546280 - [17] Han, Q., Xu, K. and Airoldi, E. (2015). Consistent estimation of dynamic and multi-layer block models. In, Proceedings of The 32nd International Conference on Machine Learning 1511–1520.
- [18] Handcock, M. S., Raftery, A. E. and Tantrum, J. M. (2007). Model-based clustering for social networks., J. Roy. Statist. Soc. Ser. A 170 301–354.Mathematical Reviews (MathSciNet): MR2364300
Digital Object Identifier: doi:10.1111/j.1467-985X.2007.00471.x - [19] Hoff, P. D., Raftery, A. E. and Handcock, M. S. (2002). Latent space approaches to social network analysis., J. Amer. Statist. Assoc. 97 1090–1098.Mathematical Reviews (MathSciNet): MR1951262
Zentralblatt MATH: 1041.62098
Digital Object Identifier: doi:10.1198/016214502388618906 - [20] Holland, P., Laskey, K. and Leinhardt, S. (1983). Stochastic blockmodels: some first steps., Social Networks 5 109–137.Mathematical Reviews (MathSciNet): MR718088
Digital Object Identifier: doi:10.1016/0378-8733(83)90021-7 - [21] Jenatton, R., Le Roux, N., Bordes, A. and Obozinski, G. (2012). A latent factor model for highly multi-relational data., Advances in Neural Information Processing Systems 3167–3175.
- [22] Karrer, B. and Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks., Phys. Rev. E. 83 016107.Mathematical Reviews (MathSciNet): MR2788206
Digital Object Identifier: doi:10.1103/PhysRevE.83.016107 - [23] Kemp, C., Tenenbaum, J. B., Griffiths, T. L., Yamada, T. and Ueda, N. (2006). Learning systems of concepts with an infinite relational model. In, Proceedings of the 21st National Conference on Artificial Intelligence 1 381–388.
- [24] Latouche, P., Birmele, E. and Ambroise, C. (2011). Overlapping stochastic block models with application to the French political blogosphere., Ann. Appl. Stat. 5 309–336.Mathematical Reviews (MathSciNet): MR2810399
Zentralblatt MATH: 1220.62083
Digital Object Identifier: doi:10.1214/10-AOAS382
Project Euclid: euclid.aoas/1300715192 - [25] Leskovec, J., Lang, K. J., Dasgupta, A. and Mahoney, M. W. (2008). Statistical properties of community structure in large social and information networks. In, Proceedings of the 17th international conference on World Wide Web 695–704. ACM.
- [26] Matias, C. and Robin, S. (2014). Modeling heterogeneity in random graphs through latent space models: a selective review., ESAIM: Proceedings and Surveys 47 55–74.
- [27] Mossel, E., Neeman, J. and Sly, A. (2012). Stochastic block models and reconstruction., arXiv preprint arXiv:1202.1499.
- [28] Mossel, E., Neeman, J. and Sly, A. (2013). A proof of the block model threshold conjecture., arXiv preprint arXiv:1311.4115.
- [29] Mossel, E., Neeman, J. and Sly, A. (2014). Consistency thresholds for binary symmetric block models., arXiv preprint arXiv:1407.1591.
- [30] Mucha, P. J., Richardson, T., Macon, K., Porter, M. A. and Onnela, J. P. (2010). Community structure in time-dependent, multiscale, and multiplex networks., Science 328 876–878.Mathematical Reviews (MathSciNet): MR2662590
Zentralblatt MATH: 1226.91056
Digital Object Identifier: doi:10.1126/science.1184819 - [31] Narayanan, M., Vetta, A., Schadt, E. and Zhu, J. (2010). Simultaneous clustering of multiple gene expression and physical interaction datasets., PLoS. Comp. Bio. 6 e1000742.Mathematical Reviews (MathSciNet): MR2652833
Digital Object Identifier: doi:10.1371/journal.pcbi.1000742 - [32] Newman, M. E. J. and Girvan, M. (2004). Finding and evaluating community structure in networks., Phys. Rev. E 69 026113.
- [33] Nickel, M., Tresp, V. and Kriegel, H. P. (2011). A three-way model for collective learning on multi-relational data., International Conference on Machine Learning 809–816.
- [34] Nowicki, K. and Snijders, T. (2001). Estimation and prediction for stochastic block structures., J. Am. Stat. Assoc. 96 1077–1087.Mathematical Reviews (MathSciNet): MR1947255
Zentralblatt MATH: 1072.62542
Digital Object Identifier: doi:10.1198/016214501753208735 - [35] Papalexakis, E. E., Akoglu, L. and Ience, D. (2013). Do more views of a graph help? Community detection and clustering in multi-graphs. In, Proceedings of the 16th International Conference on Information Fusion 899–905.
- [36] Rocklin, M. and Pinar, A. (2011). Latent clustering on graphs with multiple edge types. In, Algorithms and Models for the Web Graph 38–49. Springer.Mathematical Reviews (MathSciNet): MR2842311
Zentralblatt MATH: 1327.68179
Digital Object Identifier: doi:10.1007/978-3-642-21286-4_4 - [37] Rohe, K., Qin, T. and Fan, H. (2012). The highest dimensional stochastic blockmodel with a regularized estimator., Statistica Sinica 39 1878–1915.
- [38] Simic, S. (2009). On an upper bound for Jensen’s inequality., Journal of Inequalities in Pure and Applied Mathematics 10 60.
- [39] Snijders, T. A. B. and Nowicki, K. (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure., Journal of Classificaion 14 75–100.Mathematical Reviews (MathSciNet): MR1449742
Zentralblatt MATH: 0896.62063
Digital Object Identifier: doi:10.1007/s003579900004 - [40] Tang, W., Lu, Z. and Dhillon, I. S. (2009). Clustering with multiple graphs. In, Proceedings of the 9th IEEE International Conference on Data Mining 1016–1021.
- [41] Taskar, B., Segal, E. and Koller, D. (2001). Probabilistic classification and clustering in relational data. In, Proceedings of the 17th International Joint Conference on Artificial Intelligence 870–876.
- [42] Van Erven, T. and Harremoës, P. (2014). Rényi divergence and Kullback-Leibler divergence., IEEE Transactions on Information Theory 60 3797–3820.Mathematical Reviews (MathSciNet): MR3225930
Digital Object Identifier: doi:10.1109/TIT.2014.2320500 - [43] Zhang, A. Y. and Zhou, H. H. (2015). Minimax rates of community detection in stochastic block model., arXiv preprint arXiv:1507.05313.Mathematical Reviews (MathSciNet): MR3546450
Zentralblatt MATH: 06654468
Digital Object Identifier: doi:10.1214/15-AOS1428
Project Euclid: euclid.aos/1473685275 - [44] Zhao, Y., Levina, E. and Zhu, J. (2012). Consistency of community detection in networks under degree-corrected stochastic block models., Ann. Statist 40 2266–2292.Mathematical Reviews (MathSciNet): MR3059083
Zentralblatt MATH: 1257.62095
Digital Object Identifier: doi:10.1214/12-AOS1036
Project Euclid: euclid.aos/1358951382
The Institute of Mathematical Statistics and the Bernoulli Society

- You have access to this content.
- You have partial access to this content.
- You do not have access to this content.
More like this
- Bayesian degree-corrected stochastic blockmodels for community detection
Peng, Lijun and Carvalho, Luis, Electronic Journal of Statistics, 2016 - Spectral clustering and the high-dimensional stochastic blockmodel
Rohe, Karl, Chatterjee, Sourav, and Yu, Bin, The Annals of Statistics, 2011 - Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters
Leskovec, Jure, Lang, Kevin J., Dasgupta, Anirban, and Mahoney, Michael W., Internet Mathematics, 2009
- Bayesian degree-corrected stochastic blockmodels for community detection
Peng, Lijun and Carvalho, Luis, Electronic Journal of Statistics, 2016 - Spectral clustering and the high-dimensional stochastic blockmodel
Rohe, Karl, Chatterjee, Sourav, and Yu, Bin, The Annals of Statistics, 2011 - Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters
Leskovec, Jure, Lang, Kevin J., Dasgupta, Anirban, and Mahoney, Michael W., Internet Mathematics, 2009 - On multi-view learning with additive
models
Culp, Mark, Michailidis, George, and Johnson, Kjell, The Annals of Applied Statistics, 2009 - Robust and computationally feasible community detection in the presence of arbitrary outlier nodes
Cai, T. Tony and Li, Xiaodong, The Annals of Statistics, 2015 - Parallel Dynamical Systems over Graphs and Related Topics: A Survey
Aledo, Juan A., Martinez, Silvia, and Valverde, Jose C., Journal of Applied Mathematics, 2015 - Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels
Bickel, Peter, Choi, David, Chang, Xiangyu, and Zhang, Hai, The Annals of Statistics, 2013 - Inferring gene–gene interactions and functional modules using sparse canonical correlation analysis
Wang, Y. X. Rachel, Jiang, Keni, Feldman, Lewis J., Bickel, Peter J., and Huang, Haiyan, The Annals of Applied Statistics, 2015 - Sparse median graphs estimation in a high-dimensional semiparametric model
Han, Fang, Han, Xiaoyan, Liu, Han, and Caffo, Brian, The Annals of Applied Statistics, 2016 - Consistency of community detection in networks under degree-corrected stochastic block models
Zhao, Yunpeng, Levina, Elizaveta, and Zhu, Ji, The Annals of Statistics, 2012