Title: Network Cross-Validation for Determining the Number of Communities in Network Data
(Submitted on 6 Nov 2014)
Abstract: The stochastic block model and its variants have been a popular tool in analyzing large network data with community structures. Model selection for these network models, such as determining the number of communities, has been a challenging statistical inference task. In this paper we develop an efficient cross-validation approach to determine the number of communities, as well as to choose between the regular stochastic block model and the degree-corrected block model. Our method, called network cross-validation, is based on a block-wise edge splitting technique, combined with an integrated step of community recovery using sub-blocks of the adjacency matrix. The solid performance of our method is supported by theoretical analysis of the sub-block parameter estimation, and is demonstrated in extensive simulations and a data example. Extensions to more general network models are also discussed.
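The abstract gives no algorithmic detail, so the following is only a minimal sketch of one plausible reading of block-wise edge splitting for the regular stochastic block model, not the authors' implementation. The assumptions are: nodes are split into folds, the square block of the adjacency matrix among held-out nodes serves as the test set, and communities are recovered from the rectangular sub-block whose rows are the remaining nodes. The squared-error loss, the plain k-means step, and all function names (`kmeans`, `ncv_loss`, `sample_sbm`) are hypothetical choices made for illustration. The degree-corrected model is not covered.

```python
import numpy as np


def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(X.shape[0], dtype=int)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def ncv_loss(A, k, n_folds=3, seed=0):
    """Mean held-out squared error for a k-community block model (assumed loss)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    folds = np.array_split(rng.permutation(n), n_folds)
    total, count = 0.0, 0
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        # Rectangular sub-block: rows are training nodes, columns are all nodes.
        rect = A[train, :]
        # Right singular vectors give an embedding of every node,
        # including the held-out ones, without using the test block.
        _, _, vt = np.linalg.svd(rect, full_matrices=False)
        labels = kmeans(vt[:k].T, k)
        # Estimate block probabilities from pairs with at least one training node.
        B = np.zeros((k, k))
        for a in range(k):
            rows = train[labels[train] == a]
            for b in range(k):
                cols = np.flatnonzero(labels == b)
                # Exclude self-pairs (i, i), which are structurally zero.
                pairs = rows.size * cols.size - np.intersect1d(rows, cols).size
                B[a, b] = A[np.ix_(rows, cols)].sum() / max(pairs, 1)
        B = (B + B.T) / 2
        # Predict the held-out block and score it off the diagonal.
        lt = labels[test]
        P = B[np.ix_(lt, lt)]
        off = ~np.eye(test.size, dtype=bool)
        total += ((A[np.ix_(test, test)] - P)[off] ** 2).sum()
        count += off.sum()
    return total / count


def sample_sbm(n, B, seed=0):
    """Draw a symmetric, loop-free adjacency matrix from a stochastic block model."""
    rng = np.random.default_rng(seed)
    z = rng.integers(B.shape[0], size=n)
    upper = np.triu(rng.random((n, n)) < B[np.ix_(z, z)], 1)
    return (upper | upper.T).astype(float), z


if __name__ == "__main__":
    A, _ = sample_sbm(150, np.array([[0.5, 0.05], [0.05, 0.5]]), seed=1)
    losses = {k: ncv_loss(A, k) for k in range(1, 5)}
    print(losses, "selected K:", min(losses, key=losses.get))
```

Under this reading, the candidate number of communities with the smallest held-out loss would be selected. Because the held-out block is never touched during community recovery or parameter estimation, the loss is an out-of-sample measure of fit.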