Abstract:
We derive sharp thresholds for exact recovery of communities in a weighted
stochastic block model, where observations are collected in the form of
a weighted adjacency matrix, and the weight of each edge is generated
independently from a distribution determined by the community membership
of its endpoints. Our main result, characterizing the precise boundary
between success and failure of maximum likelihood estimation when edge
weights are drawn from discrete distributions, involves the Rényi
divergence of order 1/2 between the distributions of within-community
and between-community edges. When the Rényi divergence is above a
certain threshold, meaning the edge distributions are sufficiently
separated, maximum likelihood succeeds with probability tending to 1;
when the Rényi divergence is below the threshold, maximum likelihood
fails with probability bounded away from 0. In the language of graphical
channels, the Rényi divergence pinpoints the information-theoretic
capacity of discrete graphical channels with binary inputs. Our results
generalize previously established thresholds derived specifically for
unweighted block models and support the natural intuition relating
the intrinsic hardness of community estimation to the problem
of edge classification. Along the way, we establish a general
relationship between the Rényi divergence and the probability of success
of the maximum likelihood estimator for arbitrary edge-weight
distributions. Finally, we discuss consequences of our bounds for the
related problems of censored block models and submatrix localization,
which may be seen as special cases of the framework developed in our
paper.
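As a hedged illustration (not code from the paper), the sketch below computes the Rényi divergence of order 1/2 between two discrete edge-weight distributions, D_{1/2}(P || Q) = -2 log sum_i sqrt(p_i q_i). The function names renyi_half, bayes_error, and bernoulli, and the three-point distributions P and Q, are hypothetical choices made for this example. The sketch also checks two standard facts that connect to the abstract. First, the error of the best classifier for a single edge weight (P versus Q, equal priors) is at most (1/2) exp(-D_{1/2}/2), the Bhattacharyya bound. Second, for sparse Bernoulli edges with p = a log n / n and q = b log n / n, the ratio n D_{1/2} / log n approaches (sqrt(a) - sqrt(b))^2, the quantity appearing in the known unweighted exact-recovery threshold (sqrt(a) - sqrt(b))^2 > 2 for two equal-size communities. The precise thresholds proved in the paper are not reproduced here.

```python
import math
from typing import Sequence


def renyi_half(p: Sequence[float], q: Sequence[float]) -> float:
    """Renyi divergence of order 1/2 between two discrete distributions.

    D_{1/2}(P || Q) = -2 * log(sum_i sqrt(p_i * q_i)).
    The inner sum is the Bhattacharyya coefficient, so the result is
    symmetric in P and Q and equals 0 when P == Q.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same finite support")
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -2.0 * math.log(bc)


def bayes_error(p: Sequence[float], q: Sequence[float]) -> float:
    """Smallest achievable error when classifying one edge weight as
    within-community (P) or between-community (Q), with equal priors."""
    return 0.5 * sum(min(pi, qi) for pi, qi in zip(p, q))


def bernoulli(x: float) -> list[float]:
    """Edge-presence distribution of an unweighted block model: [P(0), P(1)]."""
    return [1.0 - x, x]


if __name__ == "__main__":
    # Hypothetical weights {0, 1, 2}: within-community edges tend to be heavier.
    P = [0.2, 0.3, 0.5]
    Q = [0.5, 0.3, 0.2]
    d = renyi_half(P, Q)
    # Bhattacharyya bound: Bayes error <= 0.5 * exp(-D_{1/2} / 2).
    print(f"D_1/2 = {d:.4f}, Bayes error = {bayes_error(P, Q):.4f}, "
          f"bound = {0.5 * math.exp(-d / 2):.4f}")

    # Sparse unweighted case: p = a log n / n, q = b log n / n.
    n, a, b = 10**6, 9.0, 1.0
    p, q = a * math.log(n) / n, b * math.log(n) / n
    exact = renyi_half(bernoulli(p), bernoulli(q))
    print(f"n*D/log n = {n * exact / math.log(n):.4f} "
          f"vs (sqrt(a)-sqrt(b))^2 = {(math.sqrt(a) - math.sqrt(b)) ** 2:.4f}")
```

Running the script prints the divergence, the exact single-edge Bayes error alongside its Bhattacharyya bound for the hypothetical distributions, and the rescaled divergence in the sparse Bernoulli case, which should come out close to 4 for a = 9 and b = 1.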