Spectral redemption in clustering sparse networks
- Florent Krzakalaa,b,
- Cristopher Moorec,
- Elchanan Mosseld,1,
- Joe Neemand,
- Allan Slyd,
- Lenka Zdeborováe, and
- Pan Zhanga,c
-
Edited* by Peter J. Bickel, University of California, Berkeley, CA, and approved October 23, 2013 (received for review July 2, 2013)
Significance
Spectral algorithms are widely applied to data clustering problems, including finding communities or partitions in graphs and networks. We propose a way of encoding sparse data using a “nonbacktracking” matrix, and show that the corresponding spectral algorithm performs optimally for some popular generative models, including the stochastic block model. This is in contrast with classical spectral algorithms, based on the adjacency matrix, random walk matrix, and graph Laplacian, which perform poorly in the sparse case, failing significantly above a recently discovered phase transition for the detectability of communities. Further support for the method is provided by experiments on real networks as well as by theoretical arguments and analogies from probability theory, statistical physics, and the theory of random matrices.
Abstract
Spectral algorithms are classic approaches to clustering and community detection in networks. However, for sparse networks the standard versions of these algorithms are suboptimal, in some cases completely failing to detect communities even when other algorithms such as belief propagation can do so. Here, we present a class of spectral algorithms based on a nonbacktracking walk on the directed edges of the graph. The spectrum of this operator is much better-behaved than that of the adjacency matrix or other commonly used matrices, maintaining a strong separation between the bulk eigenvalues and the eigenvalues relevant to community structure even in the sparse case. We show that our algorithm is optimal for graphs generated by the stochastic block model, detecting communities all of the way down to the theoretical limit. We also show the spectrum of the nonbacktracking operator for some real-world networks, illustrating its advantages over traditional spectral clustering.
Footnotes
- ↵1To whom correspondence should be addressed. E-mail: mossel@stat.berkeley.edu.
-
Author contributions: F.K., C.M., E.M., J.N., A.S., L.Z., and P.Z. designed research, performed research, contributed new reagents/analytic tools, analyzed data, and wrote the paper. The authors are listed in alphabetical order.
-
The authors declare no conflict of interest.
-
↵*This Direct Submission article had a prearranged editor.
Freely available online through the PNAS open access option.