• Institution: Staats- und Universitaetsbibliothek Bremen
  • Research in anthropology including biological and physical, as well as cultural anthropology.
  • Sign up for PNAS eTOC alerts

Spectral redemption in clustering sparse networks

  1. Pan Zhanga,c
  1. Edited* by Peter J. Bickel, University of California, Berkeley, CA, and approved October 23, 2013 (received for review July 2, 2013)

Significance

Spectral algorithms are widely applied to data clustering problems, including finding communities or partitions in graphs and networks. We propose a way of encoding sparse data using a “nonbacktracking” matrix, and show that the corresponding spectral algorithm performs optimally for some popular generative models, including the stochastic block model. This is in contrast with classical spectral algorithms, based on the adjacency matrix, random walk matrix, and graph Laplacian, which perform poorly in the sparse case, failing significantly above a recently discovered phase transition for the detectability of communities. Further support for the method is provided by experiments on real networks as well as by theoretical arguments and analogies from probability theory, statistical physics, and the theory of random matrices.

Abstract

Spectral algorithms are classic approaches to clustering and community detection in networks. However, for sparse networks the standard versions of these algorithms are suboptimal, in some cases completely failing to detect communities even when other algorithms such as belief propagation can do so. Here, we present a class of spectral algorithms based on a nonbacktracking walk on the directed edges of the graph. The spectrum of this operator is much better-behaved than that of the adjacency matrix or other commonly used matrices, maintaining a strong separation between the bulk eigenvalues and the eigenvalues relevant to community structure even in the sparse case. We show that our algorithm is optimal for graphs generated by the stochastic block model, detecting communities all of the way down to the theoretical limit. We also show the spectrum of the nonbacktracking operator for some real-world networks, illustrating its advantages over traditional spectral clustering.

Footnotes

  • Author contributions: F.K., C.M., E.M., J.N., A.S., L.Z., and P.Z. designed research, performed research, contributed new reagents/analytic tools, analyzed data, and wrote the paper. The authors are listed in alphabetical order.

  • The authors declare no conflict of interest.

  • *This Direct Submission article had a prearranged editor.

Freely available online through the PNAS open access option.