We gratefully acknowledge support from
the Simons Foundation
and Allianz der deutschen Wissenschaftsorganisationen, koordiniert durch TIB, MPG und HGF
Full-text links:

Download:

Current browse context:

math.PR

Change to browse by:

References & Citations

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo Facebook logo LinkedIn logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Mathematics > Probability

Title: Global and Local Information in Clustering Labeled Block Models

Abstract: The stochastic block model is a classical cluster-exhibiting random graph model that has been widely studied in statistics, physics and computer science. In its simplest form, the model is a random graph with two equal-sized clusters, with intra-cluster edge probability p, and intercluster edge probability q. We focus on the sparse case, i.e., p, q = O(1/n), which is practically more relevant and also mathematically more challenging. A conjecture of Decelle, Krzakala, Moore and Zdeborov\'a, based on ideas from statistical physics, predicted a specific threshold for clustering. This conjecture was proved recently by Mossel, Neeman and Sly, and partially independently by Massouli\'e.
In many real network clustering problems, nodes contain information as well. We study the interplay between node and network information in clustering by studying a labeled block model, where in addition to the edge information, the true cluster labels of a small fraction of the nodes are revealed. In the case of two clusters, we show that below the threshold, a small amount of node information does not affect reconstruction. In the regime where the number of clusters is large, we show that even an arbitrarily small fraction of labeled nodes allows efficient local clustering even below the conjectured algorithmic threshold in the standard model.
Our results further show that even a vanishing fraction of labeled nodes allows one to break various algorithmic symmetries that exist in the unlabeled block model. In particular, it allows efficient reconstruction and identification of true cluster labels using a local algorithm.
Comments: 20 pages, 2 figures
Subjects: Probability (math.PR); Social and Information Networks (cs.SI)
Cite as: arXiv:1404.6325 [math.PR]
  (or arXiv:1404.6325v1 [math.PR] for this version)

Submission history

From: Varun Kanade [view email]
[v1] Fri, 25 Apr 2014 05:31:16 GMT (113kb,D)