Nature | Letter
Link communities reveal multiscale complexity in networks
- Journal name: Nature
- Volume: 466
- Pages: 761–764
- DOI: 10.1038/nature09182
Networks have become a key approach to understanding systems of interacting objects, unifying the study of diverse phenomena including biological organisms and human society1, 2, 3. One crucial step when studying the structure and dynamics of networks is to identify communities4, 5: groups of related nodes that correspond to functional subunits such as protein complexes6, 7 or social spheres8, 9, 10. Communities in networks often overlap9, 10, such that nodes simultaneously belong to several groups. Meanwhile, many networks are known to possess hierarchical organization, where communities are recursively grouped into a hierarchical structure11, 12, 13. However, many real networks have communities with pervasive overlap, where each and every node belongs to more than one group; as a consequence, a global hierarchy of nodes cannot capture the relationships between overlapping groups. Here we reinvent communities as groups of links rather than nodes and show that this unorthodox approach successfully reconciles the antagonistic organizing principles of overlapping communities and hierarchy. In contrast to the existing literature, which has focused entirely on grouping nodes, link communities naturally incorporate overlap while revealing hierarchical organization. We find relevant link communities in many networks, including major biological networks such as protein–protein interaction6, 7, 14 and metabolic networks11, 15, 16, and show that a large social network10, 17, 18 contains hierarchically organized community structures spanning inner-city to regional scales while maintaining pervasive overlap. Our results imply that link communities are fundamental building blocks that reveal overlap and hierarchical organization in networks to be two aspects of the same phenomenon.
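Because each link is assigned to exactly one community while a node inherits the labels of all of its links, overlap between communities follows automatically from a partition of links. The minimal Python sketch below illustrates this point only; the function name `node_memberships` and the two-triangle toy graph are hypothetical, chosen for illustration, and are not taken from the paper.

```python
from collections import defaultdict


def node_memberships(link_community):
    """Map each node to the set of link communities its links belong to.

    link_community: dict mapping an undirected link (u, v) to a community label.
    """
    memberships = defaultdict(set)
    for (u, v), label in link_community.items():
        # A link has exactly one label, but both endpoints inherit it,
        # so a node touching links from several communities collects several labels.
        memberships[u].add(label)
        memberships[v].add(label)
    return dict(memberships)


# Hypothetical toy partition: two triangles that share node "c".
links = {
    ("a", "b"): 0, ("a", "c"): 0, ("b", "c"): 0,
    ("c", "d"): 1, ("c", "e"): 1, ("d", "e"): 1,
}
m = node_memberships(links)
print(sorted(m["c"]))  # [0, 1]
print(sorted(m["a"]))  # [0]
```

Node "c" ends up in both communities without any extra rule for overlap, which is the property the abstract attributes to link communities.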
Figures
Figure 1: Overlapping communities lead to dense networks and prevent the discovery of a single node hierarchy. a, Local structure in many networks is simple: an individual node sees the communities it belongs to. b, Complex global structure emerges when every node is in the situation displayed in a. c, Pervasive overlap hinders the discovery of hierarchical organization because nodes cannot occupy multiple leaves of a node dendrogram, preventing a single tree from encoding the full hierarchy. d, e, An example showing link communities (colours in d), the link similarity matrix (e; darker entries show more similar pairs of links) and the link dendrogram (e). f, Link communities from the full word association network around the word ‘Newton’. Link colours represent communities and filled regions provide a guide for the eye. Link communities capture concepts related to science and allow substantial overlap. Note that the words were produced by experiment participants during free word associations.
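The link similarity matrix and link dendrogram of panels d and e can be approximated as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: we assume a simple undirected graph, compare only links e_ik and e_jk that share a node k, score such a pair by the Jaccard similarity of the inclusive neighbourhoods n+(i) and n+(j) (each node's neighbours plus the node itself), and, instead of building the full dendrogram, take a single-linkage cut at one threshold t using union-find. A single-linkage cut at t groups exactly the links connected by chains of pairs with similarity at least t. The function names are hypothetical.

```python
from itertools import combinations


def link_similarity(adj, e1, e2):
    """Jaccard similarity of inclusive neighbourhoods for links e_ik and e_jk.

    Assumes a simple undirected graph, so two distinct links share at most
    one endpoint k; i and j are the endpoints that are not shared.
    """
    (k,) = set(e1) & set(e2)
    i = e1[0] if e1[1] == k else e1[1]
    j = e2[0] if e2[1] == k else e2[1]
    # n+(x): the neighbours of x plus x itself
    ni, nj = adj[i] | {i}, adj[j] | {j}
    return len(ni & nj) / len(ni | nj)


def link_communities_at(edges, t):
    """Single-linkage cut at threshold t: join adjacent links with S >= t."""
    adj, incident = {}, {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        incident.setdefault(u, []).append((u, v))
        incident.setdefault(v, []).append((u, v))

    parent = {e: e for e in edges}  # union-find forest over links

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]  # path halving
            e = parent[e]
        return e

    # Only links meeting at a common node have a defined similarity.
    for links in incident.values():
        for e1, e2 in combinations(links, 2):
            if link_similarity(adj, e1, e2) >= t:
                parent[find(e1)] = find(e2)

    groups = {}
    for e in edges:
        groups.setdefault(find(e), []).append(e)
    return list(groups.values())


# Hypothetical toy graph: two triangles sharing node "c".
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("c", "e"), ("d", "e")]
print(len(link_communities_at(edges, 0.5)))  # 2
print(len(link_communities_at(edges, 0.7)))  # 4
```

In the toy graph, pairs of links inside a triangle score 0.6 or 1.0, while pairs meeting at "c" from different triangles score 0.2, so a cut at t = 0.5 recovers the two triangles and a stricter cut at t = 0.7 splits each triangle further.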
Figure 2: Assessing the relevance of link communities using real-world networks. Composite performance (Methods and Supplementary Information) is a data-driven measure of the quality (relevance of discovered memberships) and coverage (fraction of network classified) of community and overlap. Tested algorithms are link clustering, introduced here; clique percolation9; greedy modularity optimization26; and Infomap21. Test networks were chosen for their varied sizes and topologies and to represent the different domains where network analysis is used. Shown for each are the number of nodes, N, and the average number of neighbours per node, ⟨k⟩. Link clustering finds the most relevant community structure in real-world networks. AP/MS, affinity-purification/mass spectrometry; LC, literature curated; PPI, protein–protein interaction; Y2H, yeast two-hybrid.
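For an undirected network with N nodes and M links, each link adds one neighbour to each of its two endpoints, so the average number of neighbours per node is ⟨k⟩ = 2M/N. A minimal Python sketch, assuming a simple undirected graph given as an edge list with no isolated nodes (the function name `average_degree` is hypothetical):

```python
def average_degree(edges):
    """Average number of neighbours per node, <k> = 2M / N.

    Assumes a simple undirected graph given as an edge list; isolated
    nodes do not appear in an edge list, so they are not counted in N.
    """
    nodes = {n for edge in edges for n in edge}
    return 2 * len(edges) / len(nodes)


# Hypothetical toy graph: two triangles sharing node "c" (M = 6, N = 5).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("c", "e"), ("d", "e")]
print(average_degree(edges))  # 2.4
```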
Figure 3: Community and membership distributions for the metabolic and mobile phone networks. The distribution of community sizes and node memberships (insets). Community size shows a heavy tail. The number of memberships per node is reasonable for both networks: we do not observe phone users that belong to large numbers of communities and we correctly identify currency metabolites, such as water, ATP and inorganic phosphate (Pi), that are prevalently used throughout metabolism. The appearance of currency metabolites in many metabolic reactions is naturally incorporated into link communities, whereas their presence hindered community identification in previous work11, 15.
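The two distributions in Figure 3, community sizes and memberships per node, can be tabulated from any partition of links. The sketch below assumes that community size is counted in distinct nodes (counting links would be an equally plausible reading of the caption); the function name is hypothetical.

```python
from collections import Counter


def size_and_membership_histograms(communities):
    """Tabulate community sizes and memberships per node.

    communities: list of link communities, each a list of (u, v) links.
    Size is counted in distinct nodes here (an assumption).
    Returns two Counters: size -> number of communities, and
    number of memberships -> number of nodes.
    """
    sizes = Counter()
    memberships = Counter()  # node -> number of communities it belongs to
    for links in communities:
        members = {n for link in links for n in link}
        sizes[len(members)] += 1
        memberships.update(members)
    return sizes, Counter(memberships.values())


# Hypothetical partition: two triangles sharing node "c".
communities = [
    [("a", "b"), ("a", "c"), ("b", "c")],
    [("c", "d"), ("c", "e"), ("d", "e")],
]
sizes, per_node = size_and_membership_histograms(communities)
print(sizes)     # Counter({3: 2})
print(per_node)  # Counter({1: 4, 2: 1})
```

A node such as a currency metabolite that touches links in many communities would show up in the tail of the membership histogram.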
Figure 4: Meaningful communities at multiple levels of the link dendrogram. a–c, The social network of mobile phone users displays co-located, overlapping communities on multiple scales. a, Heat map of the most likely locations of all users in the region, showing several cities. b, Cutting the dendrogram above the optimum threshold yields small, intra-city communities (insets). c, Below the optimum threshold, the largest communities become spatially extended but still show correlation. d, The social network within the largest community in c, with its largest subcommunity highlighted. The highlighted subcommunity is shown along with its link dendrogram and partition density, D, as a function of threshold, t. Link colours correspond to dendrogram branches. e, Community quality, Q, as a function of dendrogram level, compared with random control (Methods).
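Panel d plots the partition density D against the threshold t. The sketch below assumes the definition we understand the paper's Methods to use (the Methods are not part of this excerpt): for a community c with m_c links and n_c nodes, D_c = (m_c − (n_c − 1)) / (n_c(n_c − 1)/2 − (n_c − 1)) measures how far the community's link count exceeds that of a tree, relative to a clique; D_c = 0 when n_c = 2; and D is the link-weighted average over communities, D = (2/M) Σ_c m_c (m_c − (n_c − 1)) / ((n_c − 2)(n_c − 1)), with M the total number of links. Evaluating D at each cut of the dendrogram and keeping the maximum would then select the threshold. The function name is hypothetical.

```python
def partition_density(communities):
    """Partition density D of a link partition (definition assumed above).

    communities: list of link communities, each a list of (u, v) links.
    """
    total_links = sum(len(links) for links in communities)  # M
    acc = 0.0
    for links in communities:
        m_c = len(links)
        n_c = len({n for link in links for n in link})
        if n_c <= 2:
            continue  # a single link has D_c = 0 by convention
        acc += m_c * (m_c - (n_c - 1)) / ((n_c - 2) * (n_c - 1))
    return 2 * acc / total_links


tri1 = [("a", "b"), ("a", "c"), ("b", "c")]
tri2 = [("c", "d"), ("c", "e"), ("d", "e")]
print(partition_density([tri1, tri2]))                 # 1.0
print(partition_density([tri1 + tri2]))                # roughly 0.333
print(partition_density([[e] for e in tri1 + tri2]))   # 0.0
```

Each triangle in the toy graph is a clique, so splitting along the triangles scores D = 1, lumping all six links together scores 1/3, and isolating every link scores 0; the maximum therefore sits at the intermediate cut.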