Community detection in networks: A user guide


Abstract

Community detection in networks is one of the most popular topics of modern network science. Communities, or clusters, are usually groups of vertices having higher probability of being connected to each other than to members of other groups, though other patterns are possible. Identifying communities is an ill-defined problem. There are no universal protocols on the fundamental ingredients, like the definition of community itself, nor on other crucial issues, like the validation of algorithms and the comparison of their performances. This has generated a number of confusions and misconceptions, which undermine the progress in the field. We offer a guided tour through the main aspects of the problem. We also point out strengths and weaknesses of popular methods, and give directions to their use.

Keywords

  • Networks;
  • Communities;
  • Clustering

1. Introduction

The science of networks is a modern discipline spanning the natural, social and computer sciences, as well as engineering  [1], [2], [3], [4], [5], [6] and [7]. Networks, or graphs, consist of vertices and edges. An edge typically connects a pair of vertices. 1 Networks occur in an huge variety of contexts. Facebook, for instance, is a large social network, where more than one billion people are connected via virtual acquaintanceships. Another famous example is the Internet, the physical network of computers, routers and modems which are linked via cables or wireless signals (Fig. 1). Many other examples come from biology, physics, economics, engineering, computer science, ecology, marketing, social and political sciences, etc..

Most networks of interest display community structure, i.e., their vertices are organised into groups, called communities, clusters or modules. In Fig. 2 we show a collaboration network of scientists working at the Santa Fe Institute (SFI) in Santa Fe, New Mexico. Vertices are scientists, edges join coauthors. Edges are concentrated within groups of vertices representing scientists working on the same research topic, where collaborations are more natural. Likewise, communities could represent proteins with similar function in protein-protein interaction networks, groups of friends in social networks, websites on similar topics on the Web graph, and so on.

Collaboration network of scientists working at the Santa Fe Institute (SFI). ...
Fig. 2. 

Collaboration network of scientists working at the Santa Fe Institute (SFI). Edges connect scientists that have coauthored at least one paper. Symbols indicate the research areas of the scientists. Naturally, there are more edges between scholars working on the same area than between scholars working in different areas.

Reprinted figure with permission from  [8].

Identifying communities may offer insight on how the network is organised. It allows us to focus on regions having some degree of autonomy within the graph. It helps to classify the vertices, based on their role with respect to the communities they belong to. For instance we can distinguish vertices totally embedded within their clusters from vertices at the boundary of the clusters, which may act as brokers between the modules and, in that case, could play a major role both in holding the modules together and in the dynamics of spreading processes across the network.

Community detection in networks, also called graph or network clustering, is an ill-defined problem though. There is no universal definition of the objects that one should be looking for. Consequently, there are no clear-cut guidelines on how to assess the performance of different algorithms and how to compare them with each other. On the one hand, such ambiguity leaves a lot of freedom to propose diverse approaches to the problem, which often depend on the specific research question and (or) the particular system at study. On the other hand, it has introduced a lot of noise into the field, slowing down progress. In particular, it has favoured the diffusion of questionable concepts and convictions, on which a large number of methods are based.

This work presents a critical analysis of the problem of community detection, intended to practitioners but accessible to readers with basic notions of network science. It is not meant to be an exhaustive survey. The focus is on the general aspects of the problem, especially in the light of recent findings. Also, we discuss some popular classes of algorithms and give advice on their usage. More info on network clustering can be found in several review articles  [9], [10], [11], [12], [13], [14], [15], [16] and [17].

Note to users:
Accepted manuscripts are Articles in Press that have been peer reviewed and accepted for publication by the Editorial Board of this publication. They have not yet been copy edited and/or formatted in the publication house style, and may not yet have the full ScienceDirect functionality, e.g., supplementary files may still need to be added, links to references may not resolve yet etc. The text could still change before final publication.

Although accepted manuscripts do not have all bibliographic details available yet, they can already be cited using the year of online publication and the DOI, as follows: author(s), article title, Publication (year), DOI. Please consult the journal's reference style for the exact appearance of these elements, abbreviation of journal names and use of punctuation.

When the final article is assigned to volumes/issues of the Publication, the Article in Press version will be removed and the final version will appear in the associated published volumes/issues of the Publication. The date the article was first made available online will be carried over.