This site uses cookies. By continuing to use this site you agree to our use of cookies. To find out more, see our Privacy and Cookies policy.
Close this notification
NOTICE: Ensuring subscriber access to content on IOPscience throughout the coronavirus outbreak - see our remote access guidelines.

Click here to close this overlay, or press the "Escape" key on your keyboard.

The Deutsche Physikalische Gesellschaft (DPG) with a tradition extending back to 1845 is the largest physical society in the world with more than 61,000 members. The DPG sees itself as the forum and mouthpiece for physics and is a non-profit organisation that does not pursue financial interests. It supports the sharing of ideas and thoughts within the scientific community, fosters physics teaching and would also like to open a window to physics for all those with a healthy curiosity.

http://www.dpg-physik.de

Click here to close this overlay, or press the "Escape" key on your keyboard.

The Institute of Physics (IOP) is a leading scientific society promoting physics and bringing physicists together for the benefit of all. It has a worldwide membership of around 50 000 comprising physicists from all sectors, as well as those with an interest in physics. It works to advance physics research, application and education; and engages with policy makers and the public to develop awareness and understanding of physics. Its publishing company, IOP Publishing, is a world leader in professional scientific communications.

https://www.iop.org

Analysis of the structure of complex networks at different resolution levels

, and

Published 29 May 2008 IOP Publishing and Deutsche Physikalische Gesellschaft
, ,

1367-2630/10/5/053039

Abstract

Modular structure is ubiquitous in real-world complex networks, and its detection is important because it gives insights into the structure–functionality relationship. The standard approach is based on the optimization of a quality function, modularity, which is a relative quality measure for the partition of a network into modules. Recently, some authors (Fortunato and Barthélemy 2007 Proc. Natl Acad. Sci. USA 104 36 and Kumpula et al 2007 Eur. Phys. J. B 56 41) have pointed out that the optimization of modularity has a fundamental drawback: the existence of a resolution limit beyond which no modular structure can be detected even though these modules might have their own entity. The reason is that several topological descriptions of the network coexist at different scales, which is, in general, a fingerprint of complex systems. Here, we propose a method that allows for multiple resolution screening of the modular structure. The method has been validated using synthetic networks, discovering the predefined structures at all scales. Its application to two real social networks allows us to find the exact splits reported in the literature, as well as the substructure beyond the actual split.

Export citation and abstract BibTeX RIS

1. Introduction

In 2002, Girvan and Newman [1] highlighted the relevance of community structure in complex networks and proposed a method to detect it. This work opened a new scenario that has attracted a great deal of attention in recent years [2, 3], especially because they identified structures which have meaning; they revealed information about the roles of groups of nodes. This is the case, for example, in the worldwide airports network [4], the WWW [5], biological networks [6]–[8], social networks [19] and the Internet [10, 11], among others. The information revealed by the community structure of real networks can be very valuable and make scientists aware of accuracy and reliability of the method used to detect this substructure.

The most important advance about community detection from the previous hit [1] was given by the same authors [12], proposing a quality measure, modularity (Q), that allows quantification of the modular structure. Given a network partitioned into communities or modules, Ci being the community to which node i is assigned, the mathematical definition of modularity [13] is expressed in terms of the weighted adjacency matrix wij, which represents the value of the weight in the link between the nodes i and j (0 if no link exists), and the strengths as

Equation (1.1)

where the Kronecker delta function δ(Ci, Cj) takes the value 1 if the nodes i and j are into the same module, 0 otherwise, and the total strength is . For unweighted networks, wi becomes the degree of node i, and w the total number of links of the network.

The modularity of a given partition is the probability of having edges falling within modules in the network minus the expected probability in an equivalent (null case) network with the same number of nodes, and edges placed at random preserving the nodes' strength. The larger the modularity, the better the partitioning is, because the more it deviates from the null case. Note that the optimization of the modularity cannot be performed by exhaustive search since the number of different partitions is equal to the Bell numbers [14], which grow at least exponentially in the number of nodes N. Heuristics for the optimization of modularity [15]–[20] has become the only feasible (in computational time) and accurate method to detect the modular structure to date.

Recently, Fortunato and Barthélemy [21] showed mathematically that the optimization of modularity has a resolution limit, raising important concerns about the reliability of the modules detected so far using this technique, or eventually using any other quality function. Using a definition of a module extracted from the functional form of (1.1), they subscribe to the possible existence of undetectable submodules within the modules obtained by optimizing (1.1). The same limitation has been observed for other quality functions different from modularity [22]. Note that the resolution limit statement could also be pointing out the existence of multiple scales of description in terms of community structure, each one with its own importance. This is precisely the idea that will drive our work.

Here, we address the issue of community detection in two ways: firstly, clarifying the conceptual interpretation of the resolution limit, not as a problem but as a feature of quality functions that can help us understand in depth the structure of networks, and secondly and most importantly, we provide a method that allows the full screening of the topological structure at any resolution level using the original definition of Q. Once we present the method, we will compare it with recent approaches for exploring, also, the substructure of networks.

2. Complex networks topology represented at different scales

2.1. Resolution limit and topological scales

Rewriting (1.1) in terms of contribution of modules instead of nodes, we have

Equation (2.1)

where the sum is over the m modules of the partition, wss is the internal strength of module s and ws is the total strength of the module s. For unweighted networks, wss reduces to the number of internal links and ws to the sum of the degrees of the nodes in module s.

The solution we propose takes advantage of the dependence of the resolution limit on the total strength 2w. Consider the case study consisting of two identical modules with a single link connecting them to the rest of the network and only one link connecting them to each other [21]; the resolution limit states that these modules will not be found, optimizing modularity, if their internal strengths are

Equation (2.2)

In [21], the authors neglect the contribution −1 in the second side of the inequality (2.2), which is acceptable for large values of the total strength.

Our proposal to solve this problem is to modify the total strength 2w. Let us assume that we increase the strength of every node by a quantity say r, then (2.2) will read

Equation (2.3)

where ns stands for the number of nodes in module s, and N stands for the number of nodes in the network. The result of this prescription resulting in (2.3) is that by rescaling the topology by a factor r, the above example can be separated by optimizing modularity, because the growth of is slower than that of r, i.e. at some scale controlled by r both modules will be visible using optimal modularity.

The problem now is how to increase the strength of the nodes without altering the topological characteristics of the original network. We solve this problem by rescaling the topology defining Wr, from the original weighted adjacency matrix W of the graph with the entries wij, as follows:

Equation (2.4)

where I is the identity matrix. In terms of graphs, this new matrix represents the original network with self-loops of weight r for every node. Note that the prescription in (2.4) supposes a constant shift (translation) r of the strength of each node.

The commonly analyzed structural characteristics of networks (strength distribution, weighted clustering coefficient, strength correlations of any order, etc) remain the same in the new network, because the translation of strengths does not affect the original links' weights wij that are the building blocks of the topology. The shift only affects the property of each node individually and in the same way for all of them. The spectra of the original graph are also shifted by a quantity r for each eigenvalue, preserving then any property that depends on differences between eigenvalues. The eigenvectors are exactly the same. Finally, the associated Laplacian matrix of the original matrix Lij=wiδijwij, responsible for the behavior of linear dynamical processes on the network [23], is also unchanged.

The interesting property of the rescaled topology Wr is that its characteristic scale in terms of modularity has changed. Then the topological structure revealed by optimizing the modularity for Wr is that of large groups for small values of r, and smaller groups for large values of r, all of which are strictly embedded in the original topology. This fact allows for the screening of the modular structure by analyzing the optimal modular structure of Wr for different values of r. Note that the rescaling of the topology is simply an elegant way to enhance the total strength of the network, without varying its topological properties; then the rescaling can be used, in principle, to analyze the structure of networks using any quality function at different resolution levels parametrized by r.

2.2. Multiple resolution method

The analysis of modules at different resolution levels that we propose, consists of optimizing the modularity of the graph Wr for different values of r. Denoting by Qr the modularity of the network at scale r, the expression equivalent to (2.1) reads

Equation (2.5)

The topological scale determined by maximizing Q at which the detection of the modular structure has been attacked so far, corresponds to r=0. For positive values of r, we have access to the substructures underneath those at r=0, and for negative values of r we have access to the superstructures. The topological scale corresponding to all nodes separated (forming their own communities) is found by maximizing , where is the smallest positive value of r that satisfies

And the topological scale corresponding to a unique module formed by the whole network is found by maximizing , where has a lower bound defined by the asymptote ; for a detailed analysis, see appendix A. At the asymptote, the total strength is zero; thus no meaningful scales can be found for values of r below it. Note that the average strength can be written as . To compare the results at different resolutions, we adopt the usual formulation in other areas of physics (optics, acoustics, etc), where scales are prescribed as the logarithm of the ratio between the relevant parameters. Here, the difference between the scales is measured as the logarithm of the ratio between strengths

In this new description, we have that a module is defined at each scale of description r, as a result of the maximization of Qr. Moreover, modules that exist at a certain level of description may disappear from our observation when changing the scale r, while others arise. Note that nothing indicates that the substructures to which we will have access at different resolution levels are necessarily hierarchical; indeed, in general they will not be hierarchical. Although, in principle, all resolution scales provide some information about the topology, and are important, the detection of partitions that are more persistent than the rest when changing the resolution r is indicative of a tougher modular structure.

3. Results

We show the results of our method investigating the modular structure at multiple resolution levels (different scales), for examples of synthetic and real complex networks. A first approach on synthetic networks is illustrative of the validation of the procedure when different coexistent topological scales are imposed by construction. We have also analyzed the modular structure of real networks. In general, in real cases, the results are more difficult to assess because nothing from the topology indicates the existence a priori of a more relevant structure in the network, and only the corroboration a posteriori of the structure found with known facts about the (social, biological, etc) meaning of it can give reliability to any method. In the experiments, we have studied between 100 and 500 values of r inside the interval (rasymp, rmax] for synthetic networks, and 1000 values of r for real networks. All the experiments have been cross checked using two modularity-optimization heuristics: extremal optimization [18], and a new proposal for the optimization of modularity based on tabu search (see appendix B for details), repeating each one 20 times and keeping the partition obtained at the optimal value of Qr.

In figure 1, we have screened the whole range of topological scales for three synthetic networks, representing the number of modules obtained at the optimal partition for Qr, and the network analyzed highlighting the partition at two representative scales indicated by (I) and (II). Although the networks studied may have more than two relevant scales, we have just drawn two of them chosen among the most representative ones. First, we have computed the modular structure in a hierarchical scale-free network with 125 nodes, RB 125, proposed by Ravasz and Barabasi [24]. In figure 1(a), we plot the modular structure found, which shows three different scales that deserve discussion. We observe clearly persistent structures in 5 and 25 communities, respectively, that account for the subdivisions more significant in the process, showing two hierarchical levels for the structure corresponding to the network design. Additionally, the most stable partition in terms of resolution does not correspond to any of the previous ones, but corresponds to the partition in 26 modules (the same as the one in 25 modules, but isolating the main hub). The partition in 5 modules and the partition with 26 modules are highlighted on the original network (note that the partition in 25 modules, left of the one with 26 modules, is not highlighted, but this does not mean that the structure is less important since it also showed high stability). This result is in perfect correspondence to the synchronization patterns produced on this network using coupled oscillators [23]. It is worth mentioning here that we do not claim that the most stable partition is the unique one that represents the community structure better; we want to point out that all the partitions that are stable at a relatively wide range of scales are of importance (see section 4.3 for an in-depth discussion).

Figure 1.

Figure 1. Multiple resolution of modular structure in synthetic networks. Left: the number of modules obtained at the optimal partition for Qr, where each point corresponds to a different partition and the arrows indicate the optimal partitions at r=0. Right: networks analyzed, highlighting the partitions at two representative scales indicated by (I) and (II); other non-highlighted partitions are discussed in the main text. (a) RB 125 corresponds to the hierarchical scale-free network proposed in [24]. The regions corresponding to 5, 25 and 26 modules are the most representative (stable) in terms of resolution. (b) H 13-4 corresponds to a homogeneous in degree network with two predefined hierarchical levels. Both levels are revealed by the method at different scales. (c) FB corresponds to the network proposed in [21] to demonstrate the resolution limit of modularity (at r=0). This limit is overcome at scale (II) providing us with the partition expected.

Standard Export PowerPoint slide

Another network example used is the H 13-4 network [23], which corresponds to a homogeneous in degree network with two predefined hierarchical levels, the number of nodes being 256, the number of links of each node with the most internal community being 13 (formed by 16 nodes), the number of links with the most external community being 4 (four groups of 64 nodes), and the number of links with any other node at random in the network being 1. In figure 1(b), we represent the network and its corresponding modular structure at different scales. Both the hierarchical levels are revealed by the method as they correspond to the original construction of the network: the first hierarchical level consists of 4 groups of 64 nodes, and the second level consists of 16 groups of 16 nodes.

Finally, we have used the FB network proposed by Fortunato and Barthélemy [21] to demonstrate the resolution limit of modularity (at r=0). It consists of two cliques of 20 nodes linked with two small cliques of 5 nodes. At r=0, the best partition cannot separate the two small cliques. In figure 1(c), we observe that the partition searched for by the authors, formed by the four cliques isolated in their own communities, is obtained by increasing the resolution r, showing that the resolution limit of modularity is overcome by the method in region (II). Note that the region left of (I) in the figure corresponds to two groups, the first one formed by joining both squares to its adjacent circle, and the other being the remaining circle. Also to the right of region (II) there appears a relatively stable plateau corresponding to the disentanglement in 42 groups, resulting from the individualization of the nodes in the circles and the maintenance of the two squared cliques.

We have also studied a couple of social networks for which explicit knowledge about their modular structures is available, see figure 2. These particular networks, formed by social acquaintances between individuals, have the main characteristic that after a period of study they get decomposed into perfectly identifiable parts. The challenge is to find the modular structure of these parts without previous knowledge of the real partition. The optimization of modularity at r=0 fails to provide this information, and no other method has been able to find the real partitioned structure. However, the most representative scales in terms of resolution optimizing Qr obtained by applying our method correspond exactly to the real splits.

Figure 2.

Figure 2. Multiple resolution of the modular structure in real networks. Top: the number of modules in the optimal partitions at different scales; the arrows indicate the best partitions obtained at r=0, which do not correspond to the real partitions. Bottom: representation of the networks and the partitions in the plateaus marked (I), which correspond both to the most stable scales of description and to the known splits that have occurred in the real networks. Other stable partitions are not discussed here because we do not have actual social information about other groupings in these networks. (a) Zachary's karate club network [25]. (b) Dolphin social network by Lusseau et al [26].

Standard Export PowerPoint slide

First, we have investigated the classical social network of Zachary's karate club [25], accounting for the study over 2 years of the friendships between 34 members of a karate club at a US university in 1970. The network in question was divided, at the end of the study period, into two groups after a dispute between the club's administrator and the club's instructor, which ultimately resulted in the instructor leaving there and starting a new club, taking about half of the original club's members with him. The analysis of these data has been a paradigmatic benchmark to test the accuracy of community-detection algorithms. Zachary constructed a weighted network using different social measures, see figure 2(a), although in the physics literature, the network has very often been considered unweighted for simplicity or tradition, missing important information in the process.

The goal of any community-detection algorithm trying to identify modules on this network should be to find the actual split that has occurred, assigning perfectly the nodes to the two known resulting clubs. The first approach to this goal was given by Girvan and Newman in [1], where they used a divisive method that produced a hierarchical tree representing the whole modular structure. They found that the first network-splitting obtained by the method correctly assigned all the nodes except node number 3. However, no measure of the quality of the partition was introduced at that time, and then all levels of the hierarchical tree were equivalent, with no way to have a preference for any partition. In [12], the same authors introduced the modularity measure Q and reported that the best structure in the hierarchy, in terms of the value of Q, was a partition into four groups and not two as expected. From this point on, many authors have analyzed this network and have provided the best values of Q obtained. Today, it is well accepted that the best partition in terms of modularity of the Zachary's unweighted network is achieved for four groups with the value of Q=0.419. We have applied our method to screen the modular structure of the original weighted network at all resolution scales of r. The results in figure 2(a) show that the most stable level of resolution is precisely the partition resulting in the two groups representing the two clubs, with no mismatch of any individual.

The second network analyzed is the dolphins' social network of Lusseau et al [26]. This network was constructed from observations of a community of 62 bottle-nose dolphins over a period of 7 years from 1994 to 2001. The nodes in the network represent the dolphins, and the ties between the nodes represent the associations between dolphin pairs occurring more often than expected by chance. There is evidence [27] that a temporary disappearance of a dolphin denoted SN100 led to the fission of the dolphin community into two identifiable parts shown in figure 2(b). The optimization of modularity at r=0 does not produce the expected split, but a partition into five communities with Q=0.518. Other approaches such as the one exposed in [28] have also not succeeded in finding the real division. Our method allows us to reveal all the modular structures in the whole range of resolution, indicating that the most stable solution in terms of resolution of optimal Qr corresponds exactly to the two partitions observed in this animal social network.

4. Discussion

The variation of topological scales, so far, is mediated by the variation of the parameter r. The meaning of this parameter is that of a resistance to become part of a community, in the scope of modularity. For positive values of r, we have access to the substructures below those at r=0 (corresponding to the original scale at which modularity was defined by Newman), and for negative values of r we have access to the superstructures. The screening of different scales of descriptions should be useful to get deeper in the understanding of complex networks. Here, we present a discussion about the role of the different topological scales beyond their static definition, revealing their implications in dynamic processes on top of networks. After that, we will also compare our method with another possible approach to the mesoscale, and finally, we give a perspective on the significance of the mesoscale in contrast with the commonly accepted one-scale description.

4.1. The contact with physics

The results show that there exist several intermediate scales of description of the complex networks, the topological mesoscale. These scales are revealed by intervals of values of the resistance r, for which the optimal partition does not change, see figure 1. The obvious question at this point is: what are these scales representative of? The answer to this question is not trivial, and is intrinsically related to the functioning of the complex network as a substrate for different dynamical processes, communication and friendship in social networks, cognitive task in neural networks, or different levels of aggregation of computers in the Internet, for example. Our guess is that a simple dynamical process on top of a complex network should somehow reveal the topological mesoscale also in terms of temporal patterns. To check this hypothesis, we have implemented a synchronization dynamics on top of different topologies following [2329]. The dynamics corresponds to the non-linear interaction between oscillators connected following the links of the complex networks. Analyzing the temporal meta-stable patterns emerging in the evolution toward complete synchronization, we corroborate our initial guess.

The temporal mesoscale of the dynamics of synchronization (of phase oscillators) near the synchronization attractor is governed by the solutions of the linear dynamics:

Equation (4.1)

where k is a constant, θj are the phases of the nodes and Lij is the Laplacian matrix of the network.

To identify patterns of synchronization in time, we use [23] a discretization of the matrix where langcenterdotcenterdotcenterdotrang stands for the average over different realizations of the initial conditions. In all the cases presented here, we have averaged 105 realizations, and used a discretization threshold of 0.999. We observe that the intermediate scales that are revealed by the synchronization process are in agreement with those found by the topological method proposed here. The method allows not only to identify the number of communities at different scales, but also to determine which nodes form these communities.

We show the corroboration of these claims in a set of synthetic networks, where the modular structure at different scales is imposed by construction. In figure 3, we sketch first the topology of a simple model of the hierarchical network [24], and the comparison between the specific communities found at different resolution levels, and the synchronization patterns observed in the path toward synchronization. The synthetic network of 25 nodes used combines the scale-free property with a high clustering coefficient, and can be iterated, following the scheme plotted in figure 3(a), to have many hierarchical levels. The results of the comparison reveal a strong equivalence between both the processes, the static resolution method at different scales (different values of the resistance), and the groups of synchronized nodes in time. In figure 4, we extend this comparison to three more network structures: H 13-4 corresponding to the homogeneous in degree network described in section 3; equivalently, the H 15-2 network [23] corresponds to a homogeneous in degree network with two predefined hierarchical levels, the number of nodes being 256, the number of links of each node with the most internal community being 15, the number of links with the most external community being 2, and the number of links with any other node at random in the network being 1; the RB 125 network has also been used in section 3 and corresponds to the same scheme exposed in figure 3(a) adding a new hierarchical level. The plots here represent, in log–log scale, the number of communities as a function of the translated resistance , and time. The correspondence between the patterns is highlighted, and again the correspondence is overwhelming.

Figure 3.

Figure 3. The network structure (a) corresponds to the hierarchical network proposed by Ravasz and Barabasi (see text for details). In (b), communities found at different scales for the network depicted. The left arrow represents the value of the resistance for which these structures prevail. The right arrow stands for the time intervals for which the same structures are found in a synchronization process (see text for details). For large positive values of r the network is decomposed into individual nodes, while for large negative values of r the whole network forms a single community.

Standard Export PowerPoint slide
Figure 4.

Figure 4. Comparison between topological scales and dynamical scales of synchronization. The plots represent, in log–log scale, the number of communities as a function of (a) the translated resistance and (b) time. Dashed red lines are a guide to the eye to emphasize the correspondence between the plateaus observed. Legends refer to the network structure (see the text for details).

Standard Export PowerPoint slide

Obviously, the functioning of real complex networks can rely on dynamical processes very different from the synchronization process exposed here; however, it is still instructive to see how a simple nonlinear process reflects the mesoscale of complex networks, or from another viewpoint, to see how the topology of the networks imposes dynamical (temporal) scales in their functioning.

4.2. Comparison with other methods

Some authors have proposed algorithms to extract the hierarchical organization of complex networks by modifying the objective function [30], or by searching local minima of the modularity landscape [31]. These approaches differ conceptually from ours and also in practice: (i) the modification of the quality function [30] does not always provide the correct substructure of networks; (ii) the method based on the screening of local minima of modularity [31] is designed assuming that the structure is hierarchical, which is not the case in many real networks.

The method proposed by Sales-Pardo et al [31] is specifically designed to unravel the hierarchical structure in networks. The comparison with our method is then not possible within our more general scope of any topology. For hierarchical networks, their method will find the hierarchy of scales, as ours also does; however, for non-hierarchical networks, their method can only produce nested communities, in contrast with ours. Conceptually, the difference can be summarized as follows: hierarchies imply multiple scales of description, but the implication does not hold in reverse.

The method proposed by Reichardt and Bornholdt (R&B) [30] was not designed to avoid the resolution limit of modularity, but to offer a way to connect modularity with statistical physics. The main idea of the authors is to tune the null model (i.e. change the quality function) and then to obtain other partitions by maximizing the new quality functions. Indeed in [22], the authors have interestingly shown that the R&B method has the same resolution problems envisioned in [21] for modularity. The problem is that the R&B method consisting of varying γ (the prefactor that multiplies the null model) is not equivalent to tuning the resolution of Newman's modularity. The authors of [22] recognize that if the size distribution of the communities is broad, like in collaboration networks or school friendship networks, there is no single proper value of γ for the optimal resolution. The main difference from our method is that, whether the size distribution of communities is broad or not, the rescaling of the topology method that we present finds all the topological structures correctly because it is designed to this end5.

To support the above discussion, we have built a toy model network with a simple topology, but difficult for community-detection algorithms because it includes communities of different sizes, some of them sparse and others dense, see figure 5. The network model is small enough to have a clear vision of the modules, and to be attacked with computationally costly techniques in reasonable time. While our method succeeds in the process, the R&B method fails. The results of our method and the R&B method with varying γ are presented in figure 5.

Figure 5.

Figure 5. (a) Toy model network with communities of different sizes and densities. (b) Top: number of modules as a function of the topological scale r using our method; the red line indicates the range of scales for which the natural partition (four stars and clique) is obtained. Bottom: number of modules as a function of the parameter γ in the R&B model; the natural partition (four stars and one clique) is not obtained for any value of γ.

Standard Export PowerPoint slide

It is worth noting that the parameter γ in the R&B approach does not correspond to any value of r. Only when γ=1 and r=0, do both the definitions become equal, and are exactly Newman's original definition. Rewriting Qr in (2.5) in terms of nodes,

Equation (4.2)

and comparing it with the R&B modularity

Equation (4.3)

for both prescriptions to be equivalent for all partitions, one must show that

Equation (4.4)

If ij, the relationship between r and γ becomes

Equation (4.5)

which is only fulfilled for r=0 and γ=1, the trivial case stated earlier, because otherwise one would have different values of γ for each pair of the nodes i and j. Therefore, there is no transformation of Qr into QR&Bγ, and they are screening different things.

To conclude this section, and for the sake of clarification, we would like to state that the hierarchical dendrogram produced using the Girvan and Newman algorithm [1] is not assimilable to any of the procedures described here, namely the Sales-Pardo et al proposal, the R&B method, or our current method. In their seminal paper, Girvan and Newman [1] never state that the dendrogram produces a hierarchy of topological scales, because there is no such effect. Their dendrogram is not designed to optimize any quantity at each level of the tree, which is precisely why they defined modularity as an indicator to select a level of description in the tree that will be representative of the community structure [12]. The Girvan and Newman algorithm cannot produce the scales of description we have screened here, as can be checked by observing the dendrogram for the Zachary Karate club presented in [12].

4.3. Which is the 'best' scale of description of complex networks?

The question about the determination of the 'best' scale of description of a complex network is natural but ill-posed in the current scenario. Throughout the paper, we have stated that the more 'stable' partitions, in terms of persistence maximizing Qr when varying the scales with r, are somehow more relevant in the topological description of the mesoscale. Their existence is an observed fact: some partitions are more persistent than others when changing the resolution scale of the topology. We think that this fact is not surprising, as it is not in many physical systems: phenomena that are observed persistently over a wide range of scales vanish at other scales, and others emerge. In general, these more persistent phenomena are usually more important to understand the system. More stable partitions are relevant in the sense that they usually have a known meaning, but we cannot state that other partitions not so prevalent are uninformative. All of them are embedded in the topology and give their particular information.

Summarizing what we think about the determination of the 'best' or 'more relevant' scale of description, we can say that the existence of relevant scales of description of a complex network should unavoidably pass through the definition of 'relevant'. Throughout the paper, we have never tried to define 'relevant' directly from the results obtained with our method, but a posteriori. We use, in the case of synthetic networks and real networks, the information that we have a priori (e.g. knowledge about the hierarchy imposed by construction, or known splits) to determine which scale is more relevant and then to check whether it is found by the method. What we observe is that these relevant scales are usually related to partitions that are significantly persistent (stable) at different scales (variation of r). However, to invert the argument is not straightforward. It is true that one could invent a function that peaks at the scales we see in reality and that are known, as for example a function that accounts for the homogeneity of the obtained communities, but this inevitably imposes new conditions on the definition of the module. Matching the above discussion, let us expose the following: if such a function that indicates the most 'relevant' scale of description (and then partition) exists, why not use this function as the objective function to optimize? This argument is strong because it implies that to determine if any scale is more important than others, one must optimize a different quality function designed to this end, not modularity.

5. Conclusions

In conclusion, motivated by the recent finding that the optimization of modularity has a resolution limit, related to the characteristic scale imposed by the total strength (sum of weights) of the network, we propose a multiple resolution procedure that allows the optimization of the modularity process to go deep into the structure. The main idea consists of rescaling the topology by defining a new network from the original one, providing each node with a self-loop of the same magnitude r. The new network presents the same characteristics as the original network in terms of connectivity, but allows the search of modules at different topological scales. We have provided examples of the modular substructure found in synthetic and real complex networks. The results are sets of partitions that screen the full range of structural modules from individual nodes up to the whole network in each particular scale.

The analysis of the results reveals that some topological scales are more persistent (stable) in terms of resolution than others. These stable scales provide specific information about the main modular aspects of the structure: in the synthetic networks analyzed, they correspond to the predefined structural scales imposed ad hoc; and in real networks they correspond exactly to previous knowledge about the networks, which has not been recovered by any other method used for studying these network topologies to date. With this method, we release optimization of modularity from resolution problems, and give new ideas about the description of complex networks. The existence of several scales of description of complex networks has deep analogies to the usual study of complex systems in physics, where different models have been formulated at different spatial scales to get insights into different aspects of their phenomenology.

Acknowledgments

This work was supported by the Spanish Ministry of Science and Technology grant FIS2006-13321-C02-02. We acknowledge, for permission to use the resources and for providing technical expertise and assistance, the BSC–CNS supercomputing facility.

Appendix A. Resistance limiting cases

Here, we present mathematical proofs of the physical limiting cases of the resistance for weighted undirected and directed networks, namely the limit of resistance for which all nodes are isolated, and the limit for which all nodes become members of a single group that represents the whole network.

A.1. Resistance limiting cases for weighted undirected networks

Let wij=wji≥0, ij, be the weights of a complex network, where wij=0 if there is no link between nodes i and j. We suppose that this network is connected; otherwise, each connected component should be analyzed one by one. The addition of a common resistance r to all nodes may be understood as the definition of a new network with weights

Equation (A.1)

The strengths of this network are

Equation (A.2)

and its total strength is

Equation (A.3)

Now, the modularity of the new network is calculated as

Equation (A.4)

which may also be written as

Equation (A.5)

where

Equation (A.6)

Note that Dr does not depend on the community partition.

A.1.1. All nodes isolated

If we have

Equation (A.7)

then Qr in (A.5) is maximized when δ(Ci, Cj)=0, ∀ij, i.e. modularity attains its maximum when all the nodes are isolated in clusters of just one node. In terms of resistance, they simply become second-order inequalities,

Equation (A.8)

which can easily be solved for all pairs of nodes joined by an edge. Thus, is the minimum value of r which satisfies all these inequalities, and for all nodes are separated in the optimal community configuration.

A.1.2. All nodes in the same community

Let us analyze the behavior of modularity just to the right of the asymptote . For convenience, we write the resistance as

Equation (A.9)

where ε=epsilon/N and epsilon is a small positive constant.

The first term of modularity in (A.4) can be split in the following way:

Equation (A.10)

where a is the sum of weights of edges connecting different communities. If there are two or more communities, then a>0, otherwise a=0.

The analysis of the second (null case) term of (A.4) requires a communities expansion:

Equation (A.11)

where b>0, and b~O(epsilon2) only if all strengths are equal; on the contrary, b~O(1).

Therefore,

Equation (A.12)

which has asymptotic behavior

Equation (A.13)

This means that, for values of the resistance just above the asymptote, the optimal communities configuration is that with all the nodes together in a single module that corresponds to the whole network.

A.2. Resistance limiting cases for weighted directed networks

Let wij≥0, ij be the weight of an arc that goes from the ith to the jth node, where wij=0 if there is no link between them. We suppose that this network is connected in the weak sense (weak connected components), i.e. the connected components are found as if the arcs were undirected; otherwise, each connected component should be analyzed one by one.

The natural generalization of modularity to cope with directed networks was introduced in [33], and is expressed as

Equation (A.14)

where the input and output strengths of the network are

Equation (A.15)

Equation (A.16)

and its total strength is

Equation (A.17)

The addition of a common resistance r to all the nodes may be understood as the definition of a new network with weights

Equation (A.18)

The strengths of this network are

Equation (A.19)

Equation (A.20)

and its total strength is

Equation (A.21)

Now, the modularity (A.14) of the new network is calculated as

Equation (A.22)

which may also be written as

Equation (A.23)

where

Equation (A.24)

Note that Dr does not depend on the community partition.

A.2.1. All nodes isolated

If we have

Equation (A.25)

then Qr in (A.23) is maximized when δ(Ci, Cj)=0, ∀ij, i.e. modularity attains its maximum when all the nodes are isolated in clusters of just one node. In terms of the resistance, they simply become second-order inequalities,

Equation (A.26)

which can easily be solved for all pairs of nodes joined by an arc. Thus, rmax is the minimum value of r which satisfies all these inequalities, and for r > rmax all nodes are separated in the optimal community configuration.

A.2.2. All nodes in the same community

The analysis of this case follows the same steps as in appendix A.1.2, yielding also (A.12):

Equation (A.27)

The only difference is that now:

Equation (A.28)

Unlike for undirected networks, the value of b is not guaranteed to be positive, and then:

Equation (A.29)

This means that, only if b is positive for all the different community partitions, for values of the resistance just above the asymptote, the optimal communities configuration is that with all nodes together in a single module that corresponds to the whole network. Otherwise, the modularity will rise to + for the maximum modularity configuration, and the single module structure might not be present for any value of the resistance.

Appendix B. Optimization of the modularity using the Tabu heuristic

We propose a new method to optimize the modularity based on Tabu search [34]. The algorithm proceeds as follows: starting from an initial solution (a partition into groups of nodes of the network), S_Init, an iterative process that explores the search space begins, stepping from the solution of the current iteration, S_Iter, to one of its neighbors, S_Neig. The neighborhood is composed of the partitions that are obtained from the current solution by the application of a local operator called move. In our case, the move operator acts on one node at a time, moving it from its current community to another selected at random, or creating a new one. Among the solutions in the neighborhood, the best one is chosen to become the new current solution for the next iteration of the algorithm.

In order to escape from local optima, a list of tabu moves is used. This tabu list stores and forbids the most recently accepted moves and is updated as the algorithm proceeds, so that a move just added to the list is removed from it after a certain number of iterations (Tabu_Tenure) have passed. However, tabu moves are allowed when they lead to an improved solution. Once a solution is accepted, the node moved to obtain this solution is inserted into the tabu list, in order to prevent the movement of the same node during the next Tabu_Tenure iterations, unless this move leads us to the best solution found until that moment. We used a logarithmic function on the number of nodes as the number of idle iterations needed to stop the search.

function Tabu_Modularity_Optimization(Net: Network; S_Init: Solution)

  returns (S_Best: Solution) is

const

  Tabu_Tenure: Natural := 5;

var

  Tabu_Moves: Array_Of_Natural; {counters of forbidden moves}

  Max_Idle: Natural; {maximum number of idle iterations}

  Num_Idle: Natural; {number of idle iterations}

  S_Iter: Solution; {solution of the current iteration}

  S_Neig: Solution; {solution in the neighborhood}

  Node_Best: Natural; {node with the best move}

begin

  for Node := 1 to Number_Of_Nodes(Net) do {initialize the tabu moves}

  Tabu_Moves[Node] := 0;

  end for;

  Max_Idle := Maximum_Of_Nonimprovements(Number_Of_Nodes(Net));

  Num_Idle := 0;

  S_Iter := S_Init;

  S_Best := S_Init;

  while Num_Idle < Max_Idle do

  Explore_Neighborhood(Net, S_Iter, S_Best, Tabu_Moves, S_Neig, Node_Best);

  for Node := 1 to Number_Of_Nodes(Net) do {decrease the tabu moves}

    Tabu_Moves[Node] := Maximum(0, Tabu_Moves[Node]-1);

  end for;

  Tabu_Moves[Node_Best] := Tabu_Tenure;

  S_Iter := S_Neig;

  if Modularity(S_Neig) > Modularity(S_Best) then

   S_Best := S_Neig;

   Num_Idle := 0;

  else

   Num_Idle := Num_Idle + 1;

  end if;

  end while;

  return S_Best;

end Tabu_Modularity_Optimization;

procedure Explore_Neighborhood(Net: in Network; S_Iter, S_Best: in Solution;

  Tabu_Moves: in out Array_Of_Natural;

  S_Neig: out Solution; Node_Best: out Natural) is

var

  S_Move: Solution; {solution from the move of a node}

begin

  Node_Best := 0;

  for Node := 1 to Number_Of_Nodes(Net) do

  S_Move := Solution_From_Move(Net, S_Iter, Node);

  if Modularity(S_Move) > Modularity(S_Best) then

   Tabu_Moves[Node] := 0;

  end if;

  if Tabu_Moves[Node] = 0 and

    (Node_Best = 0 or else Modularity(S_Move) > Modularity(S_Neig)) then

   Node_Best := Node;

   S_Neig := S_Move;

  end if;

  end for;

end Explore_Neighborhood;

The main advantage of this algorithm is that it is a mixture of divisive and agglomerative processes, avoiding the drawbacks of each strategy. Moreover, the iterative process can start from any initial partition, which is adequate for the mesoscale determination, since the optimal partitions for nearby values of the resistance are frequently similar. In terms of computational cost, the tabu heuristic is equivalent to other stochastic optimization methods such as simulated annealing or genetic algorithms.

Footnotes

  • An interesting comparison between the R&B method and our method (when it was still a preprint in arXiv) was performed by Kumpula et al [32].

References

  • [1]
    Girvan M and Newman M E J 2002 Community structure in social and biological networks Proc. Natl Acad. Sci. USA 99 7821

    CrossrefPubMedGoogle Scholar Brought to you by University of Bath

  • [2]
    Newman M E J 2004 Detecting community structure in networks Eur. Phys. J. B 38 321

    CrossrefGoogle Scholar Brought to you by University of Bath

  • [3]
    Danon L, Díaz-Guilera A, Duch J and Arenas A 2005 Comparing community structure identification J. Stat. Mech. P09008

    IOPscienceGoogle Scholar Brought to you by University of Bath

  • [4]
    Guimerà R, Mossa S, Turtschi A and Amaral L A N 2005 The worldwide air transportation network: anomalous centrality, community structure, and cities' global roles Proc. Natl Acad. Sci. USA 102 7794

    CrossrefPubMedGoogle Scholar Brought to you by University of Bath

  • [5]
    Flake G W, Lawrence S, Giles C L and Coetzee F M 2002 Self-organization and identification of web communities IEEE Comput. 35 66

    CrossrefGoogle Scholar Brought to you by University of Bath

  • [6]
    Holme P, Huss M and Jeong H 2003 Subnetwork hierarchies of biochemical pathways Bioinformatics 19 532

    CrossrefPubMedGoogle Scholar Brought to you by University of Bath

  • [7]
    Guimerà R and Amaral L A N 2005 Functional cartography of metabolic networks Nature 433 895

    CrossrefPubMedGoogle Scholar Brought to you by University of Bath

  • [8]
    Palla G, Derényi I, Farkas I and Vicsek T 2005 Uncovering the overlapping community structure of complex networks in nature and society Nature 435 814

    CrossrefPubMedGoogle Scholar Brought to you by University of Bath

  • [9]
    Gleiser P and Danon L 2003 Community structure in jazz Adv. Complex Syst. 6 565

    CrossrefGoogle Scholar Brought to you by University of Bath

  • [10]
    Eriksen K, Simonsen I, Maslov S and Sneppen K 2003 Modularity and extreme edges of the Internet Phys. Rev. Lett. 90 148701

    CrossrefPubMedGoogle Scholar Brought to you by University of Bath

  • [11]
    Adamic L A and Adar E 2003 Friends and neighbors on the web Social Netw. 25 211

    CrossrefGoogle Scholar Brought to you by University of Bath

  • [12]
    Newman M E J and Girvan 2004 Finding and evaluating community structure in networks Phys. Rev. E 69 026113

    CrossrefGoogle Scholar Brought to you by University of Bath

  • [13]
    Newman M E J 2004 Analysis of weighted networks Phys. Rev. E 70 056131

    CrossrefGoogle Scholar Brought to you by University of Bath

  • [14]
    Bell E T 1934 Exponential numbers Am. Math. Mon. 41 411

    CrossrefGoogle Scholar Brought to you by University of Bath

  • [15]
    Newman M E J 2004 Fast algorithm for detecting community structure in networks Phys. Rev. E 69 066133

    CrossrefGoogle Scholar Brought to you by University of Bath

  • [16]
    Clauset A, Newman M E J and Moore C 2004 Finding community structure in very large networks Phys. Rev. E 70 066111

    CrossrefGoogle Scholar Brought to you by University of Bath

  • [17]
    Guimerà R and Amaral L A N 2005 Cartography of complex networks: modules and universal roles J. Stat. Mech. P02001

    IOPscienceGoogle Scholar Brought to you by University of Bath

  • [18]
    Duch J and Arenas A 2005 Community identification using extremal optimization Phys. Rev. E 72 027104

    CrossrefGoogle Scholar Brought to you by University of Bath

  • [19]
    Pujol J M, Béjar J and Delgado J 2006 Clustering algorithm for determining community structure in large networks Phys. Rev. E 74 016107

    CrossrefGoogle Scholar Brought to you by University of Bath

  • [20]
    Newman M E J 2006 Modularity and community structure in networks Proc. Natl Acad. Sci. USA 103 8577

    CrossrefPubMedGoogle Scholar Brought to you by University of Bath

  • [21]
    Fortunato S and Barthélemy M 2007 Resolution limit in community detection Proc. Natl Acad. Sci. USA 104 36

    CrossrefPubMedGoogle Scholar Brought to you by University of Bath

  • [22]
    Kumpula J M, Saramaki J, Kaski K and Kertesz J 2007 Resolution limit in complex network community detection with Potts model approach Eur. Phys. J. B 56 41

    CrossrefGoogle Scholar Brought to you by University of Bath

  • [23]
    Arenas A, Díaz-Guilera A and Perez-Vicente C J 2006 Synchronization reveals topological scales in complex networks Phys. Rev. Lett. 96 114102

    CrossrefPubMedGoogle Scholar Brought to you by University of Bath

  • [24]
    Ravasz E and Barabasi A-L 2003 Hierarchical organization in complex networks Phys. Rev. E 67 026112

    CrossrefGoogle Scholar Brought to you by University of Bath

  • [25]
    Zachary W W 1977 An information flow model for conflict and fission in small groups J. Anthropol. Res. 33 452

    CrossrefGoogle Scholar Brought to you by University of Bath

  • [26]
    Lusseau D, Schneider K, Boisseau O J, Haase P, Slooten E and Dawson S M 2003 The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Can geographic isolation explain this unique trait? Behav. Ecol. Sociobiol. 54 396

    CrossrefGoogle Scholar Brought to you by University of Bath

  • [27]
    Lusseau D and Newman M E J 2004 Identifying the role that animals play in their social networks Proc. R. Soc. Lond. B 271 S477

    CrossrefGoogle Scholar Brought to you by University of Bath

  • [28]
    Newman M E J 2006 Finding community structure in networks using the eigenvectors of matrices Phys. Rev. E 74 036104

    CrossrefGoogle Scholar Brought to you by University of Bath

  • [29]
    Arenas A, Díaz-Guilera A and Perez-Vicente C J 2006 Synchronization processes in complex networks Physica D 224 27

    CrossrefGoogle Scholar Brought to you by University of Bath

  • [30]
    Reichardt J and Bornholdt S 2006 Statistical mechanics of community detection Phys. Rev. E 74 016110

    CrossrefGoogle Scholar Brought to you by University of Bath

  • [31]
    Sales-Pardo M, Guimerà R, Moreira A and Amaral L A N 2007 Extracting the hierarchical organization of complex systems Proc. Natl Acad. Sci. USA 104 15224

    CrossrefPubMedGoogle Scholar Brought to you by University of Bath

  • [32]
    Kumpula J M, Saramaki J, Kaski K and Kertesz J 2007 Limited resolution and multiresolution methods in complex network community detection Fluctuation Noise Lett. 7 L209

    CrossrefGoogle Scholar Brought to you by University of Bath

  • [33]
    Arenas A, Duch J, Fernández A and Gómez S 2007 Size reduction of complex networks preserving modularity New J. Phys. 9 176

    IOPscienceGoogle Scholar Brought to you by University of Bath

  • [34]
    Glover F 1986 Future paths for integer programming and links to artificial intelligence Comput. Oper. Res. 5 533

    CrossrefGoogle Scholar Brought to you by University of Bath

Export references: BibTeX RIS

undefined