Community Detection via Maximization of Modularity and Its Variants


Abstract:
In this paper, we first discuss the definition of modularity (Q) used as a metric for community quality, and then we review the modularity maximization approaches used for community detection in the last decade. Next, we discuss two opposite yet coexisting problems of modularity optimization: in some cases, it tends to favor small communities over large ones, while in others it favors large communities over small ones (the so-called resolution limit problem). We then overview several community quality metrics proposed to solve the resolution limit problem and discuss Modularity Density (Qds), which simultaneously avoids both problems of modularity. Finally, we introduce two novel fine-tuned community detection algorithms that iteratively attempt to improve the community quality measurements by splitting and merging the given network community structure. The first one, referred to as Fine-tuned Q, is based on modularity (Q), while the second, denoted Fine-tuned Qds, is based on Modularity Density (Qds). We then compare the greedy algorithm of modularity maximization (denoted Greedy Q), Fine-tuned Q, and Fine-tuned Qds on four real networks, and also on the classical clique network and the LFR benchmark networks, each of which is instantiated by a wide range of parameters. The results indicate that Fine-tuned Qds is the most effective of the three algorithms discussed. Moreover, we show that Fine-tuned Qds can be applied to the communities detected by other algorithms to significantly improve their results.
Published in: IEEE Transactions on Computational Social Systems ( Volume: 1, Issue: 1, March 2014)
Page(s): 46 - 65
Date of Publication: 09 April 2014
Publisher: IEEE

SECTION I.

Introduction

Many networks, including the Internet, citation networks, transportation networks, email networks, and social and biochemical networks, display community structure: groups of nodes within which connections are denser than between them [1]. Detecting and characterizing such community structure, which is known as community detection, is one of the fundamental issues in the study of network systems. Community detection has been shown to reveal latent yet meaningful structure in networks such as groups in online and contact-based social networks, functional modules in protein–protein interaction networks, groups of customers with similar interests in online retailer user networks, groups of scientists in interdisciplinary collaboration networks, etc. [2].

In the last decade, the most popular approach to community detection has been to maximize the quality metric known as modularity [1], [3][4][5] over all possible partitions of a network. Such modularity optimization algorithms include greedy algorithms [6][7][8][9], spectral methods [3], [10][11][12][13][14][15], extremal optimization (EO) [16], simulated annealing (SA) [17][18][19][20], sampling techniques [21], and mathematical programming [22]. Modularity measures the difference between the actual fraction of edges within communities and the fraction expected in a randomized graph with the same number of nodes and the same degree sequence. It is widely used as a measure of the strength of the community structures detected by community detection algorithms. However, modularity maximization has two opposite yet coexisting problems. In some cases, it tends to split large communities into two or more small communities [23], [24]. In other cases, it tends to form large communities by merging communities that are smaller than a certain threshold which depends on the total number of edges in the network and on the degree of interconnectivity between the communities. The latter problem is also known as the resolution limit problem [23][24][25].

To solve these two issues of modularity, several community quality metrics were introduced, including Modularity Density Qds [23], [24] which simultaneously avoids both of them. We then propose two novel fine-tuned community detection algorithms that repeatedly attempt to improve the quality measurements by splitting and merging the given community structure. We denote the corresponding algorithm based on modularity (Q) as Fine-tuned Q, while the one based on Modularity Density (Qds) is referred to as Fine-tuned Qds. Finally, we evaluate the greedy algorithm of modularity maximization (denoted as Greedy Q), Fine-tuned Q, and Fine-tuned Qds by using seven community quality metrics based on ground truth communities. These evaluations are conducted on four real networks, and also on the classical clique network and the LFR benchmark networks, each of which is instantiated by a wide range of parameters. The results indicate that Fine-tuned Qds is the most effective method and can also dramatically improve the community detection results of other algorithms. Further, all seven quality measurements based on ground truth communities are consistent with Qds, but not consistent with Q, which implies the superiority of Modularity Density over the original modularity.

SECTION II.

Review of Modularity Related Literature

In this section, we first review the definition of modularity and the corresponding optimization approaches. Then, we discuss the two opposite yet coexisting problems of modularity maximization. Finally, we overview several community quality measurements proposed to solve the resolution limit problem and then discuss Modularity Density (Qds) [23], [24] which simultaneously avoids these two problems.

A. Definition of Modularity

Comparing results of different network partitioning algorithms can be challenging, especially when network structure is not known beforehand. The concept of modularity defined in [1] provides a measure of the quality of a particular partitioning of a network. Modularity (Q) quantifies community strength by comparing the fraction of edges within communities with the fraction expected if connections between the nodes were made at random. The justification is that a community should have more links between its members than a random gathering of people would. Thus, a Q value close to 0 means that the fraction of edges inside communities is no better than in the random case, while a value close to 1 means that the network community structure has the highest possible strength.

Formally, modularity (Q) can be defined as [1]

$$Q=\sum_{c_i\in C}\left[\frac{|E^{in}_{c_i}|}{|E|}-\left(\frac{2|E^{in}_{c_i}|+|E^{out}_{c_i}|}{2|E|}\right)^2\right]\tag{1}$$

where $C$ is the set of all the communities, $c_i$ is a specific community in $C$, $|E^{in}_{c_i}|$ is the number of edges between nodes within community $c_i$, $|E^{out}_{c_i}|$ is the number of edges from the nodes in community $c_i$ to the nodes outside $c_i$, and $|E|$ is the total number of edges in the network.

Modularity can also be expressed in the following form [3]:

$$Q=\frac{1}{2|E|}\sum_{ij}\left[A_{ij}-\frac{k_ik_j}{2|E|}\right]\delta_{c_i,c_j}\tag{2}$$

where $k_i$ is the degree of node $i$, $A_{ij}$ is an element of the adjacency matrix, $\delta_{c_i,c_j}$ is the Kronecker delta symbol, and $c_i$ is the label of the community to which node $i$ is assigned.
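To make the definition concrete, here is a minimal sketch, assuming NumPy, an undirected and unweighted network given as a symmetric 0/1 adjacency matrix, and a hypothetical `labels` array of community assignments, that evaluates (2) directly:

```python
import numpy as np

def modularity(A, labels):
    """Evaluate Eq. (2): Q = (1/2|E|) * sum_ij [A_ij - k_i k_j / 2|E|] * delta(c_i, c_j)."""
    k = A.sum(axis=1)            # node degrees k_i
    two_m = k.sum()              # 2|E|
    Q = 0.0
    for i in range(len(k)):
        for j in range(len(k)):
            if labels[i] == labels[j]:           # Kronecker delta term
                Q += A[i, j] - k[i] * k[j] / two_m
    return Q / two_m

# Example: two triangles joined by a single edge, split into the two obvious communities
A = np.zeros((6, 6), dtype=int)
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1
print(modularity(A, [0, 0, 0, 1, 1, 1]))         # approximately 0.357
```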

Since larger Q means a stronger community structure, several algorithms, which we will discuss in Section II-B, are based on modularity optimization.

The modularity measure defined above is suitable only for undirected and unweighted networks. However, this definition can be naturally extended to apply to directed networks as well as to weighted networks. Weighted and directed networks contain more information than undirected and unweighted ones. While this additional information makes such networks more valuable, it also makes them more difficult to analyze.

The revised definition of modularity that works for directed networks is as follows [4]:

$$Q=\frac{1}{|E|}\sum_{ij}\left[A_{ij}-\frac{k^{in}_ik^{out}_j}{|E|}\right]\delta_{c_i,c_j}\tag{3}$$

where $k^{in}_i$ and $k^{out}_j$ are the in-degree of node $i$ and the out-degree of node $j$, respectively.

Although many networks can be regarded as binary, i.e., as either having an edge between a pair of nodes or not having it, there are many other networks for which it would be natural to treat edges as having a certain degree of strength or weight.

The same general techniques that have been developed for unweighted networks are applied to their weighted counterparts in [5] by mapping weighted networks onto multigraphs. For non-negative integer weights, an edge with weight w in a weighted graph corresponds to w parallel edges in the corresponding multigraph. Although negative weights can arise in some applications, they are rarely useful in social networks, so for the sake of brevity we will not discuss them here. It turns out that the adjacency matrix of a weighted graph is equivalent to that of a multigraph with unweighted edges. Since the structure of the adjacency matrix is independent of the edge weights, it is possible to adjust all the methods developed for unweighted networks to weighted ones.

It is necessary to point out that the notion of the degree of a node should also be extended for weighted graphs. In this case, the degree of a node is defined as the sum of the weights of all edges incident to it.

It is shown in [5] that the same definitions of modularity that were given above hold for weighted networks as well, if we treat $A_{ij}$ as the weight of the connection between nodes $i$ and $j$ and set $|E|=\frac{1}{2}\sum_{ij}A_{ij}$.

B. Modularity Optimization Approaches

In the literature, a high value of modularity (Q) indicates a good community structure, and the partition corresponding to the maximum value of modularity on a given graph is supposed to have the highest quality, or at least a very good one. Therefore, it is natural to discover communities by maximizing modularity over all possible partitions of a network. However, it is computationally prohibitively expensive to exhaustively search all such partitions for the optimal value of modularity, since modularity optimization is known to be NP-hard (i.e., it belongs to the non-deterministic polynomial-time hard class of problems) [26]. Nevertheless, many heuristic methods were introduced to find high-modularity partitions in a reasonable time. Those approaches include greedy algorithms [6][7][8][9], spectral methods [3], [10][11][12][13][14][15], EO [16], SA [17][18][19][20], sampling techniques [21], and mathematical programming [22]. In this section, we review those modularity optimization heuristics.

1. Greedy Algorithms

The first greedy algorithm was proposed by Newman [6]. It is an agglomerative hierarchical clustering method. Initially, every node belongs to its own community, creating altogether |V| communities. Then, at each step, the algorithm repeatedly merges pairs of communities together, choosing the merger for which the resulting modularity is the largest. The change in Q upon joining two communities $c_i$ and $c_j$ is

$$\Delta Q_{c_i,c_j}=2\left(\frac{|E_{c_i,c_j}|}{2|E|}-\frac{|E_{c_i}||E_{c_j}|}{4|E|^2}\right)\tag{4}$$

where $|E_{c_i,c_j}|$ is the number of edges from community $c_i$ to community $c_j$ and $|E_{c_i}|=2|E^{in}_{c_i}|+|E^{out}_{c_i}|$ is the total degree of the nodes in community $c_i$. $\Delta Q_{c_i,c_j}$ can be calculated in constant time. The algorithm stops when all the nodes in the network are in a single community, after $(|V|-1)$ steps of merging. This produces in total |V| partitions, the first one defined by the initial step and each subsequent one resulting from each of the subsequent $(|V|-1)$ merging steps. The partition with the largest value of modularity, approximating the modularity maximum best, is the result of the algorithm. At each merging step, the algorithm needs to compute the change $\Delta Q_{c_i,c_j}$ of modularity resulting from joining any two currently existing communities $c_i$ and $c_j$ in order to choose the best merger. Since merging two disconnected communities will not increase the value of modularity, the algorithm checks only the merging of connected pairs of communities, and the number of such pairs is at most |E|, limiting the complexity of this part to O(|E|). However, the rows and columns of the adjacency matrix corresponding to the two merged communities must be updated, which takes O(|V|). Since there are $(|V|-1)$ iterations, the final complexity of the algorithm is O((|E|+|V|)|V|), or $O(|V|^2)$ for sparse networks.

Although Newman's algorithm [6] is much faster than the algorithm of Newman and Girvan [1], whose complexity is $O(|E|^2|V|)$, Clauset et al. [7] pointed out that the update of the adjacency matrix at each step contains a large number of unnecessary operations when the network is sparse and its matrix therefore has many zero entries. They introduced data structures for sparse matrices to perform the updating operation more efficiently. In their algorithm, instead of maintaining the adjacency matrix and computing $\Delta Q_{c_i,c_j}$, they maintained and updated a matrix whose entries are $\Delta Q_{c_i,c_j}$ for the pairs of connected communities $c_i$ and $c_j$. The authors introduced three data structures to represent sparse matrices efficiently: 1) each row of the matrix is stored both as a balanced binary tree, in order to search and insert elements in O(log|V|) time, and as a max-heap, so as to locate the largest element of each row in constant time; 2) another max-heap stores the largest element of each row of the matrix so as to locate the largest $\Delta Q_{c_i,c_j}$ in constant time; and 3) a vector is used to save $|E_{c_i}|$ for each community $c_i$. Then, in each step, the largest $\Delta Q_{c_i,c_j}$ can be found in constant time, and the update of the matrix after merging two communities $c_i$ and $c_j$ takes $O((k_{c_i}+k_{c_j})\log|V|)$, where $k_{c_i}$ and $k_{c_j}$ are the numbers of neighboring communities of $c_i$ and $c_j$, respectively. Thus, the total running time is at most O(log|V|) times the sum of the degrees of nodes in the communities along the dendrogram created by the merging steps. In the worst case, this sum is the depth of the dendrogram times the sum of the degrees of nodes in the network. If the dendrogram has depth d, the running time is $O(d|E|\log|V|)$, or $O(|V|\log^2|V|)$ when the network is sparse and the dendrogram is almost balanced ($d\sim\log|V|$).
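For experimentation, this Clauset et al. heuristic is available in NetworkX (the import below assumes a reasonably recent release); a brief usage sketch:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()                      # classic 34-node test network
communities = greedy_modularity_communities(G)  # CNM greedy merging by best dQ
for i, c in enumerate(communities):
    print(f"community {i}: {sorted(c)}")
```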

However, Wakita and Tsurumi [8] observed that the greedy algorithm proposed by Clauset et al. does not scale to networks with more than about 500,000 nodes. They found that the computational inefficiency arises from merging communities in an unbalanced manner, which yields very unbalanced dendrograms. In such cases, the relation $d\sim\log|V|$ no longer holds, causing the algorithm to run at its worst-case complexity. To balance the merging of communities, the authors introduced three types of consolidation ratios to measure the balance of community pairs and used them together with modularity to perform the joining of communities without bias. This modification enables the algorithm to scale to networks with up to 10,000,000 nodes. It also approximates the modularity maximum better than the original algorithm.

Another type of greedy modularity optimization algorithm, different from those above, was proposed by Blondel et al.; it is usually referred to as Louvain [9]. It is divided into two phases that are repeated iteratively. Initially, every node belongs to its own community, so there are |V| communities. In the first phase, every node, in a certain order, is considered for merging into its neighboring communities, and the merger with the largest positive gain is selected. If all possible gains associated with the merging of this node are negative, then it stays in its original community. This merging procedure repeats iteratively and stops when no increase of Q can be achieved.

After the first phase, Louvain reaches a local maximum of Q. The second phase of Louvain then builds a community network based on the communities discovered in the first phase. The nodes in the new network are the communities from the first phase, and there is an edge between two new nodes if there are edges between nodes in the corresponding two communities. The weights of those edges are the sums of the weights of the edges between nodes in the corresponding two communities. The edges between nodes of the same community of the first phase result in a self-loop for this community node in the new network. After the community network is generated, the algorithm applies the first phase again to this new network. The two phases repeat iteratively and stop when there is no more change, at which point a maximum of modularity is obtained. The number of iterations of this algorithm is usually very small, and most of the computational time is spent in the first iteration. Thus, the complexity of the algorithm grows like O(|E|). Consequently, it is scalable to large networks with up to a billion nodes. However, the results of Louvain are affected by the order in which the nodes in the first phase are considered for merging [27].
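NetworkX also ships a Louvain implementation (in releases 2.8 and later, an assumption worth checking against the installed version); a short sketch:

```python
import networkx as nx

G = nx.karate_club_graph()
# Two-phase Louvain; fixing `seed` makes the random node-visit order, and
# hence the resulting partition, reproducible across runs
communities = nx.community.louvain_communities(G, seed=42)
print(len(communities), "communities found")
```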

2. Spectral Methods

There are two categories of spectral algorithms for maximizing modularity: 1) based on the modularity matrix [3], [10], [11]; and 2) based on the Laplacian matrix of a network [12][13][14].

a) Modularity optimization using the eigenvalues and eigenvectors of the modularity matrix [3], [10], [11]

Modularity (Q) can be expressed as [3]

$$Q=\frac{1}{4|E|}\sum_{ij}\left(A_{ij}-\frac{k_ik_j}{2|E|}\right)(s_is_j+1)=\frac{1}{4|E|}\sum_{ij}\left(A_{ij}-\frac{k_ik_j}{2|E|}\right)s_is_j=\frac{1}{4|E|}s^TBs\tag{5}$$

where $A_{ij}$ are the elements of the adjacency matrix $A$ and $s$ is the column vector representing any division of the network into two groups. Its elements are defined as $s_i=+1$ if node $i$ belongs to the first group and $s_i=-1$ if it belongs to the second group. $B$ is the modularity matrix with elements

$$B_{ij}=A_{ij}-\frac{k_ik_j}{2|E|}.\tag{6}$$

Representing $s$ as a linear combination of the normalized eigenvectors $u_i$ of $B$, $s=\sum_{i=1}^{|V|}a_iu_i$ with $a_i=u_i^Ts$, and then plugging the result into (5) yields

$$Q=\frac{1}{4|E|}\sum_ia_iu_i^TB\sum_ja_ju_j=\frac{1}{4|E|}\sum_ia_i^2\beta_i\tag{7}$$

where $\beta_i$ is the eigenvalue of $B$ corresponding to eigenvector $u_i$. To maximize Q above, Newman [3] proposed a spectral approach that chooses $s$ proportional to the leading eigenvector $u_1$ corresponding to the largest (most positive) eigenvalue $\beta_1$. This choice assumes that the eigenvalues are labeled in decreasing order $\beta_1\ge\beta_2\ge\cdots\ge\beta_{|V|}$. Nodes are then divided into two communities according to the signs of the elements in $s$: nodes corresponding to positive elements in $s$ are assigned to one group and all remaining nodes to another. Since the row and column sums of $B$ are zero, it always has the eigenvector $(1,1,1,\ldots)$ with eigenvalue zero. Therefore, if it has no positive eigenvalue, then the leading eigenvector is $(1,1,1,\ldots)$, which means that the network is indivisible. Moreover, Newman [3] proposed to divide a network into more than two communities by repeatedly dividing each of the communities obtained so far into two until the additional contribution $\Delta Q$ to the modularity made by the subdivision of a community $c$
$$\Delta Q=\frac{1}{2|E|}\left[\frac{1}{2}\sum_{i,j\in c}B_{ij}(s_is_j+1)-\sum_{i,j\in c}B_{ij}\right]=\frac{1}{4|E|}s^TB^{(c)}s\tag{8}$$

is equal to or less than 0. $B^{(c)}$ in the formula above is the generalized modularity matrix. Its elements, indexed by the labels $i$ and $j$ of nodes within community $c$, are

$$B^{(c)}_{ij}=B_{ij}-\delta_{ij}\sum_{k\in c}B_{ik}.\tag{9}$$

Then, the same spectral method can be applied to $B^{(c)}$ to maximize $\Delta Q$. The recursive subdivision process stops when $\Delta Q\le 0$, which means that there is no positive eigenvalue of the matrix $B^{(c)}$. The overall complexity of this algorithm is O((|E|+|V|)|V|).
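A minimal sketch of one bisection step of this leading-eigenvector method, assuming NumPy and a dense symmetric adjacency matrix (a production implementation would use sparse matrix-vector products rather than a full eigendecomposition):

```python
import numpy as np

def spectral_bisection(A):
    """One step of Newman's leading-eigenvector method, Eqs. (5)-(6).

    Returns a +1/-1 vector s splitting the nodes into two groups,
    or None if the network is indivisible (no positive eigenvalue).
    """
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m        # modularity matrix, Eq. (6)
    eigvals, eigvecs = np.linalg.eigh(B)  # eigenvalues in ascending order
    if eigvals[-1] <= 1e-12:              # no positive eigenvalue
        return None
    u1 = eigvecs[:, -1]                   # leading eigenvector
    return np.where(u1 > 0, 1, -1)        # split by the signs of its elements
```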

However, the spectral algorithm described above has two drawbacks. First, it divides a network into more than two communities by repeated division instead of getting all the communities directly in a single step. Second, it only uses the leading eigenvector of the modularity matrix and ignores all the others, losing the useful information contained in those eigenvectors. Newman later proposed to divide a network into a set of communities $C$ with $|C|\ge2$ directly, using multiple leading eigenvectors [10]. Let $S=(s_c)$ be a $|V|\times|C|$ "community-assignment" matrix with one column for each community $c$, defined as

$$S_{i,c}=\begin{cases}1,&\text{if node }i\text{ belongs to community }c\\0,&\text{otherwise}\end{cases}\tag{10}$$

then the modularity (Q) for this direct division of the network is given by

$$Q=\frac{1}{2|E|}\sum_{i,j=1}^{|V|}\sum_{c\in C}B_{ij}S_{i,c}S_{j,c}=\frac{1}{2|E|}\mathrm{Tr}(S^TBS)\tag{11}$$

where $\mathrm{Tr}(S^TBS)$ is the trace of matrix $S^TBS$. Defining $B=U\Sigma U^T$, where $U=(u_1,u_2,\ldots)$ is the matrix of eigenvectors of $B$ and $\Sigma$ is the diagonal matrix of eigenvalues $\Sigma_{ii}=\beta_i$, yields

$$Q=\frac{1}{2|E|}\sum_{i=1}^{|V|}\sum_{c\in C}\beta_i(u_i^Ts_c)^2.\tag{12}$$

Then, obtaining $|C|$ communities is equivalent to selecting $|C|-1$ independent, mutually orthogonal columns $s_c$. Moreover, Q would be maximized by choosing the columns $s_c$ proportional to the leading eigenvectors of $B$. However, only the eigenvectors corresponding to the positive eigenvalues will contribute positively to the modularity. Thus, the number of positive eigenvalues, plus 1, is an upper bound on $|C|$. A more general modularity maximization keeps the leading $p$ $(1\le p\le|V|)$ eigenvectors. Q can be rewritten as

$$Q=\frac{1}{2|E|}\left(|V|\alpha+\mathrm{Tr}\left[S^TU(\Sigma-\alpha I)U^TS\right]\right)=\frac{1}{2|E|}\left(|V|\alpha+\sum_{j=1}^{|V|}\sum_{c\in C}(\beta_j-\alpha)\left[\sum_{i=1}^{|V|}U_{ij}S_{i,c}\right]^2\right)\tag{13}$$

where $\alpha$ $(\alpha\le\beta_p)$ is a constant related to the approximation of Q obtained by adopting only the first $p$ leading eigenvectors. By selecting $|V|$ node vectors $r_i$ of dimension $p$ whose $j$th component is
$$[r_i]_j=\sqrt{\beta_j-\alpha}\,U_{ij}\tag{14}$$
modularity can be approximated as
$$Q\approx\tilde{Q}=\frac{1}{2|E|}\left(|V|\alpha+\sum_{c\in C}|R_c|^2\right)\tag{15}$$
where $R_c$, $c\in C$, are the community vectors

$$R_c=\sum_{i\in c}r_i.\tag{16}$$

Thus, the community detection problem is equivalent to choosing a division of the nodes into $|C|$ groups that maximizes the magnitudes of the community vectors $R_c$, while requiring that $R_c\cdot r_i>0$ if node $i$ is assigned to community $c$. Problems of this type are called vector partitioning problems.

Although [10] explored using multiple leading eigenvectors of the modularity matrix, it did not pursue the idea in detail beyond a two-eigenvector approach for bipartitioning [3], [10]. Richardson et al. [11] provided an extension of these recursive bipartitioning methods by considering the best two-way or three-way division at each recursive step to more thoroughly explore the promising partitions. To reduce the number of partitions considered for the eigenvector-pair tripartitioning, the authors adopted a divide-and-conquer method, yielding an efficient approach whose computational complexity is competitive with that of the two-eigenvector bipartitioning method.

b) Modularity optimization using the eigenvalues and eigenvectors of the Laplacian matrix [12][13][14]

Given a partition C (a set of communities) and the corresponding “community-assignment” matrix S=(sc), White and Smyth [12] rewrote modularity (Q) as follows:

$$Q\propto\mathrm{Tr}\left(S^T(W-\tilde{D})S\right)=-\mathrm{Tr}\left(S^TL_QS\right)\tag{17}$$

where $W=2|E|A$ and the elements of $\tilde{D}$ are $\tilde{D}_{ij}=k_ik_j$. The matrix $L_Q=\tilde{D}-W$ is called the "Q-Laplacian." Finding the "community-assignment" matrix $S$ that maximizes Q above is NP-complete, but a good approximation can be obtained by relaxing the discreteness constraints of the elements of $S$ and allowing them to assume real values. Then, Q becomes a continuous function of $S$, and its extremes can be found by equating its first derivative with respect to $S$ to zero. This leads to the eigenvalue equation

$$L_QS=S\Lambda\tag{18}$$
where $\Lambda$ is the diagonal matrix of Lagrange multipliers. Thus, the modularity optimization problem is transformed into the standard spectral graph partitioning problem. When the network is not too small, $L_Q$ can be approximated well, up to constant factors, by the transition matrix $\tilde{W}=D^{-1}A$ obtained by normalizing $A$ so that all rows sum to one. Here, $D$ is the diagonal degree matrix of $A$. It can be shown that the eigenvalues and eigenvectors of $\tilde{W}$ are precisely $1-\lambda$ and $\mu$, where $\lambda$ and $\mu$ are the solutions to the generalized eigenvalue problem $L\mu=\lambda D\mu$, with $L=D-A$ being the Laplacian matrix. Thus, the underlying spectral algorithm here is equivalent to standard spectral graph partitioning using the eigenvalues and eigenvectors of the Laplacian matrix.

Based on the above analysis, White and Smyth proposed two clustering algorithms, named "Algorithm Spectral-1" and "Algorithm Spectral-2," to search for a partition $C$ with size up to $K$, predefined by an input parameter. Both algorithms take as input the eigenvector matrix $U_K=(u_1,u_2,\ldots,u_{K-1})$ with the leading $K-1$ eigenvectors (excluding the trivial all-ones eigenvector) of the transition matrix $\tilde{W}$. Those $K-1$ eigenvectors can be efficiently computed with the implicitly restarted Lanczos method (IRLM) [28]. "Algorithm Spectral-1" uses the first $k-1$ $(2\le k\le K)$ columns of $U_K$, denoted $U_{k-1}$, and clusters the row vectors of $U_{k-1}$ using k-means to find a k-way partition, denoted $C_k$. Then, the $C_k$ that achieves the largest value of Q is the final community structure.

“Algorithm Spectral-2” starts with a single community (k=1) and recursively splits each community c into two smaller ones if the subdivision produces a higher value of Q. The split is done by running k-means with two clusters on the matrix Uk,c formed from Uk by keeping only rows corresponding to nodes in c. The recursive procedure stops when no more splits are possible or when k=K communities have been found and then the final community structure with the highest value of Q is the detection result.

However, the two algorithms described above, especially "Algorithm Spectral-1," scale poorly to large networks because they run k-means partitioning up to $K$ times. Both approaches have a worst-case complexity of $O(K^2|V|+K|E|)$. In order to speed up the calculation while retaining effectiveness in approximating the maximum of Q, Ruan and Zhang [13] proposed the Kcut algorithm, which recursively partitions the network to optimize Q. At each recursive step, Kcut applies a k-way partition $(k=2,3,\ldots,l)$ to the subnetwork induced by the nodes and edges in each community, using "Algorithm Spectral-1" of White and Smyth [12]. Then, it selects the $k$ that achieves the highest Q. Empirically, Kcut with $l$ as small as 3 or 4 can significantly improve Q over the standard bipartitioning method, and it also reduces the computational cost to $O((|V|+|E|)\log|C|)$ for a final partition with $|C|$ communities.

Ruan and Zhang later proposed the QCUT algorithm [14], whose name is derived from modularity (Q) partitioning (CUT), which combines Kcut and local search to optimize Q. The QCUT algorithm consists of two alternating stages: 1) partitioning and 2) refinement. In the partitioning stage, Kcut is used to recursively partition the network until Q cannot be further improved. In the refinement stage, a local search strategy repeatedly considers two operations. The first is migration, which moves a node from its current community to another one, and the second is the merging of two communities into one. Both are applied to improve Q as much as possible. The partitioning and refinement stages alternate until Q cannot be increased further. In order to address the resolution limit problem of modularity, the authors proposed HQCUT (standing for hierarchical QCUT), which recursively applies QCUT to divide the subnetwork, generated from the nodes and edges in each community, into subcommunities. Further, to avoid overpartitioning, they use a statistical test to determine whether a community indeed has intrinsic subcommunities.

c) Equivalence of two categories of spectral algorithms for maximizing modularity [15]

Newman [15] showed that with hyperellipsoid relaxation, the spectral modularity maximization method using the eigenvalues and eigenvectors of the modularity matrix can be formulated as the spectral algorithm that relies on the eigenvalues and eigenvectors of the Laplacian matrix. This formulation indicates that the above two kinds of modularity optimization approaches are equivalent. Starting with (5) for the division of a network into two groups, first the discreteness of $s_i$ is relaxed onto a hyperellipsoid with the constraint

$$\sum_ik_is_i^2=2|E|.\tag{19}$$

Then, the relaxed modularity maximization problem can be easily solved by setting the first derivative of (5) with respect to si to zero. This leads to

$$\sum_jB_{ij}s_j=\lambda k_is_i\tag{20}$$

or in matrix notation

$$Bs=\lambda Ds\tag{21}$$
where λ is the eigenvalue. Plugging (20) into (5) yields
$$Q=\frac{1}{4|E|}\sum_{ij}B_{ij}s_is_j=\frac{\lambda}{4|E|}\sum_ik_is_i^2=\frac{\lambda}{2}.\tag{22}$$

Therefore, to achieve the highest value of Q, one should choose $\lambda$ to be the largest (most positive) eigenvalue of (21). Using (6), (20) can be rewritten as

$$\sum_jA_{ij}s_j=k_i\left(\lambda s_i+\frac{1}{2|E|}\sum_jk_js_j\right)\tag{23}$$

or in matrix notation

$$As=D\left(\lambda s+\frac{k^Ts}{2|E|}\mathbf{1}\right)\tag{24}$$
where $k$ is the vector with elements $k_i$ and $\mathbf{1}=(1,1,1,\ldots)^T$. Then, multiplying (24) on the left by $\mathbf{1}^T$ results in $\lambda k^Ts=0$. If there is a nontrivial eigenvalue $\lambda>0$, then the above equation simplifies to

$$As=\lambda Ds.\tag{25}$$

Again, $\lambda$ should be the most positive eigenvalue. However, the eigenvector corresponding to this eigenvalue is the uniform vector $\mathbf{1}$, which fails to satisfy $k^Ts=0$. Thus, in this case, the best one can do is to choose $\lambda$ to be the second largest eigenvalue and take $s$ proportional to the corresponding eigenvector. In fact, this eigenvector is precisely equal to the leading eigenvector of (21). Then, after defining a rescaled vector $u=D^{1/2}s$ and plugging it into (25), we get

$$(D^{-1/2}AD^{-1/2})u=\lambda u.\tag{26}$$

The matrix $\mathcal{L}=D^{-1/2}AD^{-1/2}$ is called the normalized Laplacian matrix. (The normalized Laplacian is sometimes defined as $\mathcal{L}=I-D^{-1/2}AD^{-1/2}$, but the two definitions differ only by a trivial transformation of their eigenvalues and eigenvectors.)

3. Extremal Optimization

Duch and Arenas [16] proposed a modularity optimization algorithm based on EO [29]. EO optimizes a global variable by improving extremal local variables. Here, the global variable is modularity (Q). The contribution of an individual node $i$ to the Q of the whole network with a certain community structure is given by

$$q_i=k_{i,c}-k_i\frac{|E_c|}{2|E|}\tag{27}$$

where $k_{i,c}$ is the number of edges that connect node $i$ to the nodes in its own community $c$, and $|E_c|=2|E^{in}_c|+|E^{out}_c|$ is the total degree of the nodes in community $c$. Note that $Q=\frac{1}{2|E|}\sum_iq_i$, and $q_i$ can be normalized into the interval $[-1,1]$ by dividing it by $k_i$:

$$\lambda_i=\frac{q_i}{k_i}=\frac{k_{i,c}}{k_i}-\frac{|E_c|}{2|E|}\tag{28}$$
where λi, called fitness, is the relative contribution of node i to Q. Then, the fitness of each node is adopted as the local variable.

The algorithm starts by randomly splitting the network into two partitions with equal numbers of nodes, where communities are the connected components in each partition. Then, at each iteration, it moves the node with the lowest fitness from its own community to another community. The shift changes the community structure, so the fitness of many other nodes needs to be recomputed. The process repeats until it can no longer increase Q. After that, the algorithm generates sub-community networks by deleting the inter-community edges and proceeds recursively on each sub-community network until Q cannot be improved. Although the procedure is deterministic for a given initialization, its final result in fact depends on the initialization, and it is likely to get trapped in local maxima. Thus, a probabilistic selection called τ-EO [29], in which nodes are ranked according to their fitness and a node of rank $r$ is selected with probability $P(r)\propto r^{-\tau}$, is used to improve the result. The computational complexity of this algorithm is $O(|V|^2\log^2|V|)$.
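A small sketch (NumPy assumed; `labels` is a NumPy array of community assignments) of the fitness computation in Eqs. (27) and (28), on which each EO move is based:

```python
import numpy as np

def node_fitness(A, labels):
    """lambda_i = k_{i,c}/k_i - |E_c|/2|E|, Eq. (28), for every node."""
    k = A.sum(axis=1)                  # node degrees
    two_m = k.sum()                    # 2|E|
    lam = np.empty(len(k))
    for i in range(len(k)):
        same = labels == labels[i]     # members of i's community
        k_ic = A[i, same].sum()        # edges from i into its own community
        E_c = k[same].sum()            # |E_c|: total degree of the community
        lam[i] = k_ic / k[i] - E_c / two_m
    return lam
```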

4. Simulated Annealing

SA [30] is a probabilistic procedure for global optimization that searches for a good approximation to the global optimum of a given function in a large search space. This technique was adopted in [17][18][19][20] to maximize modularity (Q). The initial point for all those approaches can be an arbitrary partition of the nodes into communities, even one with |V| communities in which each node belongs to its own community. At each iteration, a node $i$ and a community $c$ are chosen at random. This community could be a currently existing community or an empty community introduced to increase the number of communities. Then, node $i$ is moved from its original community to this new community $c$, which would change Q by $\Delta Q$. If $\Delta Q$ is greater than zero, this update is accepted; otherwise, it is accepted with probability $e^{\beta\Delta Q}$, where $\beta$ in [17][18][19] represents the inverse of temperature $T$, and in [20] the reciprocal of pseudo temperature $\tau$. In addition, in [20] there is one more condition for the move of a node when $c$ is not empty: shifting node $i$ to $c$ is considered only if there are some edges between node $i$ and the nodes in $c$. To improve the performance and to avoid getting trapped in local minima, collective movements, which involve moving multiple nodes at a time [19], [20], merging two communities [17][18][19], and splitting a community [17][18][19], are employed. Splits can be carried out in a number of different schemes. The best performance is achieved by treating a community as an isolated subnetwork, partitioning it into two, and then performing a nested SA on these partitions [17], [18]. These methods stop when no new update is accepted within a fixed number of iterations.
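The acceptance rule these SA variants share is the Metropolis criterion described above; a minimal sketch:

```python
import math
import random

def accept_move(delta_Q, beta):
    """Metropolis rule: always accept an improving move; accept a
    worsening one with probability exp(beta * delta_Q)."""
    if delta_Q > 0:
        return True
    return random.random() < math.exp(beta * delta_Q)  # delta_Q <= 0 here
```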

5. Sampling Techniques

Sales-Pardo et al. [21] proposed a "box-clustering" method to extract the hierarchical organization of networks. This approach consists of two steps: 1) estimating the similarity, called "node affinity," between nodes and forming the node affinity matrix; and 2) deriving the hierarchical community structure from the affinity matrix. The affinity between two nodes is the probability that they are classified into the same community in the local maxima partitions of modularity. The set of local maxima partitions, called $P_{max}$, includes those partitions for which neither moving a node from its original community to another, nor merging two communities, will increase the value of modularity. The samples in $P_{max}$ are found by performing the SA-based modularity optimization algorithm of Guimerá and Amaral [17], [18]. More specifically, the algorithm first randomly divides the nodes into communities and then performs hill-climbing search until a sample with a local maximum of modularity is reached. Then, the affinity matrix is updated based on the obtained sample.

The sample generation procedure is repeated until the affinity matrix has converged to its asymptotic value. Empirically, the total number of samples needed is proportional to the size of the network. Before proceeding to the second step, the algorithm assesses whether the network has a significant community structure. It does so by computing the z-score of the average modularity of the partitions in $P_{max}$ with respect to the average modularity of the local-maxima partitions of an equivalent ensemble of null-model networks. The equivalent null model is obtained by randomly rewiring the edges of the original network while retaining the degree sequence. A large z-score indicates that the network has a meaningful internal community structure. If the network indeed has a significant community structure, the algorithm advances to the second step to group nodes with large affinity close to each other. The goal is to bring the form of the affinity matrix as close as possible to a block-diagonal structure by minimizing a cost function representing the average distance of matrix elements to the diagonal. Then, the communities correspond to the "best" set of boxes obtained by least-squares fitting of the block-diagonal structure to the affinity matrix. The procedure described above can be recursively applied to the subnetworks induced by the communities to identify the low-level structure of each community, until no subnetwork is found to have significant intrinsic structure.

6. Mathematical Programming

Agarwal and Kempe [22] formulated the modularity maximization problem as a linear program and vector program which has the advantage of providing a posteriori performance guarantees. First, modularity maximization can be transformed into the integer program

$$\begin{aligned}\text{Maximize}\quad&\frac{1}{2|E|}\sum_{ij}B_{ij}(1-x_{ij})\\\text{subject to}\quad&x_{ik}\le x_{ij}+x_{jk},\quad\text{for all }i,j,k\\&x_{ij}\in\{0,1\},\quad\text{for all }i,j\end{aligned}\tag{29}$$

where $B$ is the modularity matrix and the objective function is linear in the variables $x_{ij}$. When $x_{ij}=0$, $i$ and $j$ belong to the same community, and $x_{ij}=1$ indicates that they are in different communities. The restriction $x_{ik}\le x_{ij}+x_{jk}$ enforces transitivity: if $i$ and $j$ as well as $j$ and $k$ are in the same community, then $i$ and $k$ must also be in the same community. Solving the above integer program is NP-hard, but relaxing the last constraint, so that $x_{ij}$ is not an integer from $\{0,1\}$ but a real number in the interval [0, 1], reduces the integer program to a linear program which can be solved in polynomial time [31]. However, the solution does not correspond to a partition when any of the $x_{ij}$ is fractional. To get the communities from the $x_{ij}$, a rounding step is needed. The value of $x_{ij}$ is treated as the distance between $i$ and $j$, and these distances are used repeatedly to form communities of "nearby" nodes. Moreover, optimizing modularity by dividing a network into two communities can be cast as the strict quadratic program

$$\begin{aligned}\text{Maximize}\quad&\frac{1}{4|E|}\sum_{ij}B_{ij}(1+s_is_j)\\\text{subject to}\quad&s_i^2=1,\quad\text{for all }i\end{aligned}\tag{30}$$
where the objective function is the same as (5) defined by Newman [3]. Note that the constraint $s_i^2=1$ ensures that $s_i=\pm1$, which implies that node $i$ belongs either to the first or to the second community. Quadratic programming is NP-complete, but the problem can be relaxed to a vector program by replacing each variable $s_i$ with a $|V|$-dimensional vector and replacing the scalar product with the inner vector product. The solution to the vector program is one location per node on the surface of a $|V|$-dimensional hypersphere. To obtain a bipartition from these node locations, a rounding step is needed: choose a random $(|V|-1)$-dimensional hyperplane passing through the origin and use it to cut the hypersphere into two halves, thereby separating the node vectors into two parts. Multiple random hyperplanes can be chosen, and the one producing the community structure with the highest modularity provides the solution. The same vector program is then recursively applied to the subnetworks generated from the nodes and edges of the discovered communities to obtain hierarchical communities, until Q cannot be increased. Following the linear program and vector program, Agarwal and Kempe also adopted a post-processing step similar to the local search strategy proposed by Newman [3] to further improve the results.
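A sketch of the random-hyperplane rounding step (NumPy assumed; `X` is a hypothetical array holding one unit vector per node from the relaxed vector program, and `score` is any partition-quality callable, e.g., the modularity of the induced bipartition):

```python
import numpy as np

def hyperplane_rounding(X, score, trials=50):
    """Cut the hypersphere with random hyperplanes through the origin
    and keep the +1/-1 assignment that scores best."""
    best_s, best_val = None, -np.inf
    for _ in range(trials):
        normal = np.random.randn(X.shape[1])   # random hyperplane normal
        s = np.where(X @ normal >= 0, 1, -1)   # side of the hyperplane
        val = score(s)
        if val > best_val:
            best_s, best_val = s, val
    return best_s
```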

C. Resolution Limit

Since its inception, the modularity has been used extensively as the measure of the quality of partitions produced by community detection algorithms. In fact, if we adopt modularity as a quality measure of communities, the task of discovering communities is essentially turned into the task of finding the network partitioning with an optimal value of modularity.

However, as the properties of modularity were studied, it was discovered that in some cases it fails to detect small communities. There is a certain threshold [25] such that a community of size below it will not be detected, even if it is a complete subgraph connected to the rest of the graph by a single edge. This property of modularity has become known as the resolution limit.

Although the resolution limit prevents detection of small communities, the actual value of the threshold depends on the total number of edges in the network and on the degree of interconnectedness between communities. In fact, the resolution limit can reach the values comparable to the size of the entire network causing formation of a few giant communities (or even a single community) and failing to detect smaller communities within them. It makes interpreting the results of community detection very difficult because it is impossible to tell beforehand whether a community is well-formed or if it can be further split into subcommunities.

Considering modularity as a function of the total number of edges |E| and the number of communities m makes it possible to find the values of m and |E| which maximize this function. It turns out that setting $m=\sqrt{|E|}$ yields the absolute maximal value of modularity. Consequently, modularity has a resolution limit of order $\sqrt{|E|}$, which bounds the number and size of communities [25]. In fact, if for a certain community the number of edges inside it is smaller than $\sqrt{|E|/2}$, such a community cannot be resolved through modularity optimization. It is also possible for modularity optimization to fail to detect communities of larger size if they have more edges in common with the rest of the network. Therefore, by finding the optimal value of the modularity, we are generally not obtaining the best possible community structure.

The above arguments can also be applied to weighted networks. In this case, |E| is the sum of the weights of all the edges in the network, |Einci| is the sum of the weights of the edges between nodes within community ci, and |Eoutci| is the sum of the weights of the edges from the nodes in community ci to the nodes outside ci.

By introducing an additional parameter ϵ, which represents the weight of inter-community edges, Berry et al. showed in [32] that the number of communities in the optimal solution is

$$m=\sqrt{\frac{|E|}{\epsilon}}.\tag{31}$$

Correspondingly, any community whose size satisfies

$$|c_i|<\sqrt{\frac{|E|\epsilon}{2}}-\epsilon\tag{32}$$

may not be resolved.

Introduction of ϵ brings some interesting opportunities. If we can make ϵ arbitrarily small, then we can expect maximum weighted modularity to produce any desired number of communities. In other words, given a proper weighting, a much better modularity resolution can be achieved than without weighting. However, in practice, finding a way to set edge weights to achieve small values of ϵ can be challenging. An algorithm for lowering ϵ proposed by Berry et al. requires O(m|V|log|V|) time.

D. Resolving the Resolution Limit Problem

There have been extensive studies done on how to mitigate the consequences of the modularity resolution limit. The main approaches followed are described below.

Localized modularity measure (LQ) [33] is based on the observation that the resolution limit problem is caused by modularity being a global measure since it assumes that edges between any pairs of nodes are equally likely, including connectivity between the communities. However, in many networks, the majority of communities have edges to only a few other communities, i.e., exhibit a local community connectivity.

Thus, a local version of the modularity measure for a directed network is defined as

$$LQ=\sum_{c_i\in C}\left[\frac{|E^{in}_{c_i}|}{|E^{neighb}_{c_i}|}-\left(\frac{|E^{in}_{c_i}|+|E^{out}_{c_i}|}{|E^{neighb}_{c_i}|}\right)^2\right]\tag{33}$$

where $|E^{neighb}_{c_i}|$ is the total number of edges in the neighboring communities of $c_i$, i.e., in the communities to which all neighbors of $c_i$ belong.

Unlike traditional modularity (Q), the local version of modularity (LQ) is not bounded above by 1. The more locally connected communities a network has, the bigger its LQ can grow. In a network, where all communities are connected to each other, LQ yields the same value as Q. LQ considers individual communities and their neighbors, and therefore provides a measure of community quality that is not dependent on other parts of the network. The local connectivity approach can be applied not only to the nearest neighboring communities, but also to the second or higher neighbors as well.

Arenas et al. proposed a multiple resolution method [34] which is based on the idea that it might be possible to look at the detected community structure at different scales. From this perspective, the modularity resolution limit is not a problem but a feature. It allows choosing a desired resolution level to achieve the required granularity of the output community structure using the original definition of modularity.

The multiple resolution method is based on the definition of modularity given by (1). The modularity resolution limit depends on the total weight 2|E|. By varying the total weight, it is possible to control the resolution limit, effectively performing community detection at different granularity levels. Increasing the sum of the weights of the edges adjacent to every node by some value r rescales the topology by a factor of r. Since the resolution limit grows proportionally to $\sqrt{r}$, it grows more slowly than r itself. Consequently, it is possible to reach a scale at which all required communities become visible to modularity optimization.

Caution should be exercised when altering the weights of edges in the network to avoid changing its topological characteristics. To ensure this, a rescaled adjacency matrix can be defined as

$$A_r=A+rI\tag{34}$$

where $A$ is the adjacency matrix and $I$ is the identity matrix. Since the original edge weights are not altered, $A_r$ preserves all common features of the network: the distribution of the sums of weights, the weighted clustering coefficient, the eigenvectors, etc. Essentially, introducing r results in a self-loop of weight r being added to every node in the network.

Optimizing the modularity for the rescaled topology Ar is performed by using the modularity at scale r as the new quality function

$$Q_r=\sum_{c_i\in C}\left[\frac{2|E^{in}_{c_i}|+r|c_i|}{2|E|+r|V|}-\left(\frac{|E_{c_i}|+r|c_i|}{2|E|+r|V|}\right)^2\right]\tag{35}$$

where $|c_i|$ is the number of nodes in community $c_i$ and $|E_{c_i}|=2|E^{in}_{c_i}|+|E^{out}_{c_i}|$. It yields larger communities for smaller values of r and smaller communities for larger values of r. By performing modularity optimization for different values of r, it is possible to analyze the community structure at different scales.
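A direct NumPy sketch of Eq. (35), assuming an unweighted, undirected adjacency matrix and a `labels` array of community assignments:

```python
import numpy as np

def rescaled_modularity(A, labels, r):
    """Q_r of Eq. (35): modularity of A_r = A + r*I, i.e., of the network
    with a self-loop of weight r added to every node."""
    n = len(A)
    denom = A.sum() + r * n                   # 2|E| + r|V|
    Qr = 0.0
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        two_E_in = A[np.ix_(idx, idx)].sum()  # 2|E_in| for community c
        E_c = A[idx, :].sum()                 # |E_c| = 2|E_in| + |E_out|
        Qr += (two_E_in + r * len(idx)) / denom \
              - ((E_c + r * len(idx)) / denom) ** 2
    return Qr
```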

The parameter r can also be thought of as representing the resistance of a node to becoming part of a community. If r is positive, we can obtain a network community structure that is more granular than what is possible with the original definition of modularity (Q), which corresponds to r equal to zero. Making r negative zooms out of the network and provides a view of super communities.

Further studies of the multiple resolution approach revealed that it suffers from two major issues outlined in [35]. First, when the value of the resolution parameter r is low it tends to group together small communities. Second, when the resolution is high, it splits large communities. These trends are opposite for networks with a large variation of community sizes. Hence, it is impossible to select a value of the resolution parameter, such that neither smaller nor larger communities are adversely affected by the resolution limit. A network can be tested for susceptibility to the resolution problem by examining its clustering coefficient, i.e., a degree to which nodes tend to form communities. If the clustering coefficient has sharp changes, it indicates that communities of substantially different scales exist in this network. The result is that when the value of r is sufficiently large, bigger communities get broken up before smaller communities are found. This applies also to other multiple resolution methods and seems to be a general problem of the methods that are trying to optimize some global measure.

The hierarchical multiresolution method proposed by Granell et al. in [36] overcomes the limitations of the multiple resolution method on networks with very different scales of communities. It achieves that by introducing a new hierarchical multiresolution scheme that works even in cases of community detection near the modularity resolution limit. The main idea underlying this method is based on performing multiple resolution community detection on essential parts of the network, thus analyzing each part independently.

The method operates iteratively by first placing all nodes in a single community. Then, it finds the minimum value of the resistance parameter r which produces a community structure with the optimal value of modularity. Finally, it runs the same algorithm on each community that was found. The method terminates when no more split of communities is necessary, which usually takes just a few steps.

Another approach to leveraging the results of modularity optimization has been introduced by Chakraborty et al. in [27]. It is based on the observation that a simple change to the order of nodes in a network can significantly affect the community structure. However, a closer examination of the communities produced in different runs of a certain community detection algorithm reveals that for many networks the same invariant groups of nodes are consistently assigned to the same communities. Such groups of nodes are called constant communities. The percentage of constant communities varies depending on the network. Constant communities are detected by trying different node permutations while preserving the degree sequence of the nodes. For networks that have strong community structure, the constant communities detected can be adopted as a pre-processing step before performing modularity optimization. This can lead to higher modularity values and lower variability in results, thus improving the overall quality of community detection.

In the study [37] by Li et al., a new quantitative measure for community detection is introduced. It offers several improvements over the modularity (Q), including elimination of the resolution limit and ability to detect the number of communities. The new measure called modularity density (D) is based on the average degree of the community structure. It is given by

$$D=\sum_{c_i\in C}\frac{2|E^{in}_{c_i}|-|E^{out}_{c_i}|}{|c_i|}.\tag{36}$$

The quality of the communities found is then described by the value of the modularity density (D). The larger the value of D, the stronger the community structure is.

The modularity density (D) does not divide a clique into two parts, and it can resolve most modular networks correctly. It can also detect communities of different sizes. This second property can be used to quantitatively determine the number of communities, since the maximum value of D is achieved when the network is correctly partitioned. Although, as mentioned in [37], finding an optimal value of modularity density (D) is NP-hard, the problem is equivalent to an objective function of the kernel k-means clustering problem, for which efficient computational algorithms are known.
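Eq. (36) translates into just a few lines of NumPy (same assumptions as the earlier sketches: a symmetric unweighted adjacency matrix and a `labels` array):

```python
import numpy as np

def modularity_density_D(A, labels):
    """Li et al.'s modularity density, Eq. (36):
    D = sum over communities of (2|E_in| - |E_out|) / |c|."""
    D = 0.0
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        two_E_in = A[np.ix_(idx, idx)].sum()   # 2|E_in|
        E_out = A[idx, :].sum() - two_E_in     # edges leaving the community
        D += (two_E_in - E_out) / len(idx)
    return D
```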

Traag et al. in [38] introduce a rigorous definition of the resolution-limit-free method for which considering any induced subgraph of the original graph does not cause the detected community structure to change. In other words, if there is an optimal partitioning of a network (with respect to some objective function), and for each subgraph induced by the partitioning it is also optimal, then such objective function is called resolution-limit-free. An objective function is called additive for a certain partitioning if it is equal to the sum of the values of this objective function for each of the subgraphs induced by the partitioning.

Based on these two definitions, it is proved that if an objective function is additive and there are two optimal partitions, then any combination of these partitions is also optimal. In case of a complete graph, if an objective function is resolution-limit-free, then an optimal partitioning either contains all the nodes (i.e., there is only one community which includes all nodes) or consists of communities of size 1 (i.e., each node forms a community of its own). A more general statement for arbitrary objective functions is also true: if an objective function has local weights (i.e., weights that do not change when considering subgraphs), then it is resolution-limit-free. Although the converse is not true, there is only a relatively small number of special cases when methods with nonlocal weights are resolution-limit-free.

The authors then analyze resolution-limit-freeness within the framework of the first-principles Potts model [39]

$$H=-\sum_{ij}\left(a_{ij}A_{ij}-b_{ij}(1-A_{ij})\right)\delta_{c_i,c_j}\tag{37}$$

where $a_{ij},b_{ij}\ge0$ are some weights. The intuition behind this formula is that a community should have more edges inside it than edges which connect it to other communities. Thus, it is necessary to reward existing links inside a community and penalize links that are missing from a community. The smaller the value of H, the more desirable the community structure. However, the minimal value might not be unique.

Given the definition of H, it is possible to describe various existing community detection methods with an appropriate choice of parameters, as well as to propose alternative methods. The following models are shown to fit into H: Reichardt and Bornholdt (RB); Arenas, Fernándes, and Gómez (AFG); Ronhovde and Nussinov (RN); as well as the label propagation method. The RB approach with a configuration null model also covers the original definition of modularity. The authors also propose a new method called the constant Potts model (CPM) by choosing $a_{ij}=w_{ij}-b_{ij}$ and $b_{ij}=\gamma$, where $w_{ij}$ is the weight of the edge between nodes $i$ and $j$, and $\gamma$ is a constant. CPM is similar to the RB and RN models, but is simpler and more intuitive. CPM and RN have local weights and are consequently resolution-limit-free, while RB, AFG, and modularity are not.

However, all of the above approaches are aimed at solving only the resolution limit problem. Work done by Chen et al. in [23] and [24] adopts a different definition of modularity density which simultaneously addresses two problems of modularity. It is done by mixing two additional components, Split Penalty (SP) and the community density, into the well-known definition of modularity. Community density includes internal community density and pair-wise community density. SP is the fraction of edges that connect nodes of different communities

$$SP=\sum_{c_i\in C}\sum_{\substack{c_j\in C\\c_j\ne c_i}}\frac{|E_{c_i,c_j}|}{2|E|}.\tag{38}$$

The value of SP is subtracted from modularity, while the value of the community density is added to modularity and SP. Introducing SP resolves the issue of favoring small communities. Community density eliminates the problem of favoring large communities (also known as the resolution limit problem). The Modularity Density (Qds) is then given by

$$Q_{ds}=\sum_{c_i\in C}\left[\frac{|E^{in}_{c_i}|}{|E|}d_{c_i}-\left(\frac{2|E^{in}_{c_i}|+|E^{out}_{c_i}|}{2|E|}d_{c_i}\right)^2-\sum_{\substack{c_j\in C\\c_j\ne c_i}}\frac{|E_{c_i,c_j}|}{2|E|}d_{c_i,c_j}\right]\tag{39}$$

$$d_{c_i}=\frac{2|E^{in}_{c_i}|}{|c_i|(|c_i|-1)},\qquad d_{c_i,c_j}=\frac{|E_{c_i,c_j}|}{|c_i||c_j|}$$

where $d_{c_i}$ is the internal density of community $c_i$ and $d_{c_i,c_j}$ is the pair-wise density between communities $c_i$ and $c_j$.

Modularity Density (Qds) avoids falling into the trap of merging two or more consecutive cliques in the ring of cliques network or dividing a clique into two or more parts. It can also discover communities of different sizes. Thus, using Qds solves both the resolution limit problem of modularity and the problem of splitting larger communities into smaller ones. Hence, Qds is a very effective alternative to Q.

SECTION III.

Maximizing Modularity Density Qds

In our previous papers [23], [24], we have given the definition of Modularity Density (Qds). With formal proofs and experiments on two real dynamic datasets (Senate dataset [40] and Reality Mining Bluetooth Scan data [41]), we demonstrated that Qds solves the two opposite yet coexisting problems of modularity: 1) the problem of favoring small communities; and 2) the problem of favoring large communities (also called the resolution limit problem). Moreover, for a given community in Qds defined by (39), its internal and pair-wise densities and its split penalty are local components, which are related to the resolution-limit-free definition in [38]. Therefore, it is reasonable to expect that maximizing Qds would discover more meaningful community structure than maximizing Q. In this section, we first illustrate why the greedy agglomerative algorithm for increasing Qds cannot be adopted for optimizing Qds. Then, we propose a fine-tuned community detection algorithm that repeatedly attempts to improve the community quality measurements by splitting and merging the given network community structure to maximize Qds.

A. Greedy Algorithm Fails to Optimize Qds

In this section, we show why the greedy agglomerative algorithm for increasing Qds fails to optimize it. At the first step of the greedy algorithm, each node is treated as a single community. The Qds of each such single-node community is then $Q_{ds}=-SP$, since a singleton community has no internal edges. Therefore, in order to increase Qds the most, the greedy algorithm would first merge the connected pair of nodes for which the sum of their degrees is the largest among all connected pairs. However, it is very likely that those two nodes belong to two different communities, which would eventually result in merging those two communities instead of keeping them separate. This leads to a much lower value of Qds for the merged community compared with the Qds of its components, demonstrating why the greedy Qds algorithm fails to optimize Qds.

For example, in the network in Fig. 1, the initial values of Qds for nodes 1, 2, 4, 6, 7, and 8, each with degree 3, are $Q_{ds}=-SP=-3/26$, while the initial values of Qds for nodes 3 and 5, each with degree 4, are $Q_{ds}=-SP=-4/26$. The greedy Qds algorithm would therefore first merge nodes 3 and 5, which would eventually lead to a single community containing all eight nodes. However, the true community structure contains two clique communities. Accordingly, the Qds of the community structure with the two clique communities (0.4183) is larger than that of the community structure with one single large community (0.2487). Thus, maximizing Qds properly should have the ability to discover the true community structure.

Fig. 1. A simple network with two clique communities. Each clique has four nodes and the two clique communities are connected to each other with one single edge.
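As a sanity check, the modularity_density sketch above reproduces the two Qds values quoted for Fig. 1; the caveman-graph construction and node labels below are ours.

```python
G = nx.caveman_graph(2, 4)   # two disjoint 4-node cliques, nodes 0..7
G.add_edge(3, 4)             # the single edge connecting the two cliques
two_cliques = [set(range(4)), set(range(4, 8))]
print(round(modularity_density(G, two_cliques), 4))      # 0.4183
print(round(modularity_density(G, [set(range(8))]), 4))  # 0.2487
```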

B. Fine-Tuned Algorithm

In this part, we describe a fine-tuned community detection algorithm that iteratively improves a community quality metric M by splitting and merging the given network community structure. We denote the corresponding algorithm based on modularity (Q) as Fine-tuned Q and the one based on Modularity Density (Qds) as Fine-tuned Qds. The algorithm consists of two alternating stages: 1) the split stage; and 2) the merging stage.

Algorithm 1 Split_Communities (G, C)

1: Initialize comWeights[|C|][|C|], comEdges[|C|][|C|], and comDensities[|C|][|C|], which contain, respectively, the total edge weights, the numbers of edges, and the densities inside each community and between each pair of communities, computed from the network G and the community list C;

2: //Get the metric value for each community.

3: Mes[|C|]=GetMetric (C, comWeights, comDensities);

4: for i=0 to |C|−1 do

5: c=C.get(i);

6: subnet=GenerateSubNetwork(c);

7: fiedlerVector[|c|]=LanczosMethod(subnet);

8: nodeIds[|c|]=sort(fiedlerVector, ‘descend’);

9: //Form |c|+1 divisions and record the best one.

10: splitTwoCom.addAll(nodeIds);

11: for j=0 to |c|−1 do

12: splitOneCom.add(nodeIds [j]);

13: splitTwoCom.remove(nodeIds [j]);

14: Calculate M(split) for the split at j;

15: ΔM = M(split) − Mes[i];

16: if ΔM(best)<ΔM (or ΔM(best)>ΔM) then

17: ΔM(best)=ΔM;

18: bestIdx=j;

19: end if

20: end for

21: if ΔM(best)>0 (or ΔM(best)<0) then

22: Clear splitOneCom and splitTwoCom;

23: splitOneCom.addAll(nodeIds [0:bestIdx]);

24: splitTwoCom.addAll(nodeIds[bestIdx+1 : |c|−1]);

25: newC.add(splitOneCom);

26: newC.add(splitTwoCom);

27: else

28: newC.add(c);

29: end if

30: end for

31: return newC

In the split stage, the algorithm will split a community c into two subcommunities c1 and c2 based on the ratio-cut method if the split improves the value of the quality metric. The ratio-cut method [42] finds the bisection that minimizes the ratio $\frac{|E_{c_1,c_2}|}{|c_1||c_2|}$, where |E_{c_1,c_2}| is the cut size (namely, the number of edges between communities c1 and c2), while |c1| and |c2| are the sizes of the two communities. This ratio penalizes situations in which either of the two communities is small and thus favors balanced divisions over unbalanced ones. However, graph partitioning based on the ratio-cut method is an NP-complete problem. Thus, we approximate it by using the Laplacian spectral bisection method for graph partitioning introduced by Fiedler [43], [44].

First, we calculate the Fiedler vector, which is the eigenvector of the network Laplacian matrix L = D − A corresponding to the second smallest eigenvalue. Then, we put the nodes corresponding to the positive values of the Fiedler vector into one group and the nodes corresponding to the negative values into the other group. The subnetwork of each community is generated with the nodes and edges of that community. Although the ratio cut approximated with the spectral bisection method does allow the sizes |c1| and |c2| to deviate somewhat from the middle value, the right partitioning may not actually divide the community into two balanced or nearly balanced parts. Thus, pure bisection is to some extent inappropriate and unrealistic for community detection problems. We overcome this problem with the following strategy. First, we sort the elements of the Fiedler vector in descending order and then cut them into two communities in each of the |c|+1 possible ways, calculating the corresponding change ΔM of the metric value for each of the |c|+1 divisions. Then, the division with the best value (largest or smallest, depending on the metric) ΔM(best) among all the |c|+1 divisions is recorded. We adopt this best division of the community c only when ΔM(best)>0 (or ΔM(best)<0, depending on the metric). For instance, we split the community only when ΔQds(best) is larger than zero.
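A sketch of this split step, assuming numpy, scipy, and networkx; the helper name best_split, the delta_metric callback, and the fixed Lanczos start vector are our choices rather than the paper's.

```python
import numpy as np
import networkx as nx
from scipy.sparse.linalg import eigsh

def best_split(G, community, delta_metric):
    """Try every prefix/suffix cut of the Fiedler ordering of `community`.

    delta_metric(part1, part2) must return the change in the quality
    metric M caused by splitting `community` into (part1, part2).
    """
    sub = G.subgraph(community)
    nodes = list(sub.nodes())
    if len(nodes) < 3:
        return None  # too small for a two-eigenvector Lanczos run
    L = nx.laplacian_matrix(sub, nodelist=nodes).astype(float)
    v0 = np.linspace(1.0, 2.0, len(nodes))  # fixed start vector => deterministic runs
    vals, vecs = eigsh(L, k=2, which='SM', v0=v0)
    fiedler = vecs[:, np.argsort(vals)[1]]
    order = [nodes[i] for i in np.argsort(-fiedler)]  # descending Fiedler values
    best_gain, best_j = 0.0, None
    for j in range(1, len(order)):  # the nontrivial cuts among the |c|+1 divisions
        gain = delta_metric(set(order[:j]), set(order[j:]))
        if best_j is None or gain > best_gain:
            best_gain, best_j = gain, j
    if best_j is not None and best_gain > 0:  # adopt only improving splits
        return set(order[:best_j]), set(order[best_j:])
    return None
```

Note that this sketch recomputes ΔM from scratch for every cut; the incremental O(|E|) update described below is what makes the split stage efficient in practice.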

The outline of the split stage is shown in Algorithm 1. The input is a network and a community list, and the output is a list of communities after splitting. The initialization part has O(|E|) complexity. Computing the Fiedler vector using the Lanczos method [28] needs O(|E|Kh + |V|K²h + K³h) steps, where K is the number of eigenvectors needed and h is the number of iterations required for the Lanczos method to converge. Here, K is 2 and h is typically very small, although the exact number is not generally known. So, the complexity of calculating the Fiedler vector is O(|E| + |V|). Sorting the Fiedler vector has cost O(|V|log|V|). The search for the best division among all the |c|+1 possible ones (per community c), taken over all the communities, is achieved in O(|E|) time: each of the |c|+1 divisions of a community c differs from the previous one by the movement of just a single node from one group to the other, so the update of the total weights, the total number of edges, and the densities inside the two split parts and between them and the other communities can be done in time proportional to the degree of that node. Thus, all nodes can be moved in time proportional to the sum of their degrees, which equals 2|E|. Moreover, for Fine-tuned Qds, computing Qds(split) costs O(|C||V|) because all the communities are traversed to update the SP for each of the |c|+1 divisions of each community c. All the other parts have complexity at most O(|V|). Thus, the computational complexity of the split stage of Fine-tuned Q is O(|E| + |V|log|V|), while for Fine-tuned Qds it is O(|E| + |V|log|V| + |C||V|).

Algorithm 2 Merge_Communities (G, C)

1: Initialize comWeights [|C|] [|C|], comEdges [|C|] [|C|], and comDensities [|C|] [|C|];

2: //Get the metric value for each community.

3: Mes[|C|]=GetMetric(C,comWeights,comDensities);

4: for i=0 to |C|−1 do

5: for j=i+1 to |C|−1 do

6: //Do not consider disconnected communities.

7: if comWeights[i][j]==0 && comWeights[j][i]==0 then

8: continue;

9: end if

10: Calculate M(merge) for merging ci and cj;

11: ΔM = M(merge) − Mes[i] − Mes[j];

12: //Record the merging information in a red-black tree, sorted by |ΔM| in descending order

13: if ΔM>0 (or ΔM<0) then

14: mergedInfos.put([|ΔM|, i, j]);

15: end if

16: end for

17: end for

18: //Merge the community with the one that improves the value of the quality metric the most

19: while mergedInfos.hasNext() do

20: [ΔM,comId1,comId2]=mergedInfos.next();

21: if !mergedComs.containsKey(comId1) && !mergedComs.containsKey(comId2) then

22: mergedComs.put(comId1,comId2);

23: mergedComs.put(comId2,comId1);

24: end if

25: end while

26: for i=0 to |C|−1 do

27: ci=C.get(i);

28: if mergedComs.containsKey(i) then

29: comId2=mergedComs.get(i);

30: if i<comId2 then

31: ci.addAll(C.get(comId2));

32: end if

33: end if

34: newC.add(ci);

35: end for

36: return newC;

In the merging stage, the algorithm will merge a community with one of its connected communities if the merging improves the value of the quality metric. If several mergers are possible for a community, the one, among communities not yet merged in this stage, which improves the quality metric the most is chosen. Hence, each community is merged at most once per stage. The outline of the merging stage is shown in Algorithm 2. The input is a network and a community list, and the output is a list of communities after merging. The initialization part has complexity O(|E|). For Fine-tuned Q, the two “for loops” considering the merger of any two communities have complexity O(|C|²log|C|) because calculating Q(merge) is O(1) and inserting an element into the red-black tree is O(log|C|²) = O(2log|C|) = O(log|C|), since the maximum number of elements in the tree is |C|(|C|−1)/2 = O(|C|²). For Fine-tuned Qds, the two “for loops” have complexity O(|C|³) because calculating Qds(merge) needs O(|C|) steps to traverse all the communities to update the SP, while inserting an element into the red-black tree is again O(log|C|). All other parts have complexity at most O(|C|²). Thus, the computational complexity of the merging stage of Fine-tuned Q is O(|E| + |C|²log|C|), and that of Fine-tuned Qds is O(|E| + |C|³).
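A sketch of this stage follows; a binary max-heap stands in for Algorithm 2's red-black tree (both give O(log n) insertion), and the delta_metric and connected callbacks are our assumed interfaces.

```python
import heapq

def merge_communities(communities, delta_metric, connected):
    """Pair up connected communities whose merger improves the metric.

    delta_metric(i, j): change in M from merging communities i and j.
    connected(i, j): True if at least one edge runs between them.
    """
    candidates = []
    k = len(communities)
    for i in range(k):
        for j in range(i + 1, k):
            if connected(i, j):
                gain = delta_metric(i, j)
                if gain > 0:  # or gain < 0 for "distance"-like metrics
                    heapq.heappush(candidates, (-gain, i, j))
    partner = {}
    while candidates:  # best improvements first
        _, i, j = heapq.heappop(candidates)
        if i not in partner and j not in partner:
            partner[i], partner[j] = j, i  # each community merges at most once
    merged = []
    for i, c in enumerate(communities):
        if i in partner and partner[i] < i:
            continue  # already absorbed by its lower-indexed partner
        merged.append(c | communities[partner[i]] if i in partner else c)
    return merged
```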

Algorithm 3 Fine-tuned_Algorithm (G, C)

1: comSize=|C|;

2: splitSize=0;

3: mergeSize=0;

4: while comSize != splitSize || comSize != mergeSize do

5: comSize=|C|;

6: C=Split_Communities(G,C);

7: splitSize=|C|;

8: C=Merge_Communities(G,C);

9: mergeSize=|C|;

10: end while

11: return C

The fine-tuned algorithm repeatedly carries out these two alternating stages until neither splitting nor merging can improve the value of the quality metric, or until the total number of communities discovered does not change after one full iteration. Algorithm 3 shows the outline of the fine-tuned algorithm. It can detect the community structure of a network by taking a list with a single community of all the nodes in the network as the input. It can also improve the community detection results of other algorithms by taking a list of their communities as the input. Let the number of iterations of the fine-tuned algorithm be denoted as T. Then, the total complexity of Fine-tuned Q is O(T(|E| + |V|log|V| + |C|²log|C|)), while for Fine-tuned Qds it is O(T(|E| + |V|log|V| + |C||V| + |C|³)). Assuming that T and |C| are constants, the complexity of the fine-tuned algorithms reduces to O(|E| + |V|log|V|). The only part of the algorithm that could generate a nondeterministic result is the Lanczos method used to calculate the Fiedler vector, because the Lanczos method adopts a randomly generated vector as its starting vector. We solve this issue by choosing a fixed normalized vector, of size equal to the number of nodes in the community, as the starting vector for the Lanczos method. Then, the community detection results stay the same across runs as long as the input remains the same.
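The alternation of Algorithm 3 translates directly into code; a sketch assuming split and merge helpers like those above (the stage interfaces are our assumption):

```python
def fine_tuned(G, communities, split_stage, merge_stage):
    # Repeat split/merge until one full iteration leaves the number of
    # communities unchanged (the stopping rule of Algorithm 3).
    while True:
        com_size = len(communities)
        communities = split_stage(G, communities)
        split_size = len(communities)
        communities = merge_stage(G, communities)
        merge_size = len(communities)
        if com_size == split_size == merge_size:
            break
    return communities

# Detecting from scratch: start from one community holding every node.
# result = fine_tuned(G, [set(G.nodes())], split_stage, merge_stage)
```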

SECTION IV.

Experimental Results

In this section, we first introduce several popular measurements for evaluating the quality of the results of community detection algorithms. Denoting the greedy algorithm of modularity maximization proposed by Newman [7] as Greedy Q, we then use the above-mentioned metrics to compare Greedy Q, Fine-tuned Q, and Fine-tuned Qds. The comparison uses four real networks, the classical clique network, and the LFR benchmark networks, each instance of which is instantiated with parameters selected from a wide range of possible values. The results indicate that Fine-tuned Qds is the most effective method among the three, followed by Fine-tuned Q. Moreover, we show that Fine-tuned Qds can be applied to significantly improve the detection results of other algorithms.

In Section II-B-2, we have shown that the modularity maximization approach using the eigenvectors of the Laplacian matrix is equivalent to the one using the eigenvectors of the modularity matrix. This implies that the split stage of our Fine-tuned Q is actually equivalent to the spectral methods. Therefore, Fine-tuned Q with one additional merge operation at each iteration unquestionably has better performance than the spectral algorithms. Hence, we do not discuss them here.

A. Evaluation Metrics

The quality evaluation metrics we consider here can be divided into three categories: 1) Variation of Information (VI) [45] and Normalized Mutual Information (NMI) [46] based on information theory; 2) F-measure [47] and Normalized Van Dongen metric (NVD) [48] based on cluster matching; and 3) Rand Index (RI) [49], Adjusted Rand Index (ARI) [50], and Jaccard Index (JI) [51] based on pair counting.

1. Information Theory-Based Metrics

Given two partitions C and C′ of the same network, VI [45] quantifies the “distance” between them, while NMI [46] measures their similarity. VI is defined as

$$VI(C,C')=H(C)+H(C')-2I(C,C')=H(C,C')-I(C,C')\tag{40}$$

where H(·) is the entropy function and I(C,C') = H(C) + H(C') − H(C,C') is the mutual information. Then, NMI is given by

$$NMI(C,C')=\frac{2I(C,C')}{H(C)+H(C')}.\tag{41}$$

Using the definitions

$$H(C)=-\sum_{c_i\in C}p(c_i)\log p(c_i)=-\sum_{c_i\in C}\frac{|c_i|}{|V|}\log\frac{|c_i|}{|V|}\tag{42}$$

$$H(C,C')=-\sum_{c_i\in C,\,c'_j\in C'}p(c_i,c'_j)\log p(c_i,c'_j)=-\sum_{c_i\in C,\,c'_j\in C'}\frac{|c_i\cap c'_j|}{|V|}\log\frac{|c_i\cap c'_j|}{|V|}\tag{43}$$

we can express VI and NMI as functions of counts only:

$$VI(C,C')=-\frac{1}{|V|}\sum_{c_i\in C,\,c'_j\in C'}|c_i\cap c'_j|\log\frac{|c_i\cap c'_j|^{2}}{|c_i|\,|c'_j|}\tag{44}$$

$$NMI(C,C')=\frac{-2\sum_{c_i\in C,\,c'_j\in C'}\frac{|c_i\cap c'_j|}{|V|}\log\frac{|c_i\cap c'_j|\,|V|}{|c_i|\,|c'_j|}}{\sum_{c_i\in C}\frac{|c_i|}{|V|}\log\frac{|c_i|}{|V|}+\sum_{c'_j\in C'}\frac{|c'_j|}{|V|}\log\frac{|c'_j|}{|V|}}\tag{45}$$

where |c_i| is the number of nodes in community c_i of C and |c_i ∩ c'_j| is the number of nodes that are both in community c_i of C and in community c'_j of C′.
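A count-based sketch of (44) and (45), with partitions given as lists of node sets and n = |V| (the helper name vi_nmi is ours):

```python
from math import log

def vi_nmi(C, Cp, n):
    """Return (VI, NMI) for partitions C and C' of n nodes, per (44)-(45)."""
    vi, mi = 0.0, 0.0
    for ci in C:
        for cj in Cp:
            nij = len(ci & cj)
            if nij:
                vi -= nij / n * log(nij * nij / (len(ci) * len(cj)))
                mi += nij / n * log(n * nij / (len(ci) * len(cj)))
    h = -sum(len(c) / n * log(len(c) / n) for c in C)
    hp = -sum(len(c) / n * log(len(c) / n) for c in Cp)
    return vi, 2.0 * mi / (h + hp)
```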

2. Clustering Matching-Based Metrics

Measurements based on clustering matching aim at finding the largest overlaps between pairs of communities of two partitions C and C′. F-measure [47] measures the similarity between the two partitions, while the Normalized Van Dongen metric (NVD) [48] quantifies the “distance” between them. F-measure is defined as

$$F\text{-}measure(C,C')=\frac{1}{|V|}\sum_{c_i\in C}|c_i|\max_{c'_j\in C'}\frac{2\,|c_i\cap c'_j|}{|c_i|+|c'_j|}.\tag{46}$$

NVD is given by

$$NVD(C,C')=1-\frac{1}{2|V|}\left(\sum_{c_i\in C}\max_{c'_j\in C'}|c_i\cap c'_j|+\sum_{c'_j\in C'}\max_{c_i\in C}|c'_j\cap c_i|\right).\tag{47}$$
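A sketch of (46) and (47) under the same list-of-node-sets convention (helper names ours):

```python
def f_measure(C, Cp, n):
    """F-measure per (46) for partitions C and C' of n nodes."""
    return sum(len(ci) * max(2.0 * len(ci & cj) / (len(ci) + len(cj))
                             for cj in Cp)
               for ci in C) / n

def nvd(C, Cp, n):
    """Normalized Van Dongen metric per (47)."""
    overlap = (sum(max(len(ci & cj) for cj in Cp) for ci in C)
               + sum(max(len(cj & ci) for ci in C) for cj in Cp))
    return 1.0 - overlap / (2.0 * n)
```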

3. Pair Counting-Based Metrics

Metrics based on pair counting count the number of pairs of nodes that are classified alike (in the same community or in different communities) in two partitions C and C′. Let a11 indicate the number of pairs of nodes that are in the same community in both partitions, a10 denote the number of pairs of nodes that are in the same community in partition C but in different communities in C′, a01 be the number of pairs of nodes that are in different communities in C but in the same community in C′, and a00 be the number of pairs of nodes that are in different communities in both partitions. By definition, A = a11 + a10 + a01 + a00 = |V|(|V|−1)/2 is the total number of pairs of nodes in the network. Then, RI [49], which is the ratio of the number of node pairs placed in the same way in both partitions to the total number of pairs, is given by

$$RI(C,C')=\frac{a_{11}+a_{00}}{A}.\tag{48}$$

Denote $M=\frac{1}{A}(a_{11}+a_{10})(a_{11}+a_{01})$. Then, RI's corresponding adjusted version, ARI [50], is expressed as

$$ARI(C,C')=\frac{a_{11}-M}{\frac{1}{2}\left[(a_{11}+a_{10})+(a_{11}+a_{01})\right]-M}.\tag{49}$$

The JI [51], which is the ratio of the number of node pairs placed in the same community in both partitions to the number of node pairs placed in the same community in at least one partition, is defined as

$$JI(C,C')=\frac{a_{11}}{a_{11}+a_{10}+a_{01}}.\tag{50}$$

Each of these three metrics quantifies the similarity between the two partitions C and C′.
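The four pair counts follow from the confusion-matrix counts without enumerating all A = |V|(|V|−1)/2 pairs; a sketch of (48)–(50) under the same conventions (helper name ours):

```python
def pair_counting(C, Cp, n):
    """Return (RI, ARI, JI) per (48)-(50) for partitions C and C' of n nodes."""
    s_ij = sum(len(ci & cj) ** 2 for ci in C for cj in Cp)
    s_i = sum(len(ci) ** 2 for ci in C)
    s_j = sum(len(cj) ** 2 for cj in Cp)
    a11 = (s_ij - n) / 2.0     # together in both partitions
    a10 = (s_i - s_ij) / 2.0   # together in C only
    a01 = (s_j - s_ij) / 2.0   # together in C' only
    A = n * (n - 1) / 2.0
    a00 = A - a11 - a10 - a01  # separated in both partitions
    ri = (a11 + a00) / A
    M = (a11 + a10) * (a11 + a01) / A
    ari = (a11 - M) / (0.5 * ((a11 + a10) + (a11 + a01)) - M)
    ji = a11 / (a11 + a10 + a01)
    return ri, ari, ji
```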

B. Real Networks

In this section, we first evaluate the performance of Greedy Q, Fine-tuned Q, and Fine-tuned Qds on two small networks (Zachary's karate club network [52] and American college football network [53]) with ground truth communities, and then on two large networks [pretty-good-privacy (PGP) network [54] and autonomous system (AS) level Internet] but without ground truth communities.

1. Zachary's Karate Club Network

We first compare the performance of Greedy Q, Fine-tuned Q, and Fine-tuned Qds on Zachary's karate club network [52]. It represents the friendships between 34 members of a karate club at a U.S. university over a period of two years. During the observation period, the club split into two clubs as a result of a conflict within the organization. The resulting two new clubs can be treated as the ground truth communities, whose structure is shown in Fig. 2(a), visualized with the open-source software Gephi [55].

Fig. 2. The community structures of the ground truth communities and those detected by Greedy Q, Fine-tuned Q, and Fine-tuned Qds on Zachary's karate club network: (a) ground truth communities, (b) communities detected with Greedy Q, (c) communities detected with Fine-tuned Q, and (d) communities detected with Fine-tuned Qds.

Table I presents the metric values of the community structures detected by the three algorithms on this network. It shows that Fine-tuned Q and Fine-tuned Qds achieve the highest values of Q and Qds, respectively. However, most of the seven metrics based on ground truth communities imply that Greedy Q performs the best, with only NMI and NVD indicating that Fine-tuned Qds has the best performance among the three algorithms. Hence, it seems that a large Q or Qds may not necessarily mean a high quality of community structure, especially for Q, because Fine-tuned Q achieves the highest Q but has the worst values of the seven metrics described in Section IV-A. We argue that the ground truth communities here may not be so reasonable because Fine-tuned Q and Fine-tuned Qds in fact discover more meaningful communities than Greedy Q does. Fig. 2(a)–(d) shows the community structure of the ground truth communities and of those detected by Greedy Q, Fine-tuned Q, and Fine-tuned Qds, respectively. For the results of Greedy Q shown in Fig. 2(b), we can observe that there are three communities located at the left, the center, and the right side of the network. The ground truth community located on the right is subdivided into the central and right communities, but node 10 is misclassified as belonging to the central community, while in the ground truth it belongs to the community located on the left. Fig. 2(c) demonstrates that Fine-tuned Q subdivides both the left and the right communities into two, with six nodes separated from the left community and five nodes separated from the right community. Moreover, Fig. 2(c) shows that Fine-tuned Q discovers the same number of communities for this network as the algorithms presented in [9], [16], [20], and [22]. In fact, the community structure it discovers is identical to those detected in [16], [20], and [22]. Fig. 2(d) shows that the community structure discovered by Fine-tuned Qds differs from that of Fine-tuned Q only on node 24, which is placed in the larger part of the left community. This is reasonable because node 24 has three connections to the larger part, to which it is thus more attracted than to the smaller part, with which it has only two connections.

TABLE I Metric Values of the Community Structures Discovered by Greedy Q, Fine-Tuned Q, and Fine-Tuned Qds on Zachary's Karate Club Network (Red Italic Font Denotes the Best Value for Each Metric)

In addition, analyzing the intermediate results of Fine-tuned Q and Fine-tuned Qds reveals that the communities at the first iteration are exactly the ground truth communities, which further implies their superiority over Greedy Q. Moreover, NMI and NVD indicate that Fine-tuned Qds is the best among the three, and all the metrics except Q show that Fine-tuned Qds performs better than Fine-tuned Q, supporting the claim that a higher Qds (but not Q) implies a better quality of community structure.

2. American College Football Network

We apply the three algorithms also to the American college football network [53] which represents the schedule of games between college football teams in a single season. The teams are divided into 12 “conferences” with intra-conference games being more frequent than inter-conference games. Those conferences could be treated as the ground truth communities whose structure is shown in Fig. 3(a).

Fig. 3. The community structures of the ground truth communities and those detected by Greedy Q, Fine-tuned Q, and Fine-tuned Qds on American college football network: (a) ground truth communities, (b) communities detected with Greedy Q, (c) communities detected with Fine-tuned Q, and (d) communities detected with Fine-tuned Qds.

Table II presents the metric values of the community structures detected by the three algorithms. It shows that Fine-tuned Qds achieves the best values for all the nine metrics. It implies that Fine-tuned Qds performs best on this football network, followed by Fine-tuned Q. Fig. 3(a)–(d) presents the community structure of ground truth communities and those discovered by Greedy Q, Fine-tuned Q, and Fine-tuned Qds. Each color in the figures represents a community. It can be seen that there are twelve ground truth communities in total, seven communities detected by Greedy Q, nine communities discovered by Fine-tuned Q, and exactly twelve communities found by Fine-tuned Qds.

TABLE II Metric Values of the Community Structures Detected by Greedy Q, Fine-Tuned Q, and Fine-Tuned Qds on American College Football Network (Red Italic Font Denotes the Best Value for Each Metric)

Moreover, we apply Fine-tuned Qds on the community detection results of Greedy Q and Fine-tuned Q. The metric values of these two community structures after improvement with Fine-tuned Qds are shown in Table III. Compared with those of Greedy Q and Fine-tuned Q in Table II, we could observe that the metric values are significantly improved with Fine-tuned Qds. Further, both improved community structures contain exactly 12 communities, the same number as the ground truth communities.

TABLE III Metric Values of the Community Structures of Greedy Q and Fine-Tuned Q Improved With Fine-Tuned Qds on American College Football Network (Blue Italic Font Indicates Improved Score)

3. PGP Network

We then apply the three algorithms on PGP network [54]. It is the giant component of the network of users of the PGP algorithm for secure information interchange. It has 10 680 nodes and 24 316 edges.

Table IV presents the metric values of the community structures detected by the three algorithms. Since this network does not have ground truth communities, we only calculate Q and Qds of the discovered community structures. The table shows that Greedy Q and Fine-tuned Qds achieve the highest values of Q and Qds, respectively. It is worth mentioning that the Qds of Fine-tuned Qds is much larger than that of Greedy Q and Fine-tuned Q, which implies that Fine-tuned Qds performs best on the PGP network according to Qds, followed by Greedy Q.

TABLE IV Values of Q and Qds of the Community Structures Detected by Greedy Q, Fine-Tuned Q, and Fine-Tuned Qds on PGP Network (Red Italic Font Denotes the Best Value for Each Metric)

4. AS Level Internet

The last real network dataset that is adopted to evaluate the three algorithms is AS level Internet. It is a symmetrized snapshot of the structure of the Internet at the level of autonomous systems, reconstructed from Border Gateway Protocol (BGP) tables posted by the University of Oregon Route Views Project. This snapshot was created by Mark Newman from data for July 22, 2006 and has not been previously published. It has 22 963 nodes and 48 436 edges.

Table V presents the metric values of the community structures detected by the three algorithms. Since this network does not have ground truth communities either, we only calculate Q and Qds. It can be seen from the table that Fine-tuned Q and Fine-tuned Qds achieve the highest value of Q and Qds, respectively. Moreover, the Qds of Fine-tuned Qds is much larger than that of Greedy Q and Fine-tuned Q, which indicates that Fine-tuned Qds performs best on AS level Internet according to Qds, followed by Fine-tuned Q.

TABLE V Values of Q and Qds of the Community Structures Detected by Greedy Q, Fine-Tuned Q, and Fine-Tuned Qds on AS Level Internet (Red Italic Font Denotes the Best Value for Each Metric)

C. Synthetic Networks

1. Clique Network

We now apply the three algorithms to the classical network example [23][24][25], displayed in Fig. 4, which illustrates that modularity (Q) suffers from the resolution limit problem. It is a ring network comprising 30 identical cliques, each of which has five nodes, connected to their neighbors by single edges. It is intuitively obvious that each clique forms a single community.

Fig. 4. A ring network made out of 30 identical cliques, each having five nodes and connected by single edges.

Table VI presents the metric values of the community structures detected by the three algorithms. It shows that Greedy Q and Fine-tuned Q have the same performance. They both achieve the highest value of Q but obtain only about half of the Qds value that Fine-tuned Qds achieves. In fact, Fine-tuned Qds finds exactly 30 communities, with each clique being a single community. In contrast, Greedy Q and Fine-tuned Q discover only 16 communities, with 14 communities containing two cliques each and the other two communities containing a single clique each. Also, we take the community detection results of Greedy Q and Fine-tuned Q as the input to Fine-tuned Qds to try to improve those results. The metric values of the community structures after improvement with Fine-tuned Qds are recorded in Table VII. This table shows that the community structures discovered are identical to that of Fine-tuned Qds, which means that the results of Greedy Q and Fine-tuned Q are dramatically improved with Fine-tuned Qds. Therefore, it can be concluded from Tables VI and VII that a larger value of Qds (but not Q) implies a higher quality of the community structure. Moreover, Qds solves the resolution limit problem of Q. Finally, Fine-tuned Qds is effective in maximizing Qds and in finding meaningful community structure.

TABLE VI Metric Values of the Community Structures Detected by Greedy Q, Fine-Tuned Q, and Fine-Tuned Qds on the Classical Clique Network (Red Italic Font Denotes the Best Value for Each Metric)
TABLE VII Metric Values of the Community Structures of Greedy Q and Fine-Tuned Q Improved With Fine-Tuned Qds on the Classical Clique Network (Blue Italic Font Indicates Improved Score)

2. LFR Benchmark Networks

To further compare the performance of Greedy Q, Fine-tuned Q, and Fine-tuned Qds, we choose the LFR benchmark networks [56], which have become a standard in the evaluation of the performance of community detection algorithms and also have known ground truth communities. The LFR benchmark networks that we use here have 1000 nodes with average degree 15 and maximum degree 50. The exponent γ for the degree sequence varies from 2 to 3. The exponent β for the community size distribution ranges from 1 to 2. Then, four pairs of exponents (γ,β) = (2,1), (2,2), (3,1), and (3,2) are chosen in order to explore the widest spectrum of graph structures. The mixing parameter μ is varied from 0.05 to 0.5. It means that each node shares a fraction (1 − μ) of its edges with the other nodes in its community and a fraction μ of its edges with nodes outside its community. Thus, low mixing parameters indicate strong community structure. Also, we generate 10 network instances for each μ. Hence, each metric value in Tables VIII–XII represents the average metric value over all 10 instances. Since the experimental results are similar for all four pairs of exponents (γ,β) = (2,1), (2,2), (3,1), and (3,2), for the sake of brevity, we only present the results for (γ,β) = (2,1) here.
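For readers who wish to reproduce a comparable setup, networkx ships a reimplementation of the LFR generator (the paper used the original LFR code). Note that, to the best of our knowledge, networkx requires both exponents to be strictly greater than 1, so β = 1 is only approximated below with tau2 = 1.1, and generation can occasionally fail to converge, in which case another seed must be tried.

```python
import networkx as nx

def lfr_instance(mu, seed):
    """One benchmark instance roughly matching the setup above."""
    G = nx.LFR_benchmark_graph(
        n=1000, tau1=2.0, tau2=1.1, mu=mu,
        average_degree=15, max_degree=50, seed=seed)
    # Ground truth communities are stored as a node attribute.
    ground_truth = {frozenset(G.nodes[v]["community"]) for v in G}
    return G, list(ground_truth)

# e.g., ten instances per mixing parameter, as in the experiments
networks = [lfr_instance(0.1, seed) for seed in range(10)]
```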

TABLE VIII Metric Values of the Community Structures of Greedy Q on the LFR Benchmark Networks With (γ,β)=(2,1)
TABLE IX Metric Values of the Community Structures of Fine-Tuned Q on the LFR Benchmark Networks With (γ,β)=(2,1)
TABLE X Metric Values of the Community Structures of Fine-Tuned Qds on the LFR Benchmark Networks With (γ,β)=(2,1)
TABLE XI Metric Values of the Community Structures of Greedy Q Improved With Fine-Tuned Qds on the LFR Benchmark Networks With (γ,β)=(2,1)
TABLE XII Metric Values of the Community Structures of Fine-Tuned Q Improved With Fine-Tuned Qds on the LFR Benchmark Networks With (γ,β)=(2,1)

Tables VIII–X show the metric values of the community structures detected with Greedy Q, Fine-tuned Q, and Fine-tuned Qds, respectively, on the LFR benchmark networks with (γ,β) = (2,1) and μ varying from 0.05 to 0.5. The red italic font in the tables denotes that the corresponding algorithm achieves the best value for a given quality metric among the three algorithms. The results in these tables show that Greedy Q obtains the best values for all nine measurements when μ = 0.05, while Fine-tuned Qds achieves the highest values of Qds and the best values for almost all of the seven metrics based on ground truth communities when μ ranges from 0.1 to 0.5. Also, Fine-tuned Q gets the second best values for Qds and for almost all of the seven metrics in the same range of μ. However, for Q the best is Greedy Q, followed by Fine-tuned Q, with Fine-tuned Qds last.

In summary, the seven measurements based on ground truth communities are all consistent with Qds, but not consistent with Q. This consistency indicates the superiority of Qds over Q as a community quality metric. In addition, Fine-tuned Qds performs best among the three algorithms for μ>0.05, which demonstrates that it is very effective and does a very good job in optimizing Qds.

We then take the community detection results of Greedy Q and Fine-tuned Q as the input to Fine-tuned Qds to improve those results. The measurement values of the community structures after improvement with Fine-tuned Qds are displayed in Tables XI and XII. The blue italic font in Tables XI and XII implies that the metric value in these two tables is improved compared to the one in Table VIII and that in Table IX, respectively. Then, compared with those of Greedy Q shown in Table VIII and those of Fine-tuned Q shown in Table IX, all measurements, except in some cases for Q, are significantly improved with Fine-tuned Qds. This again indicates that all the seven metrics described in Section IV-A are consistent with Qds, but not consistent with Q. Interestingly, those results are even better than those of Fine-tuned Qds itself presented in Table X. Thus, it can be concluded that Fine-tuned Qds is very powerful in improving the community detection results of other algorithms.

SECTION V.

Conclusion

In this paper, we review the definition of modularity and its corresponding maximization methods. Moreover, we show that modularity optimization has two opposite but coexisting issues. We also review several community quality metrics proposed to solve the resolution limit problem. We then discuss our Modularity Density (Qds) metric, which simultaneously avoids those two problems. Finally, we propose an efficient and effective fine-tuned algorithm to maximize Qds; this new algorithm can in fact be used to optimize any community quality metric. We evaluate the three algorithms, Greedy Q, Fine-tuned Q based on Q, and Fine-tuned Qds based on Qds, with seven metrics based on ground truth communities. These evaluations are done on four real networks, and also on the classical clique network and the LFR benchmark networks, each instance of the latter defined with parameters selected from a wide range of values. The results demonstrate that Fine-tuned Qds performs best among the three algorithms, followed by Fine-tuned Q. The experiments also show that Fine-tuned Qds can dramatically improve the community detection results of other algorithms. In addition, all seven quality metrics based on ground truth communities are consistent with Qds, but not with Q, which indicates the superiority of Qds over Q as a community quality metric.

ACKNOWLEDGEMENT

The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies either expressed or implied of the Army Research Laboratory or the U.S. Government.

References

[1] M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks," Phys. Rev. E, vol. 69, p. 026113, Feb. 2004.
[2] S. Fortunato, "Community detection in graphs," Phys. Rep., vol. 486, pp. 75-174, 2010.
[3] M. E. J. Newman, "Modularity and community structure in networks," Proc. Nat. Acad. Sci., vol. 103, no. 23, pp. 8577-8582, 2006.
[4] E. A. Leicht and M. E. J. Newman, "Community structure in directed networks," Phys. Rev. Lett., vol. 100, p. 118703, Mar. 2008.
[5] M. E. J. Newman, "Analysis of weighted networks," Phys. Rev. E, vol. 70, p. 056131, Nov. 2004.
[6] M. E. J. Newman, "Fast algorithm for detecting community structure in networks," Phys. Rev. E, vol. 69, p. 066133, Jun. 2004.
[7] A. Clauset, M. E. J. Newman, and C. Moore, "Finding community structure in very large networks," Phys. Rev. E, vol. 70, p. 066111, Dec. 2004.
[8] K. Wakita and T. Tsurumi, "Finding community structure in mega-scale social networks [extended abstract]," Proc. 16th Int. Conf. World Wide Web, pp. 1275-1276, 2007.
[9] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, "Fast unfolding of communities in large networks," J. Statist. Mech.: Theory Exp., vol. 2008, no. 10, p. P10008, 2008.
[10] M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices," Phys. Rev. E, vol. 74, p. 036104, Sep. 2006.
[11] T. Richardson, P. J. Mucha, and M. A. Porter, "Spectral tripartitioning of networks," Phys. Rev. E, vol. 80, p. 036111, Sep. 2009.
[12] S. White and P. Smyth, "A spectral clustering approach to finding communities in graphs," Proc. SIAM Int. Conf. Data Mining, pp. 274-285, Apr. 2005.
[13] J. Ruan and W. Zhang, "An efficient spectral algorithm for network community discovery and its applications to biological and social networks," Proc. 7th IEEE Int. Conf. Data Mining, pp. 643-648, 2007.
[14] J. Ruan and W. Zhang, "Identifying network communities with a high resolution," Phys. Rev. E, vol. 77, p. 016104, Jan. 2008.
[15] M. E. J. Newman, "Spectral methods for network community detection and graph partitioning," Phys. Rev. E, vol. 88, p. 042822, 2013.
[16] J. Duch and A. Arenas, "Community detection in complex networks using extremal optimization," Phys. Rev. E, vol. 72, p. 027104, Aug. 2005.
[17] R. Guimerà and L. A. N. Amaral, "Functional cartography of complex metabolic networks," Nature, vol. 433, pp. 895-900, Feb. 2005.
[18] R. Guimerà and L. A. N. Amaral, "Cartography of complex networks: Modules and universal roles," J. Statist. Mech.: Theory Exp., vol. 2005, no. 02, p. P02001, 2005.
[19] C. P. Massen and J. P. K. Doye, "Identifying communities within energy landscapes," Phys. Rev. E, vol. 71, p. 046101, Apr. 2005.
[20] A. Medus, G. Acuna, and C. Dorso, "Detection of community structures in networks via global optimization," Phys. A: Statist. Mech. Appl., vol. 358, no. 2-4, pp. 593-604, 2005.
[21] M. Sales-Pardo, R. Guimerà, A. A. Moreira, and L. A. N. Amaral, "Extracting the hierarchical organization of complex systems," Proc. Nat. Acad. Sci. U.S.A., vol. 104, no. 39, pp. 15224-15229, Sep. 2007.
[22] G. Agarwal and D. Kempe, "Modularity-maximizing graph communities via mathematical programming," Eur. Phys. J. B, vol. 66, no. 3, pp. 409-418, 2008.
[23] M. Chen, T. Nguyen, and B. K. Szymanski, "On measuring the quality of a network community structure," Proc. ASE/IEEE Int. Conf. Social Comput., pp. 122-127, Sep. 2013.
[24] M. Chen, T. Nguyen, and B. K. Szymanski, "A new metric for quality of network community structure," ASE Hum. J., vol. 2, no. 4, pp. 226-240, 2013.
[25] S. Fortunato and M. Barthélemy, "Resolution limit in community detection," Proc. Nat. Acad. Sci., vol. 104, no. 1, pp. 36-41, 2007.
[26] U. Brandes, D. Delling, M. Gaertler, R. Gorke, M. Hoefer, and Z. Nikoloski, "On modularity clustering," IEEE Trans. Knowl. Data Eng., vol. 20, no. 2, pp. 172-188, Feb. 2008.
[27] T. Chakraborty, S. Srinivasan, N. Ganguly, S. Bhowmick, and A. Mukherjee, "Constant communities in complex networks," Sci. Rep., vol. 3, May 2013.
[28] J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. Philadelphia, PA, USA: SIAM, 2000.
[29] S. Boettcher and A. G. Percus, "Optimization with extremal dynamics," Phys. Rev. Lett., vol. 86, pp. 5211-5214, Jun. 2001.
[30] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, no. 4598, pp. 671-680, May 1983.
[31] H. Karloff, Linear Programming. Boston, MA, USA: Birkhäuser, 1991.
[32] J. W. Berry, B. Hendrickson, R. A. LaViolette, and C. A. Phillips, "Tolerating the community detection resolution limit with edge weighting," Phys. Rev. E, vol. 83, p. 056119, May 2011.
[33] S. Muff, F. Rao, and A. Caflisch, "Local modularity measure for network clusterizations," Phys. Rev. E, vol. 72, no. 5, p. 056107, 2005.
[34] A. Arenas, A. Fernandez, and S. Gomez, "Analysis of the structure of complex networks at different resolution levels," New J. Phys., vol. 10, no. 5, p. 053039, 2008.
[35] A. Lancichinetti and S. Fortunato, "Limits of modularity maximization in community detection," Phys. Rev. E, vol. 84, no. 6, p. 066122, 2011.
[36] C. Granell, S. Gomez, and A. Arenas, "Hierarchical multiresolution method to overcome the resolution limit in complex networks," Int. J. Bifurcation Chaos, vol. 22, p. 1250171, 2012.
[37] Z. Li, S. Zhang, R.-S. Wang, X.-S. Zhang, and L. Chen, "Quantitative function for community detection," Phys. Rev. E, vol. 77, no. 3, p. 036109, 2008.
[38] V. A. Traag, P. Van Dooren, and Y. Nesterov, "Narrow scope for resolution-limit-free community detection," Phys. Rev. E, vol. 84, no. 1, p. 016114, 2011.
[39] J. Reichardt and S. Bornholdt, "Partitioning and modularity of graphs with arbitrary degree distribution," Phys. Rev. E, vol. 76, p. 015102, Jul. 2007.
[40] A. S. Waugh, L. Pei, J. H. Fowler, P. J. Mucha, and M. A. Porter, "Party polarization in congress: A network science approach," 2011.
[41] N. Eagle, A. Pentland, and D. Lazer, "Inferring social network structure using mobile phone data," Proc. Nat. Acad. Sci., vol. 106, no. 36, pp. 15274-15278, 2009.
[42] Y.-C. Wei and C.-K. Cheng, "Towards efficient hierarchical designs by ratio cut partitioning," Proc. IEEE Int. Conf. Comput.-Aided Des., pp. 298-301, 1989.
[43] M. Fiedler, "Algebraic connectivity of graphs," Czech. Math. J., vol. 23, pp. 298-305, 1973.
[44] A. Pothen, H. D. Simon, and K.-P. Liou, "Partitioning sparse matrices with eigenvectors of graphs," SIAM J. Matrix Anal. Appl., vol. 11, no. 3, pp. 430-452, May 1990.
[45] B. H. Good, Y.-A. de Montjoye, and A. Clauset, "Performance of modularity maximization in practical contexts," Phys. Rev. E, vol. 81, p. 046106, Apr. 2010.
[46] L. Danon, A. Díaz-Guilera, J. Duch, and A. Arenas, "Comparing community structure identification," J. Statist. Mech.: Theory Exp., vol. 2005, no. 09, p. P09008, 2005.
[47] S. Wagner and D. Wagner, "Comparing clusterings—An overview," 2007.
[48] S. Van Dongen, "Performance criteria for graph clustering and Markov cluster experiments," 2000.
[49] W. M. Rand, "Objective criteria for the evaluation of clustering methods," J. Amer. Statist. Assoc., vol. 66, no. 336, pp. 846-850, 1971.
[50] L. Hubert and P. Arabie, "Comparing partitions," J. Classification, vol. 2, no. 1, pp. 193-218, 1985.
[51] A. Ben-Hur, A. Elisseeff, and I. Guyon, "A stability based method for discovering structure in clustered data," Proc. Pacific Symp. Biocomputing, pp. 6-17, 2001.
[52] W. Zachary, "An information flow model for conflict and fission in small groups," J. Anthropol. Res., vol. 33, pp. 452-473, 1977.
[53] M. Girvan and M. E. J. Newman, "Community structure in social and biological networks," Proc. Nat. Acad. Sci., vol. 99, no. 12, pp. 7821-7826, 2002.
[54] M. Boguñá, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas, "Models of social networks based on social distance attachment," Phys. Rev. E, vol. 70, p. 056122, Nov. 2004.
[55] M. Bastian, S. Heymann, and M. Jacomy, "Gephi: An open source software for exploring and manipulating networks," Proc. Int. AAAI Conf. Weblogs and Social Media (ICWSM), pp. 361-362, May 2009.
[56] A. Lancichinetti, S. Fortunato, and F. Radicchi, "Benchmark graphs for testing community detection algorithms," Phys. Rev. E, vol. 78, p. 046110, Oct. 2008.