In this section, we first review the definition of modularity and the corresponding optimization approaches. Then, we discuss the two opposite yet coexisting problems of modularity maximization. Finally, we overview several community quality measurements proposed to solve the resolution limit problem and then discuss Modularity Density (Qds) [23], [24] which simultaneously avoids these two problems.
A. Definition of Modularity
Comparing results of different network partitioning algorithms can be challenging, especially when the network structure is not known beforehand. The concept of modularity, defined in [1], provides a measure of the quality of a particular partitioning of a network. Modularity (Q) quantifies community strength by comparing the fraction of edges within a community to the fraction expected if connections between nodes were made at random. The justification is that the members of a community should have more links among themselves than a randomly assembled group would. Thus, a Q value close to 0 means that the fraction of edges inside communities is no better than in the random case, and a value of 1 means that the network community structure has the highest possible strength.
Formally, modularity (Q) can be defined as [1]
$$Q = \sum_{c_i \in C}\left[\frac{|E_{c_i}^{\rm in}|}{|E|} - \left(\frac{2|E_{c_i}^{\rm in}| + |E_{c_i}^{\rm out}|}{2|E|}\right)^2\right] \tag{1}$$

where $C$ is the set of all the communities, $c_i$ is a specific community in $C$, $|E_{c_i}^{\rm in}|$ is the number of edges between nodes within community $c_i$, $|E_{c_i}^{\rm out}|$ is the number of edges from the nodes in community $c_i$ to the nodes outside $c_i$, and $|E|$ is the total number of edges in the network.
Modularity can also be expressed in the following form [3]:
$$Q = \frac{1}{2|E|}\sum_{ij}\left[A_{ij} - \frac{k_i k_j}{2|E|}\right]\delta_{c_i,c_j} \tag{2}$$

where $k_i$ is the degree of node $i$, $A_{ij}$ is an element of the adjacency matrix, $\delta_{c_i,c_j}$ is the Kronecker delta symbol, and $c_i$ is the label of the community to which node $i$ is assigned.
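As a concrete sanity check that (1) and (2) agree, the following sketch (an illustration of ours, not code from the cited papers, on an assumed toy graph of two triangles joined by an edge) computes $Q$ both community-by-community and in matrix form:

```python
import numpy as np

# Assumed toy graph: two triangles joined by a single edge (nodes 0-5).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

comm = np.array([0, 0, 0, 1, 1, 1])   # community label of each node
m = A.sum() / 2                       # |E|, total number of edges
k = A.sum(axis=1)                     # node degrees

# Definition (1): per-community fractions of internal edges and degrees.
Q1 = 0.0
for c in np.unique(comm):
    members = comm == c
    e_in = A[np.ix_(members, members)].sum() / 2   # |E_c^in|
    deg_c = k[members].sum()                       # 2|E_c^in| + |E_c^out|
    Q1 += e_in / m - (deg_c / (2 * m)) ** 2

# Definition (2): matrix form with the Kronecker delta as a boolean mask.
delta = (comm[:, None] == comm[None, :]).astype(float)
Q2 = ((A - np.outer(k, k) / (2 * m)) * delta).sum() / (2 * m)

assert np.isclose(Q1, Q2)
print(Q1)   # about 0.357 for this partition
```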
Since larger Q means a stronger community structure, several algorithms, which we will discuss in Section II-B, are based on modularity optimization.
The modularity measure defined above is suitable only for undirected and unweighted networks. However, this definition can be naturally extended to apply to directed networks as well as to weighted networks. Weighted and directed networks contain more information than undirected and unweighted ones. While this additional information makes such networks more valuable, it also makes them more difficult to analyze.
The revised definition of modularity that works for directed networks is as follows [4]:
$$Q = \frac{1}{|E|}\sum_{ij}\left[A_{ij} - \frac{k_i^{\rm in} k_j^{\rm out}}{|E|}\right]\delta_{c_i,c_j} \tag{3}$$

where $k_i^{\rm in}$ and $k_j^{\rm out}$ are the in-degree of node $i$ and the out-degree of node $j$, respectively.
Although many networks can be regarded as binary, i.e., as either having an edge between a pair of nodes or not having it, there are many other networks for which it would be natural to treat edges as having a certain degree of strength or weight.
The same general techniques that have been developed for unweighted networks are applied to their weighted counterparts in [5] by mapping weighted networks onto multigraphs. For non-negative integer weights, an edge with weight w in a weighted graph corresponds to w parallel edges in the corresponding multigraph. Although negative weights can arise in some applications, they are rarely useful in social networks, so for the sake of brevity we do not discuss them here. It turns out that the adjacency matrix of a weighted graph is equivalent to that of a multigraph with unweighted edges. Since the structure of the adjacency matrix is independent of the edge weights, all the methods developed for unweighted networks can be adjusted to weighted ones.
It is necessary to point out that the notion of the degree of a node should also be extended for weighted graphs: in this case, the degree of a node is defined as the sum of the weights of all edges incident to it.

It is shown in [5] that the definitions of modularity given above hold for weighted networks as well, if we treat $A_{ij}$ as the weight of the connection between nodes $i$ and $j$ and set $|E| = \frac{1}{2}\sum_{ij} A_{ij}$.
B. Modularity Optimization Approaches
In the literature, a high value of modularity (Q) indicates a good community structure, and the partition corresponding to the maximum value of modularity on a given graph is supposed to have the highest quality, or at least a very good one. Therefore, it is natural to discover communities by maximizing modularity over all possible partitions of a network. However, exhaustively searching all such partitions for the optimal value of modularity is computationally prohibitive, since modularity optimization is known to be NP-hard (i.e., it belongs to the non-deterministic polynomial-time hard class of problems) [26]. Nevertheless, many heuristic methods have been introduced to find high-modularity partitions in a reasonable time. Those approaches include greedy algorithms [6]–[9], spectral methods [3], [10]–[15], EO [16], SA [17]–[20], a sampling technique [21], and mathematical programming [22]. In this section, we review these modularity optimization heuristics.
1. Greedy Algorithms
The first greedy algorithm was proposed by Newman [6]. It is an agglomerative hierarchical clustering method. Initially, every node belongs to its own community, creating altogether $|V|$ communities. Then, at each step, the algorithm merges the pair of communities whose merger yields the largest resulting modularity. The change in $Q$ upon joining two communities $c_i$ and $c_j$ is
$$\Delta Q_{c_i,c_j} = 2\left(\frac{|E_{c_i,c_j}|}{2|E|} - \frac{|E_{c_i}||E_{c_j}|}{4|E|^2}\right) \tag{4}$$

where $|E_{c_i,c_j}|$ is the number of edges from community $c_i$ to community $c_j$ and $|E_{c_i}| = 2|E_{c_i}^{\rm in}| + |E_{c_i}^{\rm out}|$ is the total degree of the nodes in community $c_i$.
$\Delta Q_{c_i,c_j}$ can be calculated in constant time. The algorithm stops when all the nodes in the network are in a single community, after $(|V|-1)$ merging steps. This produces $|V|$ partitions in total: the first defined by the initial step and each subsequent one resulting from one of the $(|V|-1)$ merging steps. The partition with the largest value of modularity, approximating the modularity maximum best, is the result of the algorithm. At each merging step, the algorithm needs to compute the change $\Delta Q_{c_i,c_j}$ of modularity resulting from joining any two currently existing communities $c_i$ and $c_j$ in order to choose the best merger. Since merging two disconnected communities cannot increase the value of modularity, the algorithm checks only mergers of connected pairs of communities; the number of such pairs is at most $|E|$, limiting the complexity of this part to $O(|E|)$. However, the rows and columns of the adjacency matrix corresponding to the two merged communities must be updated, which takes $O(|V|)$. Since there are $(|V|-1)$ iterations, the final complexity of the algorithm is $O((|E|+|V|)|V|)$, or $O(|V|^2)$ for sparse networks.
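The following is a minimal, deliberately naive sketch of this agglomeration scheme (our illustration; Newman's implementation updates its bookkeeping incrementally rather than rescanning the matrix). It applies the connected-pair merger with the largest $\Delta Q$ of (4) until one community remains and returns the best partition encountered:

```python
import numpy as np

def modularity(A, labels):
    """Q of Eq. (2) for an undirected adjacency matrix A."""
    m = A.sum() / 2
    k = A.sum(axis=1)
    delta = labels[:, None] == labels[None, :]
    return ((A - np.outer(k, k) / (2 * m)) * delta).sum() / (2 * m)

def labels_of(comms, n):
    """Community label array for a list of node lists."""
    labels = np.empty(n, dtype=int)
    for c, nodes in enumerate(comms):
        labels[nodes] = c
    return labels

def greedy_merge(A):
    """Merge the connected pair with the largest Delta Q of Eq. (4) until
    one community remains; return the best partition seen along the way."""
    n = A.shape[0]
    m = A.sum() / 2
    comms = [[i] for i in range(n)]
    best_labels = labels_of(comms, n)
    best_q = modularity(A, best_labels)
    while len(comms) > 1:
        best = None
        for a in range(len(comms)):
            for b in range(a + 1, len(comms)):
                e_ab = A[np.ix_(comms[a], comms[b])].sum()
                if e_ab == 0:          # disconnected pairs never improve Q
                    continue
                deg_a, deg_b = A[comms[a]].sum(), A[comms[b]].sum()
                dq = 2 * (e_ab / (2 * m) - deg_a * deg_b / (4 * m * m))
                if best is None or dq > best[0]:
                    best = (dq, a, b)
        if best is None:               # the graph itself is disconnected
            break
        _, a, b = best
        comms[a] += comms.pop(b)       # apply the best merger
        labels = labels_of(comms, n)
        q = modularity(A, labels)
        if q > best_q:
            best_q, best_labels = q, labels
    return best_q, best_labels
```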
Although Newman's algorithm [6] is much faster than the algorithm of Newman and Girvan [1], whose complexity is $O(|E|^2|V|)$, Clauset et al. [7] pointed out that the update of the adjacency matrix at each step involves a large number of unnecessary operations when the network is sparse and its matrix therefore has many zero entries. They introduced data structures for sparse matrices to perform the updating operation more efficiently. In their algorithm, instead of maintaining the adjacency matrix and computing $\Delta Q_{c_i,c_j}$, they maintained and updated a matrix whose entries are $\Delta Q_{c_i,c_j}$ for the pairs of connected communities $c_i$ and $c_j$. The authors introduced three data structures to represent sparse matrices efficiently: 1) each row of the matrix is stored both as a balanced binary tree, so that elements can be searched and inserted in $O(\log|V|)$ time, and as a max-heap, so that the largest element of each row can be located in constant time; 2) another max-heap stores the largest element of every row of the matrix, so that the overall largest $\Delta Q_{c_i,c_j}$ can be located in constant time; and 3) a vector stores $|E_{c_i}|$ for each community $c_i$. Then, in each step, the largest $\Delta Q_{c_i,c_j}$ can be found in constant time, and the update of the matrix after merging two communities $c_i$ and $c_j$ takes $O((k_{c_i}+k_{c_j})\log|V|)$ time, where $k_{c_i}$ and $k_{c_j}$ are the numbers of neighboring communities of $c_i$ and $c_j$, respectively. Thus, the total running time is at most $O(\log|V|)$ times the sum of the degrees of the nodes in the communities along the dendrogram created by the merging steps. In the worst case, this sum is the depth of the dendrogram times the sum of the degrees of the nodes in the network. If the dendrogram has depth $d$, the running time is $O(d|E|\log|V|)$, or $O(|V|\log^2|V|)$ when the network is sparse and the dendrogram is almost balanced ($d \sim \log|V|$).
However, Wakita and Tsurumi [8] observed that the greedy algorithm proposed by Clauset et al. does not scale to networks with more than about 500 000 nodes. They found that the computational inefficiency arises from merging communities in an unbalanced manner, which yields very unbalanced dendrograms. In such cases, the relation $d \sim \log|V|$ no longer holds, causing the algorithm to run at its worst-case complexity. To balance the merging of communities, the authors introduced three types of consolidation ratios measuring the balance of community pairs and used them together with modularity to perform the joining of communities without bias. This modification enables the algorithm to scale to networks with up to 10 000 000 nodes, and it also approximates the modularity maximum better than the original algorithm.
Another type of greedy modularity optimization algorithm, different from those above, was proposed by Blondel et al.; it is usually referred to as Louvain [9]. It is divided into two phases that are repeated iteratively. Initially, every node belongs to its own community, so there are $|V|$ communities. In the first phase, every node, in a certain order, is considered for merging into one of its neighboring communities, and the merger with the largest positive gain in $Q$ is selected. If all possible gains associated with moving this node are negative, it stays in its original community. This merging procedure repeats iteratively and stops when no increase of $Q$ can be achieved.
After the first phase, Louvain reaches a local maximum of $Q$. The second phase then builds a community network based on the communities discovered in the first phase: the nodes of the new network are those communities, and there is an edge between two new nodes if there are edges between nodes in the corresponding two communities, its weight being the sum of the weights of those edges. Edges between nodes of the same first-phase community result in a self-loop for that community node in the new network. After the community network is generated, the algorithm applies the first phase again to it. The two phases repeat iteratively and stop when no further change occurs and a maximum of modularity is attained. The number of iterations of this algorithm is usually very small, and most of the computational time is spent in the first iteration; thus, the complexity of the algorithm grows like $O(|E|)$, and it scales to large networks with up to a billion nodes. However, the results of Louvain are affected by the order in which the nodes are considered for merging in the first phase [27].
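Recent versions of networkx ship an implementation of this method; below is a minimal usage sketch (ours), assuming networkx ≥ 2.8 is available, with a fixed seed to pin down the node-visit order that, as noted above, can otherwise change the result [27]:

```python
import networkx as nx
from networkx.algorithms import community

# Louvain on a small benchmark graph (requires networkx >= 2.8).
G = nx.karate_club_graph()
# The seed fixes the order in which nodes are visited in phase one,
# which otherwise makes the output partition run-dependent [27].
parts = community.louvain_communities(G, seed=42)
print(len(parts), "communities found")
print("Q =", community.modularity(G, parts))
```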
2. Spectral Methods
There are two categories of spectral algorithms for maximizing modularity: 1) those based on the modularity matrix [3], [10], [11]; and 2) those based on the Laplacian matrix of a network [12]–[14].
a) Modularity optimization using the eigenvalues and eigenvectors of the modularity matrix [3], [10], [11]
Modularity (Q) can be expressed as [3]
$$Q = \frac{1}{4|E|}\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2|E|}\right)(s_i s_j + 1) = \frac{1}{4|E|}\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2|E|}\right)s_i s_j = \frac{1}{4|E|}\mathbf{s}^T\mathbf{B}\mathbf{s} \tag{5}$$

where $A_{ij}$ are the elements of the adjacency matrix $\mathbf{A}$ and $\mathbf{s}$ is the column vector representing any division of the network into two groups; its elements are defined as $s_i = +1$ if node $i$ belongs to the first group and $s_i = -1$ if it belongs to the second group. $\mathbf{B}$ is the modularity matrix with elements

$$B_{ij} = A_{ij} - \frac{k_i k_j}{2|E|}. \tag{6}$$

Representing $\mathbf{s}$ as a linear combination of the normalized eigenvectors $\mathbf{u}_i$ of $\mathbf{B}$, i.e., $\mathbf{s} = \sum_{i=1}^{|V|} a_i\mathbf{u}_i$ with $a_i = \mathbf{u}_i^T\mathbf{s}$, and plugging the result into (5) yields

$$Q = \frac{1}{4|E|}\sum_i a_i\mathbf{u}_i^T\mathbf{B}\sum_j a_j\mathbf{u}_j = \frac{1}{4|E|}\sum_i a_i^2\beta_i \tag{7}$$

where $\beta_i$ is the eigenvalue of $\mathbf{B}$ corresponding to eigenvector $\mathbf{u}_i$. To maximize $Q$, Newman [3] proposed a spectral approach that chooses $\mathbf{s}$ proportional to the leading eigenvector $\mathbf{u}_1$ corresponding to the largest (most positive) eigenvalue $\beta_1$; this assumes that the eigenvalues are labeled in decreasing order $\beta_1 \geq \beta_2 \geq \ldots \geq \beta_{|V|}$. Nodes are then divided into two communities according to the signs of the elements of $\mathbf{s}$: nodes corresponding to positive elements are assigned to one group and all remaining nodes to the other. Since the row and column sums of $\mathbf{B}$ are zero, it always has the eigenvector $(1,1,1,\ldots)$ with eigenvalue zero. Therefore, if $\mathbf{B}$ has no positive eigenvalue, the leading eigenvector is $(1,1,1,\ldots)$, which means that the network is indivisible. Moreover, Newman [3] proposed to divide a network into more than two communities by repeatedly dividing each of the communities obtained so far into two, until the additional contribution $\Delta Q$ to the modularity made by the subdivision of a community $c$,

$$\Delta Q = \frac{1}{2|E|}\left[\frac{1}{2}\sum_{i,j \in c} B_{ij}(s_i s_j + 1) - \sum_{i,j \in c} B_{ij}\right] = \frac{1}{4|E|}\mathbf{s}^T\mathbf{B}^{(c)}\mathbf{s} \tag{8}$$

is equal to or less than 0. $\mathbf{B}^{(c)}$ in the formula above is the generalized modularity matrix; its elements, indexed by the labels $i$ and $j$ of nodes within community $c$, are

$$B_{ij}^{(c)} = B_{ij} - \delta_{ij}\sum_{k \in c} B_{ik}. \tag{9}$$

Then, the same spectral method can be applied to $\mathbf{B}^{(c)}$ to maximize $\Delta Q$. The recursive subdivision process stops when $\Delta Q \leq 0$, which means that the matrix $\mathbf{B}^{(c)}$ has no positive eigenvalue. The overall complexity of this algorithm is $O((|E|+|V|)|V|)$.
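A compact sketch of the leading-eigenvector bisection step (ours; the recursive part using $\mathbf{B}^{(c)}$ of (9) is omitted for brevity):

```python
import numpy as np

def spectral_bisection(A):
    """Split a network in two by the signs of the leading eigenvector
    of the modularity matrix B of Eq. (6), as in [3]."""
    m = A.sum() / 2
    k = A.sum(axis=1)
    B = A - np.outer(k, k) / (2 * m)
    vals, vecs = np.linalg.eigh(B)        # B is symmetric; ascending order
    if vals[-1] < 1e-12:                  # no positive eigenvalue: indivisible
        return None
    s = np.where(vecs[:, -1] > 0, 1, -1)  # signs of the leading eigenvector
    q = s @ B @ s / (4 * m)               # Q of this split, per Eq. (5)
    return s, q
```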
However, the spectral algorithm described above has two drawbacks. First, it divides a network into more than two communities by repeated division instead of obtaining all the communities directly in a single step. Second, it uses only the leading eigenvector of the modularity matrix and ignores all the others, losing the useful information contained in those eigenvectors. Newman later proposed to divide a network into a set of communities $C$ with $|C| \geq 2$ directly, using multiple leading eigenvectors [10]. Let $\mathbf{S} = (\mathbf{s}_c)$ be a $|V| \times |C|$ "community-assignment" matrix with one column for each community $c$, defined as

$$S_{i,c} = \begin{cases} 1, & \text{if node } i \text{ belongs to community } c \\ 0, & \text{otherwise} \end{cases} \tag{10}$$

Then the modularity ($Q$) for this direct division of the network is given by

$$Q = \frac{1}{2|E|}\sum_{i,j=1}^{|V|}\sum_{c \in C} B_{ij} S_{i,c} S_{j,c} = \frac{1}{2|E|}\operatorname{Tr}(\mathbf{S}^T\mathbf{B}\mathbf{S}) \tag{11}$$

where $\operatorname{Tr}(\mathbf{S}^T\mathbf{B}\mathbf{S})$ is the trace of matrix $\mathbf{S}^T\mathbf{B}\mathbf{S}$. Writing $\mathbf{B} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{U}^T$, where $\mathbf{U} = (\mathbf{u}_1, \mathbf{u}_2, \ldots)$ is the matrix of eigenvectors of $\mathbf{B}$ and $\boldsymbol{\Sigma}$ is the diagonal matrix of eigenvalues $\Sigma_{ii} = \beta_i$, yields

$$Q = \frac{1}{2|E|}\sum_{i=1}^{|V|}\sum_{c \in C}\beta_i(\mathbf{u}_i^T\mathbf{s}_c)^2. \tag{12}$$

Then, obtaining $|C|$ communities is equivalent to selecting $|C|-1$ independent, mutually orthogonal columns $\mathbf{s}_c$. Moreover, $Q$ would be maximized by choosing the columns $\mathbf{s}_c$ proportional to the leading eigenvectors of $\mathbf{B}$. However, only the eigenvectors corresponding to positive eigenvalues contribute positively to the modularity; thus, the number of positive eigenvalues, plus 1, is an upper bound on $|C|$. A more general modularity maximization keeps the leading $p$ ($1 \leq p \leq |V|$) eigenvectors, and $Q$ can be rewritten as

$$Q = \frac{1}{2|E|}\left(|V|\alpha + \operatorname{Tr}[\mathbf{S}^T\mathbf{U}(\boldsymbol{\Sigma} - \alpha\mathbf{I})\mathbf{U}^T\mathbf{S}]\right) = \frac{1}{2|E|}\left(|V|\alpha + \sum_{j=1}^{|V|}\sum_{c \in C}(\beta_j - \alpha)\left[\sum_{i=1}^{|V|} U_{ij} S_{i,c}\right]^2\right) \tag{13}$$

where $\alpha$ ($\alpha \leq \beta_p$) is a constant related to the approximation of $Q$ obtained by adopting only the first $p$ leading eigenvectors. By defining $|V|$ node vectors $\mathbf{r}_i$ of dimension $p$ whose $j$th component is

$$[\mathbf{r}_i]_j = \sqrt{\beta_j - \alpha}\,U_{ij} \tag{14}$$

modularity can be approximated as

$$Q \simeq \tilde{Q} = \frac{1}{2|E|}\left(|V|\alpha + \sum_{c \in C}|\mathbf{R}_c|^2\right) \tag{15}$$

where $\mathbf{R}_c$, $c \in C$, are the community vectors

$$\mathbf{R}_c = \sum_{i \in c}\mathbf{r}_i. \tag{16}$$

Thus, the community detection problem is equivalent to choosing a division of nodes into $|C|$ groups that maximizes the magnitudes of the community vectors $\mathbf{R}_c$, subject to the constraint that $\mathbf{R}_c \cdot \mathbf{r}_i > 0$ if node $i$ is assigned to community $c$. Problems of this type are called vector partitioning problems.
Although [10] explored using multiple leading eigenvectors of the modularity matrix, it did not pursue the approach in detail beyond a two-eigenvector method for bipartitioning [3], [10]. Richardson et al. [11] provided an extension of these recursive bipartitioning methods, considering the best two-way or three-way division at each recursive step to explore promising partitions more thoroughly. To reduce the number of partitions considered for the eigenvector-pair tripartitioning, the authors adopted a divide-and-conquer strategy, yielding an efficient approach whose computational complexity is competitive with that of the two-eigenvector bipartitioning method.
b) Modularity optimization using the eigenvalues and eigenvectors of the Laplacian matrix [12]–[14]
Given a partition $C$ (a set of communities) and the corresponding "community-assignment" matrix $\mathbf{S} = (\mathbf{s}_c)$, White and Smyth [12] rewrote modularity ($Q$) as follows:

$$Q \propto \operatorname{Tr}(\mathbf{S}^T(\mathbf{W} - \tilde{\mathbf{D}})\mathbf{S}) = -\operatorname{Tr}(\mathbf{S}^T\mathbf{L}_Q\mathbf{S}) \tag{17}$$

where $\mathbf{W} = 2|E|\mathbf{A}$ and the elements of $\tilde{\mathbf{D}}$ are $\tilde{D}_{ij} = k_i k_j$. The matrix $\mathbf{L}_Q = \tilde{\mathbf{D}} - \mathbf{W}$ is called the "Q-Laplacian." Finding the "community-assignment" matrix $\mathbf{S}$ that maximizes $Q$ above is NP-complete, but a good approximation can be obtained by relaxing the discreteness constraints on the elements of $\mathbf{S}$ and allowing them to assume real values. Then, $Q$ becomes a continuous function of $\mathbf{S}$, and its extremes can be found by equating its first derivative with respect to $\mathbf{S}$ to zero. This leads to the eigenvalue equation

$$\mathbf{L}_Q\mathbf{S} = \mathbf{S}\boldsymbol{\Lambda} \tag{18}$$

where $\boldsymbol{\Lambda}$ is the diagonal matrix of Lagrange multipliers. Thus, the modularity optimization problem is transformed into the standard spectral graph partitioning problem. When the network is not too small, $\mathbf{L}_Q$ can be approximated well, up to constant factors, by the transition matrix $\tilde{\mathbf{W}} = \mathbf{D}^{-1}\mathbf{A}$ obtained by normalizing $\mathbf{A}$ so that all rows sum to one. Here, $\mathbf{D}$ is the diagonal degree matrix of $\mathbf{A}$. It can be shown that the eigenvalues and eigenvectors of $\tilde{\mathbf{W}}$ are precisely $1 - \lambda$ and $\boldsymbol{\mu}$, where $\lambda$ and $\boldsymbol{\mu}$ are the solutions to the generalized eigenvalue problem $\mathbf{L}\boldsymbol{\mu} = \lambda\mathbf{D}\boldsymbol{\mu}$, with $\mathbf{L} = \mathbf{D} - \mathbf{A}$ being the Laplacian matrix. Thus, the underlying spectral algorithm here is equivalent to standard spectral graph partitioning using the eigenvalues and eigenvectors of the Laplacian matrix.
Based on the above analysis, White and Smyth proposed two clustering algorithms, named "Algorithm Spectral-1" and "Algorithm Spectral-2," to search for a partition $C$ with size up to $K$, predefined by an input parameter. Both algorithms take as input the eigenvector matrix $\mathbf{U}_K = (\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_{K-1})$ with the leading $K-1$ eigenvectors (excluding the trivial all-ones eigenvector) of the transition matrix $\tilde{\mathbf{W}}$. Those $K-1$ eigenvectors can be efficiently computed with the implicitly restarted Lanczos method (IRLM) [28]. "Algorithm Spectral-1" uses the first $k-1$ ($2 \leq k \leq K$) columns of $\mathbf{U}_K$, denoted $\mathbf{U}_{k-1}$, and clusters the row vectors of $\mathbf{U}_{k-1}$ using k-means to find a $k$-way partition, denoted $C_k$. Then, the $C_{k^*}$ with size $k^*$ that achieves the largest value of $Q$ is the final community structure.
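A sketch of "Algorithm Spectral-1" (ours, using a dense eigendecomposition and scipy's k-means rather than IRLM, and reusing the `modularity` helper from the greedy sketch above; the `seed` argument of `kmeans2` assumes scipy ≥ 1.7):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_1(A, K):
    """Embed nodes with the leading K-1 nontrivial eigenvectors of the
    transition matrix D^{-1} A, run k-means for each k = 2..K, and keep
    the partition with the largest modularity Q."""
    deg = A.sum(axis=1)
    W = A / deg[:, None]                   # row-stochastic transition matrix
    vals, vecs = np.linalg.eig(W)
    order = np.argsort(-vals.real)         # decreasing eigenvalues
    U = vecs[:, order[1:K]].real           # drop the trivial all-ones vector
    best_q, best_labels = -1.0, None
    for k in range(2, K + 1):
        _, labels = kmeans2(U[:, :k - 1], k, minit="++", seed=1)
        q = modularity(A, labels)          # helper from the greedy sketch
        if q > best_q:
            best_q, best_labels = q, labels
    return best_q, best_labels
```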
"Algorithm Spectral-2" starts with a single community ($k = 1$) and recursively splits each community $c$ into two smaller ones if the subdivision produces a higher value of $Q$. The split is done by running k-means with two clusters on the matrix $\mathbf{U}_{k,c}$ formed from $\mathbf{U}_k$ by keeping only the rows corresponding to nodes in $c$. The recursive procedure stops when no more splits are possible or when $k = K$ communities have been found; the community structure with the highest value of $Q$ is then the detection result.

However, the two algorithms described above, especially "Algorithm Spectral-1," scale poorly to large networks because k-means partitioning is run up to $K$ times; both approaches have a worst-case complexity of $O(K^2|V| + K|E|)$. In order to speed up the calculation while retaining effectiveness in approximating the maximum of $Q$, Ruan and Zhang [13] proposed the Kcut algorithm, which recursively partitions the network to optimize $Q$. At each recursive step, Kcut applies a $k$-way partition ($k = 2, 3, \ldots, l$) to the subnetwork induced by the nodes and edges of each community, using "Algorithm Spectral-1" of White and Smyth [12], and selects the $k$ that achieves the highest $Q$. Empirically, Kcut with $l$ as small as 3 or 4 can significantly improve $Q$ over the standard bipartitioning method, and it reduces the computational cost to $O((|V|+|E|)\log|C|)$ for a final partition with $|C|$ communities.
Ruan and Zhang later [14] proposed the QCUT algorithm, whose name derives from modularity (Q) partitioning (CUT), which combines Kcut and local search to optimize $Q$. The QCUT algorithm consists of two alternating stages: 1) partitioning and 2) refinement. In the partitioning stage, Kcut is used to recursively partition the network until $Q$ cannot be further improved. In the refinement stage, a local search strategy repeatedly considers two operations: the migration of a node from its current community to another one, and the merge of two communities into one; both are applied to improve $Q$ as much as possible. The partitioning and refinement stages alternate until $Q$ cannot be increased further. In order to address the resolution limit problem of modularity, the authors proposed HQCUT (standing for hierarchical QCUT), which recursively applies QCUT to divide the subnetwork, generated from the nodes and edges in each community, into subcommunities. Further, to avoid overpartitioning, they use a statistical test to determine whether a community indeed has intrinsic subcommunities.
c) Equivalence of two categories of spectral algorithms for maximizing modularity [15]
Newman [15] showed that, with a hyperellipsoid relaxation, the spectral modularity maximization method using the eigenvalues and eigenvectors of the modularity matrix can be reformulated as the spectral algorithm that relies on the eigenvalues and eigenvectors of the Laplacian matrix. This formulation indicates that the above two kinds of modularity optimization approaches are equivalent. Starting with (5) for the division of a network into two groups, the discreteness of $s_i$ is first relaxed onto a hyperellipsoid with the constraint

$$\sum_i k_i s_i^2 = 2|E|. \tag{19}$$

Then, the relaxed modularity maximization problem can be easily solved by setting the first derivative of (5) with respect to $s_i$ to zero. This leads to

$$\sum_j B_{ij} s_j = \lambda k_i s_i \tag{20}$$

or, in matrix notation,

$$\mathbf{B}\mathbf{s} = \lambda\mathbf{D}\mathbf{s} \tag{21}$$

where $\lambda$ is the eigenvalue. Plugging (20) into (5) yields

$$Q = \frac{1}{4|E|}\sum_{ij} B_{ij} s_i s_j = \frac{\lambda}{4|E|}\sum_i k_i s_i^2 = \frac{\lambda}{2}. \tag{22}$$

Therefore, to achieve the highest value of $Q$, one should choose $\lambda$ to be the largest (most positive) eigenvalue of (21). Using (6), (20) can be rewritten as

$$\sum_j A_{ij} s_j = k_i\left(\lambda s_i + \frac{1}{2|E|}\sum_j k_j s_j\right) \tag{23}$$

or, in matrix notation,

$$\mathbf{A}\mathbf{s} = \mathbf{D}\left(\lambda\mathbf{s} + \frac{\mathbf{k}^T\mathbf{s}}{2|E|}\mathbf{1}\right) \tag{24}$$

where $\mathbf{k}$ is the vector with elements $k_i$ and $\mathbf{1} = (1,1,1,\ldots)$. Multiplying (24) by $\mathbf{1}^T$ results in $\lambda\mathbf{k}^T\mathbf{s} = 0$. If there is a nontrivial eigenvalue $\lambda > 0$, then the above equation simplifies to

$$\mathbf{A}\mathbf{s} = \lambda\mathbf{D}\mathbf{s}. \tag{25}$$

Again, $\lambda$ should be the most positive eigenvalue. However, the eigenvector corresponding to this eigenvalue is the uniform vector $\mathbf{1}$, which fails to satisfy $\mathbf{k}^T\mathbf{s} = 0$. Thus, in this case, the best one can do is to choose $\lambda$ to be the second largest eigenvalue and take $\mathbf{s}$ proportional to the corresponding eigenvector. In fact, this eigenvector is precisely the leading eigenvector of (21). Then, defining a rescaled vector $\mathbf{u} = \mathbf{D}^{1/2}\mathbf{s}$ and plugging it into (25) gives

$$(\mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2})\mathbf{u} = \lambda\mathbf{u}. \tag{26}$$

The matrix $\mathbf{L} = \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$ is called the normalized Laplacian matrix. (The normalized Laplacian is sometimes defined as $\mathbf{L} = \mathbf{I} - \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$, but the two definitions differ only by a trivial transformation of their eigenvalues and eigenvectors.)
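The equivalence can be checked numerically on a small graph (our illustration, reusing the toy two-triangle graph from the modularity example above): the leading eigenvector of the generalized problem (21) should coincide, up to sign and scale, with the $\mathbf{D}^{-1/2}$-rescaled second eigenvector of (26):

```python
import numpy as np
from scipy.linalg import eigh

# Two triangles joined by one edge (same assumed toy graph as above).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
k = A.sum(axis=1)
m = A.sum() / 2
B = A - np.outer(k, k) / (2 * m)

# Leading eigenvector of the generalized problem B s = lambda D s, Eq. (21).
_, vecs_b = eigh(B, np.diag(k))
s = vecs_b[:, -1]

# Second eigenvector of D^{-1/2} A D^{-1/2}, Eq. (26), mapped back via D^{-1/2}.
Dh = np.diag(k ** -0.5)
_, vecs_l = eigh(Dh @ A @ Dh)
t = Dh @ vecs_l[:, -2]

# The two vectors agree up to sign and scale, confirming the equivalence.
s, t = s / np.linalg.norm(s), t / np.linalg.norm(t)
print(min(np.linalg.norm(s - t), np.linalg.norm(s + t)) < 1e-8)
```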
3. Extremal Optimization
Duch and Arenas [16] proposed a modularity optimization algorithm based on the EO [29]. EO optimizes a global variable by improving extremal local variables. Here, the global variable is modularity (Q). The contribution of an individual node i to Q of the whole network with a certain community structure is given by
$$q_i = k_{i,c} - k_i\frac{|E_c|}{2|E|} \tag{27}$$

where $k_{i,c}$ is the number of edges that connect node $i$ to the nodes in its own community $c$. Note that $Q = \frac{1}{2|E|}\sum_i q_i$ and that $q_i$ can be normalized into the interval $[-1, 1]$ by dividing it by $k_i$:

$$\lambda_i = \frac{q_i}{k_i} = \frac{k_{i,c}}{k_i} - \frac{|E_c|}{2|E|} \tag{28}$$

where $\lambda_i$, called the fitness, is the relative contribution of node $i$ to $Q$. The fitness of each node is then adopted as the local variable.
The algorithm starts by randomly splitting the network into two partitions with equal numbers of nodes, where communities are the connected components of each partition. Then, at each iteration, it moves the node with the lowest fitness from its own community to another community. The shift changes the community structure, so the fitness of many other nodes needs to be recomputed. The process repeats until $Q$ can no longer be increased. After that, the algorithm generates sub-community networks by deleting the inter-community edges and proceeds recursively on each sub-community network until $Q$ cannot be improved. Although the procedure is deterministic for a given initialization, its final result does depend on the initialization, and it is likely to get trapped in local maxima. Thus, a probabilistic selection called $\tau$-EO [29] is used to improve the result: nodes are ranked according to their fitness, and a node of rank $r$ is selected with probability $P(r) \propto r^{-\tau}$. The computational complexity of this algorithm is $O(|V|^2\log^2|V|)$.
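A sketch of the two EO ingredients described above (ours): the fitness of (28) and the $\tau$-EO rank-based selection, here reading $|E_c|$ as the total degree of community $c$, as in (4):

```python
import numpy as np

def fitness(A, labels):
    """Per-node fitness lambda_i of Eq. (28)."""
    m = A.sum() / 2
    k = A.sum(axis=1)
    lam = np.empty(len(k))
    for i in range(len(k)):
        own = labels == labels[i]
        k_ic = A[i, own].sum()        # edges from i into its own community
        lam[i] = k_ic / k[i] - A[own].sum() / (2 * m)
    return lam

def tau_eo_pick(lam, tau=1.4, rng=None):
    """tau-EO selection: rank nodes by fitness (worst first) and choose
    rank r with probability proportional to r**(-tau)."""
    if rng is None:
        rng = np.random.default_rng()
    order = np.argsort(lam)           # ascending: lowest fitness first
    p = np.arange(1, len(lam) + 1) ** -float(tau)
    return order[rng.choice(len(lam), p=p / p.sum())]
```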
4. Simulated Annealing
SA [30] is a probabilistic procedure for global optimization that locates a good approximation to the global optimum of a given function in a large search space. This technique was adopted in [17]–[20] to maximize modularity ($Q$). The initial point for all those approaches can be an arbitrary partitioning of the nodes into communities, even one with $|V|$ communities in which each node belongs to its own community. At each iteration, a node $i$ and a community $c$ are chosen randomly; the community can be a currently existing one or an empty community introduced to increase the number of communities. Then, node $i$ is moved from its original community to community $c$, changing $Q$ by $\Delta Q$. If $\Delta Q$ is greater than zero, the update is accepted; otherwise, it is accepted with probability $e^{\beta\Delta Q}$, where $\beta$ in [17]–[19] represents the inverse of the temperature $T$ and $\beta$ in [20] is the reciprocal of the pseudo temperature $\tau$. In addition, [20] imposes one more condition on the move of a node: when $c$ is not empty, shifting node $i$ to $c$ is considered only if there are edges between node $i$ and the nodes in $c$. To improve the performance and to avoid getting trapped in local maxima, collective movements, which involve moving multiple nodes at a time [19], [20], merging two communities [17]–[19], and splitting a community [17]–[19], are employed. Splits can be carried out in a number of different schemes; the best performance is achieved by treating a community as an isolated subnetwork, partitioning it into two, and then performing a nested SA on these partitions [17], [18]. These methods stop when no new update is accepted within a fixed number of iterations.
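A minimal sketch of the Metropolis acceptance step shared by these SA approaches (ours; it recomputes $Q$ from scratch via the `modularity` helper defined earlier for clarity, whereas practical implementations update $\Delta Q$ incrementally, and it omits the extra move types such as merges and splits):

```python
import numpy as np

rng = np.random.default_rng(7)

def sa_step(A, labels, beta):
    """One SA move: shift a random node to a random community (possibly a
    brand-new one) and accept or revert via the Metropolis rule."""
    i = rng.integers(len(labels))
    old, new = labels[i], rng.integers(labels.max() + 2)
    q_before = modularity(A, labels)      # helper from the greedy sketch
    labels[i] = new
    dq = modularity(A, labels) - q_before
    if dq < 0 and rng.random() >= np.exp(beta * dq):
        labels[i] = old                   # reject: revert the move
    return labels
```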
5. Sampling Techniques
Sales-Pardo et al. [21] proposed a "box-clustering" method to extract the hierarchical organization of networks. This approach consists of two steps: 1) estimating the similarity, called "node affinity," between nodes and forming the node affinity matrix; and 2) deriving the hierarchical community structure from the affinity matrix. The affinity between two nodes is the probability that they are classified into the same community in the local maxima partitions of modularity. The set of local maxima partitions, called $P_{max}$, includes those partitions for which neither moving a node from its original community to another, nor merging two communities, will increase the value of modularity. The sample of $P_{max}$ is found by performing the SA-based modularity optimization algorithm of Guimerà and Amaral [17], [18]. More specifically, the algorithm first randomly divides the nodes into communities and then performs hill-climbing search until a sample with a local maximum of modularity is reached. Then, the affinity matrix is updated based on the obtained sample.
The sample generation procedure is repeated until the affinity matrix converges to its asymptotic value. Empirically, the total number of samples needed is proportional to the size of the network. Before proceeding to the second step, the algorithm assesses whether the network has a significant community structure at all. It does so by computing the z-score of the average modularity of the partitions in $P_{max}$ with respect to the average modularity of the local-maximum partitions of an equivalent ensemble of null model networks, obtained by randomly rewiring the edges of the original network while preserving the degree sequence. A large z-score indicates that the network has a meaningful internal community structure. If the network indeed has a significant community structure, the algorithm advances to the second step, which groups nodes with large mutual affinity close to each other. The goal is to bring the affinity matrix as close as possible to a block-diagonal structure by minimizing a cost function representing the average distance of matrix elements to the diagonal. The communities then correspond to the "best" set of boxes obtained by least-squares fitting of the block-diagonal structure to the affinity matrix. This procedure can be applied recursively to the subnetworks induced by the communities, identifying the low-level structure of each community, until no subnetwork is found to have significant intrinsic structure.
6. Mathematical Programming
Agarwal and Kempe [22] formulated the modularity maximization problem as linear and vector programs, which have the advantage of providing a posteriori performance guarantees. First, modularity maximization can be transformed into the integer program
$$\begin{aligned} &\text{maximize} \quad \frac{1}{2|E|}\sum_{ij} B_{ij}(1 - x_{ij}) \\ &\text{subject to} \quad x_{ik} \leq x_{ij} + x_{jk}, \quad \text{for all } i,j,k \\ &\phantom{\text{subject to} \quad} x_{ij} \in \{0,1\}, \quad \text{for all } i,j \end{aligned} \tag{29}$$

where $\mathbf{B}$ is the modularity matrix and the objective function is linear in the variables $x_{ij}$. When $x_{ij} = 0$, nodes $i$ and $j$ belong to the same community, and $x_{ij} = 1$ indicates that they are in different communities. The restriction $x_{ik} \leq x_{ij} + x_{jk}$ requires that $i$ and $k$ be in the same community if $i$ and $j$, as well as $j$ and $k$, are in the same community. Solving the above integer program is NP-hard, but relaxing the last constraint, so that $x_{ij}$ need not be an integer from $\{0,1\}$ but may be a real number in the interval $[0,1]$, reduces the integer program to a linear program, which can be solved in polynomial time [31]. However, the solution does not correspond to a partition when any of the $x_{ij}$ is fractional. To get the communities from the $x_{ij}$, a rounding step is needed: the value of $x_{ij}$ is treated as the distance between $i$ and $j$, and these distances are used repeatedly to form communities of "nearby" nodes. Moreover, optimizing modularity by dividing a network into two communities can be cast as the strict quadratic program

$$\begin{aligned} &\text{maximize} \quad \frac{1}{4|E|}\sum_{ij} B_{ij}(1 + s_i s_j) \\ &\text{subject to} \quad s_i^2 = 1, \quad \text{for all } i \end{aligned} \tag{30}$$

whose objective function is the same as (5) defined by Newman [3]. Note that the constraint $s_i^2 = 1$ ensures that $s_i = \pm 1$, which implies that node $i$ belongs either to the first or to the second community. Quadratic programming is NP-complete, but the problem can be relaxed to a vector program by replacing each variable $s_i$ with a $|V|$-dimensional vector $\mathbf{s}_i$ and replacing the scalar product with the inner vector product. The solution to the vector program is one location per node on the surface of a $|V|$-dimensional hypersphere. To obtain a bipartition from these node locations, a rounding step is needed: choose a random $(|V|-1)$-dimensional hyperplane passing through the origin and use it to cut the hypersphere into two halves, thereby separating the node vectors into two parts. Multiple random hyperplanes can be chosen, and the one yielding the community structure with the highest modularity provides the solution. The same vector program is then recursively applied to the subnetworks induced by the nodes and edges of the discovered communities, producing hierarchical communities, until $Q$ cannot be increased. Following the linear program and vector program, Agarwal and Kempe also adopted a post-processing step, similar to the local search strategy proposed by Newman [3], to further improve the results.
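A sketch of the LP relaxation of (29) (ours, practical only for a handful of nodes since the triangle constraints grow as $O(|V|^3)$; the rounding step that turns the fractional $x_{ij}$ into communities is omitted):

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def lp_relaxation(A):
    """LP relaxation of Eq. (29): minimize sum_ij B_ij x_ij subject to the
    triangle inequalities and 0 <= x_ij <= 1. Returns the fractional
    'distances' x_ij indexed by node pairs (assumes at least 3 nodes)."""
    n = A.shape[0]
    m = A.sum() / 2
    k = A.sum(axis=1)
    B = A - np.outer(k, k) / (2 * m)
    idx = {p: v for v, p in enumerate(itertools.combinations(range(n), 2))}
    c = np.array([2 * B[i, j] for i, j in idx])    # factor 2: B is symmetric
    rows, cols, vals = [], [], []
    r = 0
    for i, j, l in itertools.combinations(range(n), 3):
        # one inequality per "middle" node b: x_ac - x_ab - x_bc <= 0
        for a, b, cc in ((i, j, l), (j, i, l), (i, l, j)):
            rows += [r, r, r]
            cols += [idx[tuple(sorted((a, cc)))],
                     idx[tuple(sorted((a, b)))],
                     idx[tuple(sorted((b, cc)))]]
            vals += [1.0, -1.0, -1.0]
            r += 1
    A_ub = np.zeros((r, len(idx)))
    A_ub[rows, cols] = vals
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(r), bounds=(0, 1),
                  method="highs")
    return res.x, idx
```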
C. Resolution Limit
Since its inception, modularity has been used extensively as the measure of the quality of partitions produced by community detection algorithms. In fact, if we adopt modularity as a quality measure of communities, the task of discovering communities essentially turns into the task of finding the network partitioning with the optimal value of modularity.
However, as the properties of modularity were studied, it was discovered that in some cases it fails to detect small communities. There is a certain threshold [25] such that a community of size below it will not be detected, even if it is a complete subgraph connected to the rest of the graph by a single edge. This property of modularity has become known as the resolution limit.
Although the resolution limit prevents detection of small communities, the actual value of the threshold depends on the total number of edges in the network and on the degree of interconnectedness between communities. In fact, the resolution limit can reach values comparable to the size of the entire network, causing the formation of a few giant communities (or even a single community) and failing to detect smaller communities within them. This makes interpreting the results of community detection very difficult, because it is impossible to tell beforehand whether a community is well formed or whether it can be further split into subcommunities.
Considering modularity as a function of the total number of edges $|E|$ and the number of communities $m$ makes it possible to find the values of $m$ and $|E|$ which maximize this function. It turns out that setting $m = \sqrt{|E|}$ yields the absolute maximal value of modularity. Consequently, modularity has a resolution limit of order $\sqrt{|E|}$, which bounds the number and size of communities [25]. In fact, if the number of edges inside a certain community is smaller than $\sqrt{|E|/2}$, such a community cannot be resolved through modularity optimization. Modularity optimization can even fail to detect communities of larger size if they have more edges in common with the rest of the network. Therefore, by finding the optimal value of the modularity, we are generally not obtaining the best possible community structure.
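To give a sense of the scale involved (an illustrative calculation of ours, not an example from [25]): in a network with $|E| = 10\,000$ edges, the modularity optimum favors about $m = \sqrt{10\,000} = 100$ communities, and any community with fewer than $\sqrt{10\,000/2} \approx 71$ internal edges risks being merged with its neighbors, no matter how clique-like it is.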
The above arguments also apply to weighted networks. In this case, $|E|$ is the sum of the weights of all the edges in the network, $|E_{c_i}^{\rm in}|$ is the sum of the weights of the edges between nodes within community $c_i$, and $|E_{c_i}^{\rm out}|$ is the sum of the weights of the edges from the nodes in community $c_i$ to the nodes outside $c_i$.
By introducing an additional parameter ϵ, which represents the weight of inter-community edges, Berry et al. showed in [32] that the number of communities in the optimal solution is
$$m = \sqrt{\frac{|E|}{\epsilon}}. \tag{31}$$

Correspondingly, any community whose size satisfies

$$|c_i| < \sqrt{\frac{|E|\epsilon}{2}} - \epsilon \tag{32}$$

may not be resolved.
Introduction of $\epsilon$ brings some interesting opportunities. If we can make $\epsilon$ arbitrarily small, then we can expect maximum weighted modularity to produce any desired number of communities. In other words, given a proper weighting, a much better modularity resolution can be achieved than without weighting. However, in practice, finding a way to set edge weights that achieves small values of $\epsilon$ can be challenging. An algorithm for lowering $\epsilon$ proposed by Berry et al. requires $O(m|V|\log|V|)$ time.
D. Resolving the Resolution Limit Problem
There have been extensive studies done on how to mitigate the consequences of the modularity resolution limit. The main approaches followed are described below.
Localized modularity measure (LQ) [33] is based on the observation that the resolution limit problem is caused by modularity being a global measure since it assumes that edges between any pairs of nodes are equally likely, including connectivity between the communities. However, in many networks, the majority of communities have edges to only a few other communities, i.e., exhibit a local community connectivity.
Thus, a local version of the modularity measure for a directed network is defined as
$${\rm LQ} = \sum_{c_i \in C}\left[\frac{|E_{c_i}^{\rm in}|}{|E_{c_i}^{\rm neighb}|} - \left(\frac{|E_{c_i}^{\rm in}| + |E_{c_i}^{\rm out}|}{|E_{c_i}^{\rm neighb}|}\right)^2\right] \tag{33}$$

where $|E_{c_i}^{\rm neighb}|$ is the total number of edges in the neighboring communities of $c_i$, i.e., in the communities to which all neighbors of $c_i$ belong.
Unlike traditional modularity (Q), the local version of modularity (LQ) is not bounded above by 1. The more locally connected communities a network has, the bigger its LQ can grow. In a network, where all communities are connected to each other, LQ yields the same value as Q. LQ considers individual communities and their neighbors, and therefore provides a measure of community quality that is not dependent on other parts of the network. The local connectivity approach can be applied not only to the nearest neighboring communities, but also to the second or higher neighbors as well.
Arenas et al. proposed a multiple resolution method [34] which is based on the idea that it might be possible to look at the detected community structure at different scales. From this perspective, the modularity resolution limit is not a problem but a feature. It allows choosing a desired resolution level to achieve the required granularity of the output community structure using the original definition of modularity.
The multiple resolution method is based on the definition of modularity given by (1). The modularity resolution limit depends on the total weight $2|E|$. By varying the total weight, it is possible to control the resolution limit, effectively performing community detection at different granularity levels. Changing the sum of the weights of the edges adjacent to every node by some value $r$ rescales the topology by a factor of $r$. Since the resolution limit is proportional to $\sqrt{r}$, it grows more slowly than $r$. Consequently, it is possible to reach a scale at which all required communities become visible to modularity optimization.
Caution should be exercised when altering the weights of edges in the network to avoid changing its topological characteristics. To ensure this, a rescaled adjacency matrix can be defined as
$$\mathbf{A}_r = \mathbf{A} + r\mathbf{I} \tag{34}$$

where $\mathbf{A}$ is the adjacency matrix and $\mathbf{I}$ is the identity matrix. Since the original edge weights are not altered, $\mathbf{A}_r$ preserves all common features of the network: the distribution of the sums of weights, the weighted clustering coefficient, the eigenvectors, etc. Essentially, introducing $r$ amounts to adding a self-loop of weight $r$ to every node in the network.
Optimizing the modularity for the rescaled topology $\mathbf{A}_r$ is performed by using the modularity at scale $r$ as the new quality function

$$Q_r = \sum_{c_i \in C}\left[\frac{2|E_{c_i}^{\rm in}| + r|c_i|}{2|E| + r|V|} - \left(\frac{|E_{c_i}| + r|c_i|}{2|E| + r|V|}\right)^2\right] \tag{35}$$

where $|c_i|$ is the number of nodes in community $c_i$ and $|E_{c_i}| = 2|E_{c_i}^{\rm in}| + |E_{c_i}^{\rm out}|$. It yields larger communities for smaller values of $r$ and smaller communities for larger values of $r$. By performing modularity optimization for different values of $r$, it is possible to analyze the community structure at different scales.
The parameter $r$ can also be thought of as representing the resistance of a node to becoming part of a community. If $r$ is positive, we can obtain a community structure that is more granular than what is achievable with the original definition of modularity ($Q$), which corresponds to $r = 0$. Making $r$ negative zooms out of the network and provides a view of super-communities.
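A short sketch (ours) of evaluating (35) for a fixed partition, which is the inner step of such a multiresolution sweep over $r$:

```python
import numpy as np

def q_r(A, labels, r):
    """Multiresolution modularity Q_r of Eq. (35) for a fixed partition."""
    n = A.shape[0]
    m = A.sum() / 2
    k = A.sum(axis=1)
    total = 2 * m + r * n
    q = 0.0
    for c in np.unique(labels):
        members = labels == c
        e_in = A[np.ix_(members, members)].sum() / 2
        deg_c = k[members].sum()           # |E_c| = 2|E_c^in| + |E_c^out|
        n_c = members.sum()
        q += (2 * e_in + r * n_c) / total - ((deg_c + r * n_c) / total) ** 2
    return q
```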
Further studies of the multiple resolution approach revealed that it suffers from two major issues, outlined in [35]. First, when the value of the resolution parameter $r$ is low, it tends to group small communities together. Second, when the resolution is high, it splits large communities. These trends pull in opposite directions in networks with a large variation of community sizes; hence, it is impossible to select a value of the resolution parameter such that neither smaller nor larger communities are adversely affected by the resolution limit. A network can be tested for susceptibility to the resolution problem by examining its clustering coefficient, i.e., the degree to which nodes tend to form communities. Sharp changes in the clustering coefficient indicate that communities of substantially different scales exist in the network. The result is that when the value of $r$ is sufficiently large, bigger communities get broken up before smaller communities are found. This applies to other multiple resolution methods as well and seems to be a general problem of methods that try to optimize some global measure.
The hierarchical multiresolution method proposed by Granell et al. in [36] overcomes the limitations of the multiple resolution method on networks with very different scales of communities. It achieves that by introducing a new hierarchical multiresolution scheme that works even in cases of community detection near the modularity resolution limit. The main idea underlying this method is based on performing multiple resolution community detection on essential parts of the network, thus analyzing each part independently.
The method operates iteratively by first placing all nodes in a single community. Then, it finds the minimum value of the resistance parameter r which produces a community structure with the optimal value of modularity. Finally, it runs the same algorithm on each community that was found. The method terminates when no more split of communities is necessary, which usually takes just a few steps.
Another approach to leveraging the results of modularity optimization has been introduced by Chakraborty et al. in [27]. It is based on the observation that a simple change to the order of nodes in a network can significantly affect the community structure. However, a closer examination of the communities produced in different runs of a certain community detection algorithm reveals that for many networks the same invariant groups of nodes are consistently assigned to the same communities. Such groups of nodes are called constant communities. The percentage of constant communities varies depending on the network. Constant communities are detected by trying different node permutations while preserving the degree sequence of the nodes. For networks that have strong community structure, the constant communities detected can be adopted as a pre-processing step before performing modularity optimization. This can lead to higher modularity values and lower variability in results, thus improving the overall quality of community detection.
In the study [37] by Li et al., a new quantitative measure for community detection is introduced. It offers several improvements over the modularity (Q), including elimination of the resolution limit and ability to detect the number of communities. The new measure called modularity density (D) is based on the average degree of the community structure. It is given by
$$D = \sum_{c_i \in C}\frac{2|E_{c_i}^{\rm in}| - |E_{c_i}^{\rm out}|}{|c_i|}. \tag{36}$$

The quality of the communities found is then described by the value of the modularity density ($D$): the larger the value of $D$, the stronger the community structure.
The modularity density ($D$) does not divide a clique into two parts, and it can resolve most modular networks correctly. It can also detect communities of different sizes. This second property can be used to quantitatively determine the number of communities, since the maximum value of $D$ is achieved when the network is correctly partitioned. Although, as mentioned in [37], finding the optimal value of modularity density ($D$) is NP-hard, the problem is equivalent to an objective function of the kernel k-means clustering problem, for which efficient computational algorithms are known.
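A direct sketch of evaluating (36) for a given partition (ours):

```python
import numpy as np

def modularity_density(A, labels):
    """Modularity density D of Eq. (36)."""
    k = A.sum(axis=1)
    d = 0.0
    for c in np.unique(labels):
        members = labels == c
        e_in = A[np.ix_(members, members)].sum() / 2   # |E_c^in|
        e_out = k[members].sum() - 2 * e_in            # |E_c^out|
        d += (2 * e_in - e_out) / members.sum()
    return d
```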
Traag et al. in [38] introduce a rigorous definition of resolution-limit-free methods, for which considering any induced subgraph of the original graph does not cause the detected community structure to change. In other words, if a partitioning of a network is optimal (with respect to some objective function) and it is also optimal for each subgraph induced by the partitioning, then the objective function is called resolution-limit-free. An objective function is called additive for a certain partitioning if it is equal to the sum of the values of this objective function over the subgraphs induced by the partitioning.
Based on these two definitions, it is proved that if an objective function is additive and there are two optimal partitions, then any combination of these partitions is also optimal. In the case of a complete graph, if an objective function is resolution-limit-free, then an optimal partitioning either contains all the nodes (i.e., there is only one community, which includes all nodes) or consists of communities of size 1 (i.e., each node forms a community of its own). A more general statement for arbitrary objective functions is also true: if an objective function has local weights (i.e., weights that do not change when considering subgraphs), then it is resolution-limit-free. Although the converse does not hold, there is only a relatively small number of special cases in which methods with nonlocal weights are resolution-limit-free.
The authors then analyze resolution-limit-free methods within the framework of the first-principles Potts model [39]
$$\mathcal{H} = -\sum_{ij}\left(a_{ij}A_{ij} - b_{ij}(1 - A_{ij})\right)\delta_{c_i,c_j} \tag{37}$$

where $a_{ij}, b_{ij} \geq 0$ are some weights. The intuition behind this formula is that a community should have more edges inside it than edges connecting it to other communities. Thus, it is necessary to reward existing links inside a community and to penalize links that are missing from a community. The smaller the value of $\mathcal{H}$, the more desirable the community structure, although the minimal value might not be unique.
Given the definition of $\mathcal{H}$, it is possible to describe various existing community detection methods with an appropriate choice of parameters, as well as to propose alternative methods. The following models are shown to fit into $\mathcal{H}$: Reichardt and Bornholdt (RB); Arenas, Fernández, and Gómez (AFG); Ronhovde and Nussinov (RN); as well as the label propagation method. The RB approach with a configuration null model also covers the original definition of modularity. The authors also propose a new method, called the constant Potts model (CPM), by choosing $a_{ij} = w_{ij} - b_{ij}$ and $b_{ij} = \gamma$, where $w_{ij}$ is the weight of the edge between nodes $i$ and $j$, and $\gamma$ is a constant. CPM is similar to the RB and RN models, but is simpler and more intuitive. CPM and RN have local weights and are consequently resolution-limit-free, while RB, AFG, and modularity are not.
However, all of the above approaches are aimed at solving only the resolution limit problem. Work done by Chen et al. in [23] and [24] adopts a different definition of modularity density which simultaneously addresses two problems of modularity. It is done by mixing two additional components, Split Penalty (SP) and the community density, into the well-known definition of modularity. Community density includes internal community density and pair-wise community density. SP is the fraction of edges that connect nodes of different communities
$${\rm SP} = \sum_{c_i \in C}\left[\sum_{\substack{c_j \in C \\ c_j \neq c_i}}\frac{|E_{c_i,c_j}|}{2|E|}\right]. \tag{38}$$

The value of SP is subtracted from modularity, while the value of the community density is added to modularity and SP. Introducing SP resolves the issue of favoring small communities, and the community density eliminates the problem of favoring large communities (also known as the resolution limit problem). The Modularity Density ($Q_{ds}$) is then given by
$$Q_{ds} = \sum_{c_i \in C}\left[\frac{|E_{c_i}^{\rm in}|}{|E|}d_{c_i} - \left(\frac{2|E_{c_i}^{\rm in}| + |E_{c_i}^{\rm out}|}{2|E|}d_{c_i}\right)^2 - \sum_{\substack{c_j \in C \\ c_j \neq c_i}}\frac{|E_{c_i,c_j}|}{2|E|}d_{c_i,c_j}\right] \tag{39}$$

$$d_{c_i} = \frac{2|E_{c_i}^{\rm in}|}{|c_i|(|c_i| - 1)}, \qquad d_{c_i,c_j} = \frac{|E_{c_i,c_j}|}{|c_i||c_j|}$$

where $d_{c_i}$ is the internal density of community $c_i$ and $d_{c_i,c_j}$ is the pair-wise density between communities $c_i$ and $c_j$.
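A direct sketch of evaluating (39) for a given partition (ours; assigning internal density 0 to singleton communities is a convention of this sketch, since (39) leaves $d_{c_i}$ undefined for $|c_i| = 1$):

```python
import numpy as np

def qds(A, labels):
    """Modularity density Q_ds of Eq. (39) (Chen et al. [23], [24])."""
    m = A.sum() / 2
    k = A.sum(axis=1)
    comms = [np.flatnonzero(labels == c) for c in np.unique(labels)]
    q = 0.0
    for ci in comms:
        e_in = A[np.ix_(ci, ci)].sum() / 2       # |E_ci^in|
        e_out = k[ci].sum() - 2 * e_in           # |E_ci^out|
        n_i = len(ci)
        # internal density; 0 for singletons by the convention of this sketch
        d_i = 2 * e_in / (n_i * (n_i - 1)) if n_i > 1 else 0.0
        q += e_in / m * d_i - ((2 * e_in + e_out) * d_i / (2 * m)) ** 2
        for cj in comms:
            if cj is ci:
                continue
            e_ij = A[np.ix_(ci, cj)].sum()       # |E_ci,cj|
            q -= e_ij / (2 * m) * (e_ij / (n_i * len(cj)))
    return q
```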
Modularity Density (Qds) avoids falling into the trap of merging two or more consecutive cliques in the ring of cliques network or dividing a clique into two or more parts. It can also discover communities of different sizes. Thus, using Qds solves both the resolution limit problem of modularity and the problem of splitting larger communities into smaller ones. Hence, Qds is a very effective alternative to Q.