Modularity of Complex Networks Models

  • Liudmila Ostroumova Prokhorenkova
  • Paweł Prałat
  • Andrei Raigorodskii
  • Liudmila Ostroumova Prokhorenkova
    • 1
    • 2
  • Paweł Prałat
    • 3
    • 4
  • Andrei Raigorodskii
    • 1
    • 2
    • 5
    • 6
  1. 1.Moscow Institute of Physics and TechnologyMoscowRussia
  2. 2.YandexMoscowRussia
  3. 3.Ryerson UniversityTorontoCanada
  4. 4.The Fields Institute for Research in Mathematical SciencesTorontoCanada
  5. 5.Moscow State UniversityMoscowRussia
  6. 6.Buryat State UniversityUlan-udeRussia
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10088)

Abstract

Modularity is designed to measure the strength of division of a network into clusters (known also as communities). Networks with high modularity have dense connections between the vertices within clusters but sparse connections between vertices of different clusters. As a result, modularity is often used in optimization methods for detecting community structure in networks, and so it is an important graph parameter from practical point of view. Unfortunately, many existing non-spatial models of complex networks do not generate graphs with high modularity; on the other hand, spatial models naturally create clusters. We investigate this phenomenon by considering a few examples from both sub-classes. We prove precise theoretical results for the classical model of random d-regular graphs as well as the preferential attachment model, and contrast these results with the ones for the spatial preferential attachment (SPA) model that is a model for complex networks in which vertices are embedded in a metric space, and each vertex has a sphere of influence whose size increases if the vertex gains an in-link, and otherwise decreases with time.

1 Introduction and Definitions

Many social, biological, and information systems can be represented by networks, whose vertices are items and links are relations between these items [2, 4, 6, 12]. That is why the evolution of complex networks attracted a lot of attention in recent years and there has been a great deal of interest in modeling of these networks [9, 17, 30]. The hyperlinked structure of the Web, citation patterns, friendship relationships, infectious disease spread, these are seemingly disparate linked data sets which have fundamentally very similar natures. Indeed, it turns out that many real-world networks have some typical properties: power-law degree distribution, small diameter, high clustering coefficient, and others [27, 29, 33]. Such properties are well-studied both in real-world networks and in many theoretical models.

Another important property of complex networks is their community structure, that is, the organization of vertices in clusters, with many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters [14, 18]. In social networks communities may represent groups by interest, in citation networks they correspond to related papers, in the Web communities are formed by pages on related topics, etc. Being able to identify communities in a network could help us to exploit this network more effectively. For example, clusters in citation graphs may help to find similar scientific papers, discovering users with similar interests is important for targeted advertisement, clustering can also be used for network compression and visualization.

The key ingredient for many clustering algorithms is modularity, which is at the same time a global criterion to define communities, a quality function of community detection algorithms, and a way to measure the presence of community structure in a network. Modularity was introduced by Newman and Girvan [31] and it is based on the comparison between the actual density of edges inside a community and the density one would expect to have if the vertices of the graph were attached at random, regardless of community structure.

Unfortunately, modularity is not a well studied parameter for the existing random graph models, at least from a rigorous, theoretical point of view. We are only aware about results for binomial random graphs G(np) and random d-regular graphs (see Sect. 2.3 for more details). In this paper, we continue investigating random d-regular graphs and obtain new upper bounds for their modularity. Then we move to the preferential attachment model, introduced by Barabási and Albert [5], which is probably the most well-studied model of complex networks. For this model no results on modularity are known and we obtain some preliminary results, both lower and upper bounds, and will investigate this model more in the journal version of this paper. In fact, the lower bound we present holds for all graphs with average degree d and sublinear maximum degree.

As expected, the models discussed above, as well as many others, have a common weakness of low modularity. One family of models which overcomes this deficiency is the family of spatial (or geometric) models, wherein the vertices are embedded in a metric space such that similar vertices are closer to each other than dissimilar ones. The underlying geometry of spatial models naturally leads to the emergence of clusters. We prove this statement rigorously for one example of a geometric model, the Spatial Preferential Attachment model introduced in [1].

The paper is structured as follows. In the next section, we formally define modularity, discuss several random graph models and present known results on modularity in these models. In Sects. 3, 4 and 5 we analyze modularity in random d-regular graphs, preferential attachment and SPA models, respectively.

Due to the space limitations, proofs of our results are omitted in this short proceeding version but will be included in the longer journal one.

2 Preliminaries

2.1 Modularity

The definition of modularity was first introduced by Newman and Girvan in [31]. Since then, many popular and applied algorithms used to find clusters in large data-sets are based on finding partitions with high modularity [16, 22, 28]. The modularity function favors partitions in which a large proportion of the edges falls entirely within the parts and biases against having too few or too unequally sized parts. Formally, for a given partition \(\mathcal{{A}}= \{A_1, \ldots , A_k\}\) of the vertex set V(G), let
$$\begin{aligned} q_{\mathcal{{A}}} = \sum _{A \in \mathcal{{A}}} \left( \frac{e(A)}{|E(G)|} - \frac{ (\sum _{v \in A} \deg (v))^2 }{4|E(G)|^2} \right) , \end{aligned}$$
(1)
where \(e(A) = |\{ uv \in E(G) : u,v \in A\}|\) is the number of edges in the graph induced by the set A. The first term, \(\sum _{A \in \mathcal{{A}}} \frac{e(A)}{|E(G)|}\), is called the edge contribution, whereas the second one, \(\sum _{A \in \mathcal{{A}}} \frac{ (\sum _{v \in A} \deg (v))^2 }{4|E(G)|^2}\), is called the degree tax. It is easy to see that \(q_{\mathcal{{A}}}\) is always smaller than one. Also, if \(\mathcal{{A}}= \{V(G)\}\), then \(q_{\mathcal{{A}}} = 0\).
The modularity\(q^*(G)\) is defined as the maximum of \(q_{\mathcal{{A}}}\) over all possible partitions \(\mathcal{{A}}\) of V(G); that is,
$$ q^*(G) = \max _{\mathcal{{A}}} q_{\mathcal{{A}}} (G). $$
In order to maximize \(q_{\mathcal{{A}}} (G)\) one wants to find a partition with large edge contribution subject to small degree tax. If \(q^*(G)\) approaches 1 (which is the maximum possible value), we observe a strong community structure; conversely, if \(q^*(G)\) is close to zero, we are given a graph with no community structure.

2.2 Random Graph Models

Classical Models. Let us recall two classical models of random graphs that are extensively studied in the literature. The binomial random graph\({\mathcal {G}}(n,p)\) is the random graph G with the vertex set \([n] := \{ 1, 2, \ldots , n\}\) in which every pair \(\{i,j\} \in \left( {\begin{array}{c}[n]\\ 2\end{array}}\right) \) appears independently as an edge in G with probability p. Note that \(p=p(n)\) may (and usually does) tend to zero as n tends to infinity.

However, in this paper we concentrate on another probability space, the probability space of random d-regular graphs with uniform probability distribution. This space is denoted \(\mathcal {G}_{n,d}\), and asymptotics are for \(n\rightarrow \infty \) with \(d\ge 2\) fixed, and n even if d is odd.

We say that an event in a probability space holds asymptotically almost surely (or a.a.s.) if the probability that it holds tends to 1 as n goes to infinity. Since we aim for results that hold a.a.s., we will always assume that n is large enough.

Preferential Attachment. The Preferential Attachment (PA) model, introduced by Barabási and Albert [5], was an early stochastic model of complex networks. We will use the following precise definition of the model, as considered by Bollobás and Riordan in [10] as well as Bollobás et al. [11].

Let \(G_1^0\) be the null graph with no vertices (or let \(G_1^1\) be the graph with one vertex, \(v_1\), and one loop). The random graph process \((G_1^t)_{t \ge 0}\) is defined inductively as follows. Given \(G_1^{t-1}\), we form \(G_1^t\) by adding a vertex \(v_t\) together with a single edge between \(v_t\) and \(v_i\), where i is selected randomly with the following probability distribution:
$$ {\mathbb P}(i = s) = {\left\{ \begin{array}{ll} \deg (v_s,t-1) / (2t-1) &{} 1 \le s \le t-1, \\ 1/(2t-1) &{} s=t, \end{array}\right. } $$
where \(\deg (v_s,t-1)\) denotes the degree of \(v_s\) in \(G_1^{t-1}\). In other words, we send an edge e from \(v_t\) to a random vertex \(v_i\), where the probability that a vertex is chosen is proportional to its degree at the time, counting e as already contributing one to the degree of \(v_t\).

For \(m \in \mathbb N \setminus \{1\}\), the process \((G_m^t)_{t \ge 0}\) is defined similarly with the only difference that m edges are added to \(G_m^{t-1}\) to form \(G_m^t\) (one at a time), counting previous edges as already contributing to the degree distribution. Equivalently, one can define the process \((G_m^t)_{t \ge 0}\) by considering the process \((G_1^t)_{t \ge 0}\) on a sequence \(v'_1, v'_2, \ldots \) of vertices; the graph \(G_m^t\) is formed from \(G_1^{tm}\) by identifying vertices \(v'_1, v'_2, \ldots , v'_m\) to form \(v_1\), identifying vertices \(v'_{m+1}, v'_{m+2}, \ldots , v'_{2m}\) to form \(v_2\), and so on. Note that in this model \(G_m^t\) is in general a multigraph, possibly with multiple edges between two vertices (if \(m\ge 2\)) and self-loops.

It was shown in [11] that for any \(m \in \mathbb N \) a.a.s. the degree distribution of \(G_m^n\) follows a power law: the number of vertices with degree at least k falls off as \((1+o(1)) ck^{-2} n\) for some explicit constant \(c=c(m)\) and large \(k \le n^{1/15}\). Also, in the case \(m=1\), \(G_1^n\) is a forest. Each vertex sends an edge either to itself or to an earlier vertex, so the graph consists of components which are trees, each with a loop attached. The expected number of components is then \(\sum _{t=1}^n 1/(2t-1) \sim (1/2) \log n\) and, since events are independent, we derive that a.a.s. there are \((1/2+o(1)) \log n\) components in \(G_1^n\) by Chernoff’s bound. Moreover, Pittel [32] essentially showed that a.a.s. the largest distance between two vertices in the same component of \(G_1^n\) is \((\gamma ^{-1} + o(1)) \log n\), where \(\gamma \) is the solution of \(\gamma e^{1+\gamma } = 1\). In contrast, for the case \(m \ge 2\) it is known that a.a.s. \(G_m^n\) is connected and its diameter is \((1+o(1)) \log n / \log \log n\) [10].

Spatial Preferential Attachment. The Spatial Preferential Attachment (SPA) model [1], designed as a model for the World Wide Web, combines geometry and preferential attachment, as its name suggests. Setting the SPA model apart is the incorporation of ‘spheres of influence’ to accomplish preferential attachment: the greater the degree of a vertex, the larger its sphere of influence, and hence the higher the likelihood of the vertex gaining more neighbors.

We now give a precise description of the SPA model. Let \(S = [0,1]^m\) be the unit hypercube in \(\mathbb R ^m\), equipped with the torus metric derived from any of the \(L_p\) norms. This means that for any two points x and y in S,
$$ d(x,y)=\min \big \{ ||x-y+u||_p\,:\,u\in \{-1,0,1\}^m \big \}. $$
The torus metric thus ‘wraps around’ the boundaries of the unit square; this metric was chosen to eliminate boundary effects. The parameters of the model consist of the link probability\(p\in [0,1]\), and two positive constants \(A_1\) and \(A_2\), which, in order to avoid the resulting graph becoming too dense, must be chosen so that \(pA_1 < 1\). The SPA model generates stochastic sequences of directed graphs \((G_{t}:t\ge 0)\), where \(G_{t}=(V_{t},E_{t})\), and \(V_{t}\subseteq S\). Let \(\deg ^{-}(v,t)\) be the in-degree of the vertex v in \( G_{t}\), and \(\deg ^+(v,t)\) its out-degree. We define the sphere of influenceS(vt) of the vertex v at time \(t\ge 1\) to be the ball centered at v with volume |S(vt)| defined as follows:
$$\begin{aligned} |S(v,t)|=\frac{A_1{\deg }^{-}(v,t)+A_2}{t}, \end{aligned}$$
(2)
or \(S(v,t)=S\) and \(|S(v,t)|=1\) if the right-hand-side of (2) is greater than 1.

The process begins at \(t=0\), with \(G_0\) being the null graph. Time-step t, \(t\ge 1\), is defined to be the transition between \(G_{t-1}\) and \(G_t\). At the beginning of each time-step t, a new vertex \(v_t\) is chosen uniformly at random from S, and added to \(V_{t-1}\) to create \(V_{t}\). Next, independently, for each vertex \(u\in V_{t-1}\) such that \(v_t \in S(u,t-1)\), a directed link \((v_{t},u)\) is created with probability p. Thus, the probability that a link \((v_t,u)\) is added in time-step t equals \(p\,|S(u,t-1)|\).

The SPA model produces scale-free networks, which exhibit many of the characteristics of real-life networks (see [1, 13]). In [19], it was shown that the SPA model gave the best fit, in terms of graph structure, for a series of social networks derived from Facebook. In [20], some properties of common neighbors were used to explore the underlying geometry of the SPA model and quantify vertex similarity based on distance in the space. However, the distribution of vertices in space was assumed to be uniform [20] and so in [21] non-uniform distributions were investigated which is clearly a more realistic setting.

2.3 Previous Results on Modularity

In this section we discuss known bounds for modularity in different random graph models.

The isoperimetric number of a graph G is defined as
$$ i(G) = \min _{V(G) = V_1 \cup V_2} \frac{e(V_1,V_2)}{ \min \{|V_1|,V_2|\}}, $$
where \(e(V_1,V_2) = |\{ uv \in E(G) : u \in V_1, v \in V_2\}|\) is the number of edges between the sets \(V_1\) and \(V_2\). The following result was shown by McDiarmid and Skerman in [23]. Let G be any d-regular graph on n vertices. As mentioned in [23], the following useful upper bound on the modularity is almost immediate:
$$\begin{aligned} q^*(G) \le \max \{1-i(G)/d, 3/4\}. \end{aligned}$$
(3)
Turning to random d-regular graphs, Bollobás in [8] showed that a.a.s. \(i({\mathcal {G}}_{n,d}) \ge (1-\eta )d/2\), where \(0< \eta < 1\) is such that \(2^{4/d} < (1-\eta )^{1-\eta } (1+\eta )^{1+\eta }\) and so a.a.s.
$$ q^*({\mathcal {G}}_{n,d}) \le U_1 = U_1(d) := \max \{1/2+\eta /2, 3/4\}. $$
As a result, we get the first non-trivial upper bounds for \(q^*({\mathcal {G}}_{n,d})\) presented in Table 1 that hold a.a.s.
In [23], the bound (3) was slightly improved when the maximum size of parts in our partition is restricted. Formally, given \(\delta > 0\), for a graph G with \(n \ge 1/\delta \) vertices, they define \(q_{\delta }(G)\) to be the maximum modularity of all partitions for G such that each part has size at most \(\delta n\). They show that for any \(\varepsilon > 0\) there exists \(\delta > 0\) such that any d-regular graph with at least \(1/\delta \) vertices satisfies
$$ q_{\delta }(G) \le 1-2i(G)/d + \varepsilon . $$
Again, using the result of Bollobás we get that there exists \(\delta > 0\) such that
$$ U_2 = U_2(d) := \eta + \varepsilon $$
serves as an upper bound that holds a.a.s. for \(q_{\delta }({\mathcal {G}}_{n,d})\); again, see Table 1 for numerical values for small values of d. It is straightforward to see that \(i(G) \ge d/2 - \sqrt{ (\log 2) d }\) (see, for example, [8]) and so, in particular, \(U_2\) can be made arbitrarily small by taking d large enough (and \(\delta \) small enough). However, let us note that these upper bounds for \(q_{\delta }({\mathcal {G}}_{n,d})\), while useful, cannot be directly translated into any bound for \(q^*({\mathcal {G}}_{n,d})\).
Table 1.

Upper bounds for \(q^*({\mathcal {G}}_{n,d})\) and for \(q_{\delta }({\mathcal {G}}_{n,d})\) (\(U_2\))

d

\(U_1\)

\(U_2\)

\(U_3\)

3

0.9386

0.8771

0.8038

4

0.8900

0.7800

0.6834

5

0.8539

0.7078

0.6024

6

0.8261

0.6521

0.5435

7

0.8038

0.6076

0.4984

8

0.7855

0.5710

0.4624

9

0.7702

0.5403

0.4330

10

0.7570

0.5140

0.4083

Investigating random d-regular graphs continues in [24], a very recent paper. In fact, some of our results for this model mentioned below are obtained independently there. Moreover, they investigate the class of graphs whose product of treewidth and maximum degree is much less than the number of edges. This shows, for example, that random planar graphs typically have modularity close to 1, which is another indication that clusters naturally emerge where geometry is included.

3 Random d-regular Graphs

3.1 Lower Bound

For completeness, let us briefly discuss the following known lower bound for the modularity of \({\mathcal {G}}_{n,d}\). It is known that a.a.s. for any \(d \in \mathbb N \setminus \{1,2\}\), \({\mathcal {G}}_{n,d}\) is Hamiltonian. As pointed out in [23], one can use this fact to partition the graph such that it breaks the cycle into \(\lceil \sqrt{n} \rceil \) paths of length at most \(\lceil \sqrt{n} \rceil \). For this particular partition the edge contribution is \(2/d - O(1/\sqrt{n})\) and the degree tax is \(O(1/\sqrt{n})\). It follows then that a.a.s.
$$ q^*({\mathcal {G}}_{n,d}) \ge \frac{2}{d} - O(1/\sqrt{n}) = \frac{2+o(1)}{d}. $$
(Our more general lower bound that holds for graphs with average degree d implies the same—see Theorem 4 for more.) Whereas this trivial lower bound could be sharp for \(d=3\) it is definitely not the case for large d. As pointed out in [24], there exists a universal constant \(c>0\) such that a.a.s. \(q^*({\mathcal {G}}_{n,d}) \ge c/\sqrt{d}\).

3.2 Slightly Improved, Numerical Upper Bound

Let us consider the following, natural, approach that already improves slightly an upper bound for \(q^*({\mathcal {G}}_{n,d})\). Consider any d-regular graph with n vertices. For a given partition \(\mathcal{{A}}= \{A_1, \ldots , A_k\}\) of the vertex set V(G), let \(x_i = |A_i|/n\) and \(y_i = 2 |E(A_i)|/|A_i|\); that is, set \(A_i\) has \(x_in\) vertices and induces \(y_i x_i n / 2\) edges. Then (1) can be simplified to
$$\begin{aligned} q_{\mathcal{{A}}} = \sum _{i=1}^k x_i \left( \frac{y_i}{d} - x_i \right) . \end{aligned}$$
(4)
As it is simply a weighted average, \(q_{\mathcal{{A}}} \ge U\) would imply that there exists some set of size xn that induces yxn / 2 edges, and \(y/d-x \ge U\). Using the pairing model [7], we will show that a.a.s. it is not the case (for some carefully chosen \(U=U(d)\)) and, as a result, it will yield an upper bound for \(q^*({\mathcal {G}}_{n,d})\) that holds a.a.s.
For a given \(d \in \mathbb N \setminus \{1,2\}\), let
$$\begin{aligned} f(x,&y,d) := x(y/2-1) \log (x) + (1-x)(d-1)\log (1-x) + d \log (d) / 2 \\&- xy \log (y) / 2 - x(d-y) \log (d-y) - (d-2xd+xy) \log (d-2xd+xy) /2. \nonumber \end{aligned}$$
(5)
It will be clear once we establish the connection between the function f and random d-regular graphs, but it is straightforward to see that for any \(x \in (0,1)\) we have \(f(x,d,d) < 0\) and \(f(x,y,d) > 0\) for some \(y \in (0,d)\). Indeed, for example note that for a fixed \(x \in (0,1/2]\), f(xyd) is strictly concave in \(y \in (0,d)\) as
$$ \frac{d^2 f(x,y,d)}{dy^2} = \frac{-(d(1-2x)+y)dx}{2(d(1-2x)+xy)(d-y)y} < 0. $$
Let \(y_3 = y_3(x,d)\) be the largest value of \(y \in (0,1)\) such that \(f(x,y,d)=0\); in particular, \(f(x,y,d)\le 0\) for any \(y \in (y_3, d)\). Finally, let
$$ U_3 = U_3(d) := \sup _{x \in (0,1)} \left( \frac{y_3(x,d)}{d} - x \right) . $$
As usual, see Table 1 for numerical values for small values of d. The promised upper bound follows immediately from the following theorem.

Theorem 1

Let \(d \in \mathbb N \setminus \{1, 2\}\) and \(\varepsilon >0\) be an arbitrarily small constant. The following property holds a.a.s. for \({\mathcal {G}}_{n,d}\). No set A of size xn (for any \(x=x(n) \in (0,1)\)) induces a graph with yxn / 2 edges, where \(y_3(x,d) + \varepsilon \le y \le d\) and \(y_3(x,d)\) is defined as above. In particular, this implies that
$$ q^*({\mathcal {G}}_{n,d}) \le U_3 + \varepsilon / d, $$
where \(U_3 = U_3(d)\) is defined as above.

3.3 Explicit but Weaker Upper Bound

Theorem 1 provides an upper bound that can be easily numerically computed for a given \(d \in \mathbb N \setminus \{1,2\}\). Next, we present a slightly weaker but an explicit bound that can be obtained using the expansion properties of random d-regular graphs that follow from their eigenvalues. In particular, it will imply that a.a.s. \(q^*({\mathcal {G}}_{n,d}) = O(1/\sqrt{d})\) and so \(q^*({\mathcal {G}}_{n,d}) \rightarrow 0\) as \(d \rightarrow \infty \).

Theorem 2

Let \(d \in \mathbb N \setminus \{1, 2\}\) and \(\varepsilon >0\) be an arbitrarily small constant. The following property holds a.a.s. for \({\mathcal {G}}_{n,d}\). No set A of size xn induces a graph with more than yxn / 2 edges, where \(y = dx + \lambda (1-x)\). In particular, this implies that a.a.s.
$$ q^*({\mathcal {G}}_{n,d}) \le \frac{\lambda }{d} \le \frac{ 2 \sqrt{d-1}+ \varepsilon }{d} \le \frac{2}{\sqrt{d}}. $$

4 The Preferential Attachment Model

4.1 Constant Average Degree Graphs

In order to obtain a lower bound for modularity of Preferential Attachment graphs, we first analyze graphs with constant average degree in general. In this section, we extend the results of [26] and we start with the analysis of trees. It was proven in [26] that trees with maximum degree \(\varDelta = o(\root 5 \of {n})\) have asymptotic modularity 1. We generalize this result in two ways: first, we relax the condition on maximum degree; second, we allow our graphs to be disconnected, that is, we consider forests instead of trees. We prove the following theorem.

Theorem 3

Let \(\{F_n \}\) be a sequence of forests, \(F_n\) is a forest on n vertices with no isolated ones and \(\varDelta = \varDelta (F_n) = o(n)\). Then \(q^*(F_n) \ge 1 - O\left( \sqrt{\frac{\varDelta }{n}}\right) = 1-o(1)\) as \(n \rightarrow \infty \).

Note that it is also known that the asymptotic modularity of trees with maximum degree \(\varDelta = \Omega (n)\) is strictly less than 1 [26]. Hence, the assumption \(\varDelta = o(n)\) cannot be eliminated. Now we can use the previous theorem to get the following result for graphs of bounded average degree.

Theorem 4

Let \(\{G_n\}\) be a sequence graphs, \(G_n\) is a connected graph on n vertices with the average degree \(\frac{2|E(G_n)|}{n} \le D\) for some constant D, and \(\varDelta = \varDelta (G_n) = o(n)\). Then \(q^*(G_n) \ge \frac{2}{D} - O\left( \sqrt{\frac{\varDelta }{n}}\right) = \frac{2}{D}-o(1)\).

4.2 Lower Bound

The following theorem easily follows from the above result.

Theorem 5

For any \(\varepsilon >0\) a.a.s.
$$ q^*(G_m^n) \ge \frac{1}{m} - O\left( n^{-1/4+\varepsilon }\right) = \frac{1}{m} - o(1). $$

As in the case of random d-regular graphs, it is natural to conjecture that the above lower bound is not sharp. Let \(c \in (0,1)\) and consider the following partition: \(A_1 = \{v_1, \ldots , v_{cn} \}\), \(A_2 = V(G_m^n) \setminus A_1 = \{ v_{cn+1}, \ldots , v_n\}\). Using martingales, it is possible to show that a.a.s. \(\sum _{v \in A_1} \deg (v, n) \sim 2mn\sqrt{c}\) (and so \(\sum _{v \in A_2} \deg (v, n) \sim 2mn(1-\sqrt{c})\)). Clearly, \(e(A_1) = mcn\) and so a.a.s. \(e(A_1,A_2) \sim 2mn(\sqrt{c}-c)\) and \(e(A_2) \sim mn(1+c-2\sqrt{c})\). The edge contribution and the degree tax are then both asymptotic to \(1+2c-2\sqrt{c}\). Not surprisingly, such partition cannot be used to get a non-trivial lower bound for the modularity but, similarly to the situation for random d-regular graphs, we may try to use it as a starting point to get slightly better partition. The basic idea is very simple: one can start with a given partition (or partition the vertices randomly into two classes), and if a vertex has more neighbors in the other class than in its own, then we randomly decide whether to shift it to the other class or leave it where it is. This approach proved to be useful to get a bound for the bisection width in random d-regular graphs [3] which, in turn, yields a lower bound for the modularity [24]. We plan to investigate it further in the journal version of this paper.

4.3 Upper Bound

The edge expansion\(\rho =\rho (G)\) of a graph G is defined as follows:
$$ \rho = \min _{S \subset V(G), |S| \le |V|/2} \frac{e(S,V \setminus S)}{|S|}. $$
In [25] it was shown that a.a.s. \(\rho (G^n_m) \ge \alpha \), provided that \(2(m-1)-4\alpha -1>0\). In other words, for any \(\varepsilon >0\) we have that a.a.s.
$$ \rho (G^n_m) \ge \frac{m}{2} - \frac{3+\varepsilon }{4}. $$
Using this observation one can easily obtain a non-trivial upper bound for \(q^*(G_m^n)\).
Let \(\varepsilon > 0\) be an arbitrary small constant. Consider any partition \(\mathcal{{A}}= \{A_1, \ldots , A_k\}\) of the vertex set \(V(G_m^n)\). If \(|A_i| > n/2\) for some i, then the degree tax is at least
$$ \frac{ (\sum _{v \in A_i} \deg (v, n))^2}{4 |E(G_m^n)|} \ge \frac{(m|A_i|)^2}{4(mn)^2} = \frac{1}{16}. $$
On the other hand, if \(|A_i| \le n/2\) for all i, then a.a.s. the number of edges between parts is equal to
$$ \frac{1}{2} \sum _{i=1}^k e(A_i, V \setminus A_i) \ge \frac{1}{2} \sum _{i=1}^k \rho |A_i| = \frac{\rho n}{2} \ge \left( \frac{m}{4} - \frac{3+\varepsilon }{8} \right) n, $$
and so the edge contribution is a.a.s. at most
$$ 1 - \left( \frac{1}{4} - \frac{3+\varepsilon }{8m} \right) = \frac{3}{4} + \frac{3+\varepsilon }{8m} \le \frac{15+\varepsilon }{16}, $$
for any \(m \ge 2\). The following result holds.

Theorem 6

For any \(\varepsilon >0\) a.a.s.
$$ q^*(G_2^n) \le \frac{15+\varepsilon }{16}. $$
Moreover, for any \(m \ge 3\) a.a.s.
$$ q^*(G_m^n) \le \frac{15}{16}. $$

Much stronger expansion property was recently obtained in [15]. We are currently working on using this property to obtain general upper bound for \(q^*(G_m^n)\) that holds for any integer m as well as specific stronger bounds for small values of m. Details will be provided in the journal version of this paper.

5 The Spatial Preferential Attachment Model

Consider \(G_{n}=(V_{n},E_{n})\), a graph generated by the SPA model. As the modularity is defined for undirected graphs, we consider \(\hat{G}_n\) that is a graph obtained from \(G_n\) by replacing each directed edge (uv) by undirected edge uv. (As edges in \(G_n\) are always from ‘younger’ to ‘older’ vertices, there is no problem with generating multigraph; \(\hat{G}_n\) is a simple graph.) Let us recall that \(V_n \subseteq S\) where S is the unit hypercube \([0,1]^m\). We use the geometry of the model to obtain a suitable partition that yields high modularity of \(G_n\). The following properties (proved many times; see, for example, [1, 13]) are the only properties of the model that are used in the proof: a.a.s. for every pair it such that \(1 \le i \le t \le n\) we have that
$$\begin{aligned} \deg ^-(v_i, t)= & {} O \Big ( (t/i)^{pA_1} \log ^2 n \Big ), \end{aligned}$$
(6)
$$\begin{aligned} \deg ^+(v_i, t)= & {} O \Big ( \log ^2 n \Big ), \end{aligned}$$
(7)
and \(|E(G_n)| = \varTheta (n)\). Now, we are ready to state our result for the SPA model.

Theorem 7

Let \(p\in (0,1]\), \(A_1, A_2 >0\), and suppose that \(pA_1 < 1\). Then, the following holds a.a.s.:
$$ q^*(\hat{G}_n) = 1 - O \left( n^{\max \{-1/m,-1+pA_1\} / 2} \log ^{9/2} n \right) = 1 - o(1). $$

Acknowledgements

This work is supported by Russian Science Foundation (grant number 16-11-10014), NSERC, The Tutte Institute for Mathematics and Computing, and Ryerson University.

Copyright information

© Springer International Publishing AG 2016

Personalised recommendations