Network dismantling
Edited by Giorgio Parisi, University of Rome, Rome, Italy, and approved September 16, 2016 (received for review March 29, 2016)

Significance
Many systems of interest can be represented by a network of nodes connected by edges. In many circumstances, the existence of a giant component is necessary for the network to fulfill its function. Motivated by the need to understand optimal attack strategies, optimal spread of information, or immunization policies, we study the network dismantling problem (i.e., the search for a minimal set of nodes whose removal leaves the network broken into components of subextensive size). We give the size of the optimal dismantling set for random networks, propose an efficient dismantling algorithm for general networks that outperforms existing strategies by a large margin, and provide various insights about the problem.
Abstract
We study the network dismantling problem, which consists of determining a minimal set of vertices whose removal leaves the network broken into connected components of subextensive size. For a large class of random graphs, this problem is tightly connected to the decycling problem (the removal of vertices that leaves the graph acyclic). Exploiting this connection and recent works on epidemic spreading, we present precise predictions for the minimal size of a dismantling set in a large random graph with a prescribed (light-tailed) degree distribution. Building on the statistical mechanics perspective, we propose a three-stage Min-Sum algorithm for efficiently dismantling networks, including heavy-tailed ones for which the dismantling and decycling problems are not equivalent. We also provide additional insights into the dismantling problem, concluding that it is an intrinsically collective problem and that optimal dismantling sets cannot be viewed as a collection of individually well-performing nodes.
A network (a graph G in the discrete mathematics language) is a set V of N entities called nodes (or vertices), along with a set E of edges connecting some pairs of nodes. In a simplified way, networks are used to describe numerous systems in very diverse fields, ranging from social sciences to information technology or biological systems (reviews are in refs. 1 and 2). Several crucial questions in the context of network studies concern the modifications of the properties of a graph when a subset S of its nodes is selected and treated in a specific way. For instance, how much does the size of the largest connected component of the graph decrease if the vertices in S (along with their adjacent edges) are removed? Do the cycles survive this removal? What is the outcome of the epidemic spreading if the vertices in S are initially contaminated, constituting the seed of the epidemic? On the contrary, what is the influence of a vaccination of nodes in S preventing them from transmitting the epidemic? It is relatively easy to answer these questions when the set S is chosen randomly, with each vertex being selected with some probability independently. Classical percolation theory is nothing but the study of the connected components of a graph in which some vertices have been removed in this way.
A much more interesting case is when the set S can be chosen in some optimal way. Indeed, in all applications sketched above, it is reasonable to assign some cost to the inclusion of a vertex in S: vaccination has a socioeconomic price, incentives must be paid to customers to convince them to adopt a new product in a viral marketing campaign, and incapacitating a computer during a cyber attack requires resources. Thus, one faces a combinatorial optimization problem: the minimization of the cost of S under a constraint on its effect on the graph. These problems thus exhibit both static and dynamic features, the former referring to the combinatorial optimization aspect and the latter referring to the definition of the cost function itself through a dynamical process.
In this paper, we focus on the existence of a giant component in a network: that is, a connected component containing a positive fraction of the vertices (in the N→∞ limit). On the one hand, the existence of a giant component is often necessary for the network to fulfill its function (e.g., to deliver electricity or information bits or ensure the possibility of transportation). An adversary might be able to destroy a set of nodes with the goal of destroying this functionality. It is thus important to understand what an optimal attack strategy is, possibly as a first step in the design of optimal defense strategies. On the other hand, a giant component can propagate an epidemic to a large fraction of a population of nodes. Interpreting the removal of nodes as the vaccination of individuals who cannot transmit the epidemic anymore, destroying the giant component can be seen as an extreme way of organizing a vaccination campaign (3, 4) by confining the contagion to small connected components [less drastic strategies can be devised using specific information about the epidemic propagation model (5, 6)]. Another related application is influence maximization as studied in many previous works (7–9). In particular, optimal destruction of the giant component is equivalent to selection of the smallest set of initially informed nodes needed to spread the information into the whole network under a special case of the commonly considered model for information spreading (7–9).
To define the main subject of this paper more formally, following ref. 10, we call S a C-dismantling set if its removal yields a graph whose largest component has size (in terms of its number of nodes) at most C. The C-dismantling number of a graph is the minimal size of such a set. When the value of C is either clear from the context or not important for the given claim, we will simply talk about dismantling. Typically, the size of the largest component is a finite fraction of the total number of nodes N. To formalize the notion of destroying the giant component, we will consider the bound C on the size of the connected components of the dismantled network to be such that 1≪C≪N. It should be noted that we defined dismantling in terms of node removal; it could be rephrased in terms of edge removal (11), which turns out to be a much easier problem. The dismantling problem is also referred to as fragmentability of graphs in the graph theory literature (12–14) and optimal percolation in ref. 15.
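To make the definition concrete, the following minimal sketch checks whether a given node set S is a C-dismantling set of a graph stored as an adjacency list. It is plain Python; the function name and the graph encoding are ours, chosen for illustration, and are not taken from the article.

```python
from collections import deque

def is_c_dismantling(adj, S, C):
    """Check whether removing the node set S leaves every connected
    component of the graph with at most C nodes.

    adj: dict mapping each node to an iterable of its neighbors.
    S:   set of nodes to remove.
    C:   maximum allowed component size after removal.
    """
    removed = set(S)
    seen = set()
    for start in adj:
        if start in removed or start in seen:
            continue
        # BFS over the subgraph induced by the surviving nodes.
        size = 0
        queue = deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            size += 1
            if size > C:
                return False          # this component is already too large
            for v in adj[u]:
                if v not in removed and v not in seen:
                    seen.add(v)
                    queue.append(v)
    return True

# Example: removing the hub of a star graph dismantles it into singletons.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(is_c_dismantling(star, {0}, C=1))   # True
print(is_c_dismantling(star, set(), C=1)) # False
```

For 1≪C≪N, a good dismantling set is one that passes this test while containing as few nodes as possible.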
Determining whether the C-dismantling number of a graph is smaller than some constant is a nondeterministic polynomial (NP)-complete decision problem (a proof is in SI Appendix). The concept of NP-completeness concerns the worst-case difficulty of the problem. The questions that we address in this paper are instead the following. What is the dismantling number on some representative class of graphs (in our case, random graphs)? What are the best heuristic algorithms, how does their performance compare with the optimum, and how do they perform on benchmarks of real-world graphs? Simple heuristic algorithms for the dismantling problem were considered in previous works (16–18), where the choice of the nodes to be included in the dismantling set was based on their degrees (favoring the inclusion of the most connected vertices) or some measure of their centrality. More recently, a heuristic for the dismantling problem was presented in ref. 15 under the name "collective influence" (CI), in which the inclusion of a node is decided according to a combination of its degree and the degrees of the nodes in a local neighborhood around it. Ref. 15 also attempts to estimate the dismantling number on random graphs.
Our Main Contribution
In this paper, we provide a detailed study of the dismantling problem, with both analytical and algorithmic outcomes. We present very accurate estimates of the dismantling number for large random networks, building on a connection with the decycling problem [in which one seeks a subset of nodes whose removal leaves the graph acyclic; also an NP-complete problem (19)] and recent studies of optimal spreading (20–23). Our results are the one-step replica symmetry breaking (1RSB) estimate of the ground state of the corresponding optimization problem.
On the computational side, we introduce a very efficient algorithm that considerably outperforms state-of-the-art algorithms for solving the dismantling problem. We show its efficiency and closeness to optimality on both random graphs and real-world networks. The goal of our paper is closely related to that of ref. 15; we present an assessment of the results reported therein on random as well as real-world networks.
Our dismantling algorithm, which has been inspired by the theoretical insight gained on random graphs, is composed of three stages.
i) Min-Sum message passing for decycling. This part is the core of the algorithm, using a variant of a message-passing algorithm developed in refs. 20 and 21. A related but different message-passing algorithm was developed for decycling in ref. 22 and later applied to dismantling in ref. 24; it performs comparably with ours.
ii) Tree breaking. After all cycles are broken, some of the tree components may still be larger than the desired threshold C. We break them into small components, removing a fraction of nodes that vanishes in the large size limit. This operation can be done in time O(N log N) by an efficient greedy procedure (detailed in SI Appendix; a simplified sketch is given after this list).
iii) Greedy reintroduction of cycles. As explained below, the strategy of first decycling a graph before dismantling it is optimal for graphs that contain few short cycles (a typical property of light-tailed random graphs). For graphs with many short cycles, we considerably improve the efficiency of our algorithm by greedily reinserting some nodes that close cycles without increasing the size of the largest component too much.
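The tree-breaking stage (stage ii above) can be sketched as a bottom-up greedy pass over each tree: process the vertices leaves-first and cut a vertex as soon as the piece hanging below it would exceed C. The code below is an illustrative reconstruction consistent with that description, not the exact routine of SI Appendix.

```python
from collections import deque

def break_forest(adj, C):
    """Greedily pick nodes whose removal breaks a forest into pieces of
    size at most C.  adj: adjacency dict of an acyclic graph (forest).
    Returns the set of removed nodes."""
    removed = set()
    subtree = {}       # size of the surviving piece hanging below each node
    visited = set()
    for root in adj:
        if root in visited:
            continue
        # Root the tree at `root`; record a BFS order and the parent of each node.
        order, parent = [], {root: None}
        queue = deque([root])
        visited.add(root)
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in visited:
                    visited.add(v)
                    parent[v] = u
                    queue.append(v)
        # Leaves first: cut a node as soon as its piece would exceed C.
        for u in reversed(order):
            size = 1 + sum(subtree[v] for v in adj[u]
                           if parent.get(v) == u and v not in removed)
            if size > C:
                removed.add(u)
                subtree[u] = 0   # a removed node connects nothing upward
            else:
                subtree[u] = size
    return removed

# Example: a path of 10 nodes broken into components of size at most 3.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
print(break_forest(path, C=3))   # {2, 6} -> components {0,1}, {3,4,5}, {7,8,9}
```

On a path, this removes roughly one vertex out of every C+1, which is why the fraction of removed nodes vanishes when C is large.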
The dismantling problem, as is often the case in combinatorial optimization, exhibits a very large number of (quasi)optimal solutions. We characterize the diversity of these degenerate minimal dismantling sets by a detailed statistical analysis, computing in particular the frequency of appearance of each node in the quasioptimal solutions, and conclude that dismantling is an intrinsically collective phenomenon that results from a correlated choice of a finite fraction of nodes. It thus makes much more sense to think in terms of good dismantling sets as a whole and not about individual nodes as the optimal influencers/spreaders (15). We further study the correlation between the degree of a node and its importance for dismantling, exploiting a natural variant of our algorithm, in which the dismantling set is required to avoid some marked nodes. This study allows us to show that each of the low-degree nodes can be replaced by other nodes without increasing significantly the size of the dismantling set. Contrary to claims in ref. 15, we do not confirm any particular importance of weak nodes, apart from the obvious fact that the set of highest-degree nodes is not a good dismantling set.
To give a quantitative idea of our algorithmic contribution, we state two representative examples of the kind of improvement that we obtain with the above algorithm with respect to the state of the art (15).
i) In an Erdős–Rényi (ER) random graph of average degree 3.5 and size N=57, we found C=1,000 dismantling sets removing 17.8% of the nodes, whereas the best known method (adaptive eigenvalue centrality for this case) removes 20.2% of the nodes, and the adaptive CI method of ref. 15 removes 20.6% of the nodes. Hence, we provide a 13% improvement over the state of the art. Our theoretical analysis estimates the optimum dismantling number to be around 17.5% of the nodes; thus, the algorithm is extremely close to optimal in this case.
ii) Our algorithm managed to dismantle the Twitter network studied in ref. 15 (with 532,000 nodes) into components smaller than C=1,000 using only 3.4% of the nodes, whereas the CI heuristics of ref. 15 needs 5.6% of the nodes. Here, we thus provide a 60% improvement over the state of the art.
Not only does our algorithm show beyond-state-of-the-art performance, but it is also computationally efficient. Its core part runs in time linear in the number of edges, allowing us to easily dismantle networks with tens of millions of nodes.
Relation Between Dismantling and Decycling
We begin our discussion by clarifying the relation between the dismantling and decycling problems. Although the argument below can be found in ref. 10, we reproduce it here in a simplified fashion. The decycling number (or more precisely, fraction) θdec(G) of G is the minimal fraction of vertices that have to be removed to make the graph acyclic. We define similarly the dismantling number θdis(G,C) of a graph G as the minimal fraction of vertices that have to be removed to make the size of the largest component of the remaining graph smaller than a constant C.
For random graphs with degree distribution q = {qk}k≥0, in the large size limit, the parameters θdec and θdis enjoy concentration (self-averaging) properties; we shall thus write their typical values as

θdec(q) = lim_{N→∞} E[θdec(G)],  [1]

θdis(q) = lim_{C→∞} lim_{N→∞} E[θdis(G, C)],  [2]

where the expectation is over graphs G drawn with degree distribution q.
The crucial point for the relation between dismantling and decycling is that trees (or more generically, forests) can be efficiently dismantled: it was proven in ref. 10 that a forest of N vertices can be C-dismantled by removing at most N/(C+1) of its vertices, a vanishing fraction when C is large. This observation brings us to the following two claims concerning the dismantling and decycling numbers for random graphs with degree distribution q: (i) for any degree distribution, θdis(q) ≤ θdec(q); (ii) for light-tailed degree distributions, the two numbers coincide, θdis(q) = θdec(q).
The first claim follows directly from the above observation on the dismantling of forests. After a decycling set S of G has been found, one can add to S additional vertices to turn it into a C-dismantling set, the additional cost being a vanishing fraction of the nodes when C is large (a quantitative version of the argument is sketched below).
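Written out, and assuming the forest bound quoted above (at most N/(C+1) removals C-dismantle any forest on N vertices), the argument gives the following chain of inequalities; this is our sketch of the reasoning, not a display copied from the article.

```latex
% A minimal decycling set contains N*theta_dec(G) vertices; the graph left
% after its removal is a forest on N(1-theta_dec(G)) vertices, which can be
% C-dismantled by removing at most a fraction 1/(C+1) of them.
\[
  \theta_{\mathrm{dis}}(G,C)
  \;\le\;
  \theta_{\mathrm{dec}}(G) + \frac{1-\theta_{\mathrm{dec}}(G)}{C+1}
  \;\longrightarrow\;
  \theta_{\mathrm{dec}}(G) \qquad (C\to\infty).
\]
```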
To justify our second claim, we consider a C-dismantling set S of a graph G. To turn S into a decycling set, we need to add additional vertices to break the cycles that might exist in the connected components left after the removal of S. Each of these components contains at most C vertices, and in sparse random graphs with light-tailed degree distributions, short cycles are rare; hence, only a vanishing fraction of additional vertices is needed, which yields the second claim in the limit of large C.
Network Decycling
In this section, we shall explain the results on the decycling number of random graphs that we obtained via statistical mechanics methods and how they can be exploited to build an efficient heuristic algorithm for decycling arbitrary graphs.
Testing the Presence of Cycles in a Graph.
The 2-core of a graph G is its largest subgraph of minimal degree 2; it can be constructed by iteratively removing isolated nodes and leaves (vertices of degree 1) until either all vertices have been removed or all remaining vertices have degree at least 2. It is easy to see that a graph contains cycles if and only if its 2-core is nonempty. To decide if a subset S is decycling, we remove the nodes in S and perform this leaf removal on the reduced graph: S is a decycling set if and only if the leaf removal empties the graph. To formalize this procedure, we introduce binary variables indicating, for each vertex, whether it belongs to S, and describe the leaf removal as a discrete-time dynamics on the remaining vertices.
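The 2-core computation and the resulting decycling test can be written in a few lines; the sketch below (plain Python, with function names of our choosing) follows the leaf removal described above.

```python
from collections import deque

def two_core(adj, removed=frozenset()):
    """Return the vertices in the 2-core of the graph obtained by deleting
    `removed`, computed by iterative leaf removal."""
    alive = {u for u in adj if u not in removed}
    degree = {u: sum(1 for v in adj[u] if v in alive) for u in alive}
    queue = deque(u for u in alive if degree[u] <= 1)
    while queue:
        u = queue.popleft()
        if u not in alive:
            continue
        alive.discard(u)                 # peel off a leaf (or isolated node)
        for v in adj[u]:
            if v in alive:
                degree[v] -= 1
                if degree[v] <= 1:
                    queue.append(v)
    return alive                         # empty iff the remaining graph is a forest

def is_decycling(adj, S):
    """S is a decycling set iff removing it leaves an empty 2-core."""
    return not two_core(adj, frozenset(S))

# A triangle with a pendant node: removing any triangle vertex breaks all cycles.
g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(is_decycling(g, {0}))   # True
print(is_decycling(g, set())) # False
```

Each vertex is peeled at most once and each edge is examined a bounded number of times, so the test runs in time linear in the number of edges.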
Note that the leaf removal procedure can be equivalently viewed as a particular case of the linear threshold model of epidemic propagation or information spreading. By calling a removed vertex infected (or informed), one sees that the infection (or information) of node i occurs whenever the number of its infected (or informed) neighbors reaches its degree minus one. This equivalence, which was already exploited in refs. 15 and 23, allows us to build on previous works on minimal contagious sets (20, 21, 23) and influence maximization (7–9).
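This equivalence is easy to check directly: seeding the linear threshold dynamics with S, where a node activates once deg − 1 of its neighbors are active, informs the whole graph exactly when S is a decycling set. A minimal sketch, mirroring the two_core routine above (again our own illustrative code):

```python
from collections import deque

def threshold_spread(adj, seed):
    """Linear threshold dynamics in which node i activates once
    deg(i) - 1 of its neighbors are active.  Returns the final active set."""
    active = set(seed)
    count = {u: sum(1 for v in adj[u] if v in active) for u in adj}
    queue = deque(u for u in adj
                  if u not in active and count[u] >= len(adj[u]) - 1)
    while queue:
        u = queue.popleft()
        if u in active:
            continue
        active.add(u)
        for v in adj[u]:
            count[v] += 1
            if v not in active and count[v] >= len(adj[v]) - 1:
                queue.append(v)
    return active

# Same triangle-plus-pendant graph as above.
g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(threshold_spread(g, {0}) == set(g))    # True: a decycling seed informs everyone
print(threshold_spread(g, set()) == set(g))  # False: an empty seed reaches only the pendant
```

Seeding one triangle vertex (a decycling set) informs every node, whereas the empty seed only reaches the pendant, in agreement with the leaf removal test.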
Optimizing the Size of Decycling Sets.
From the point of view of statistical mechanics, it is natural to introduce a probability distribution over the subsets S of vertices that favors small decycling sets, in order to find the optimal decycling sets of a given graph.
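The standard form such a measure takes is sketched below; the article's Eq. 8 is the corresponding partition function, and the notation here (chemical potential μ, indicator function) is ours, so the precise parametrization used in the article may differ.

```latex
% |S| counts the removed vertices; mu is a chemical potential penalizing large sets.
\[
  P(S) \;=\; \frac{1}{Z(\mu)}\, e^{-\mu |S|}\,
             \mathbf{1}\!\left[\,S \text{ is a decycling set of } G\,\right],
  \qquad
  Z(\mu) \;=\; \sum_{S \subseteq V} e^{-\mu |S|}\,
             \mathbf{1}\!\left[\,S \text{ is a decycling set of } G\,\right].
\]
% As mu -> infinity the measure concentrates on the minimal decycling sets,
% whose size is N*theta_dec(G).
```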
The exact computation of the partition function (Eq. 8) for an arbitrary graph remains an NP-hard problem. However, if G is a sparse random graph, the large size limit of its free energy density can be computed with the cavity method of statistical mechanics.
Better parametrizations with a number of real values per message that scale linearly with T (rather than quadratically) can be devised (21, 23). A parametrization with
Table: 1RSB cavity predictions for the decycling number of ER random graphs of average degree d, compared with the decycling number reached by the Min-Sum algorithm on large graphs.
Min-Sum Algorithm for the Decycling Problem.
We turn now to the description of our heuristic algorithm for finding decycling sets of the smallest possible size. The above analysis shows the equivalence of this problem with the minimization of a cost function counting the number of removed vertices, under the constraint that the leaf removal dynamics empties the graph.
This system can be solved efficiently by iteration. The computation of one iteration takes a time linear in the number of edges of the graph.
Results for Dismantling
Results on Random Graphs.
The outcome of our algorithm applied to an ER random graph of average degree 3.5 is presented in Fig. 1. Here, the red circle corresponds to the output of its first stage (decycling with Min-Sum), which yields, after the removal of a fraction 0.1781 of the nodes, an acyclic graph in which the largest components contain a fraction 0.032 of the vertices. The red line corresponds to the second stage, which further reduces the size of the largest component by greedily breaking the remaining trees. We compare with simulated annealing (SA; black circle) as well as several incremental algorithms that successively remove the nodes with the highest scores, where the score of a vertex is a measure of its centrality. Besides a trivial score that is the same for all vertices [hence removing the vertices in random order (RND)] and the score given by the degree of a vertex, we used the eigenvector centrality (EC) measure and the recently proposed CI measure (15). We used all of these heuristics in an adaptive way, recomputing the scores after each removal. Additional details on all of these algorithms can be found in SI Appendix.
Fig. 1. Fraction of nodes in the largest component as a function of the fraction of removed nodes for an ER random graph of average degree 3.5.
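All of the adaptive score-based baselines follow the same loop: score the surviving vertices, remove the top one, and rescore. A minimal sketch of the adaptive highest-degree variant is given below (our own illustrative code with a lazily updated heap; it is not the implementation of SI Appendix).

```python
import heapq

def adaptive_degree_dismantling(adj, target_fraction):
    """Repeatedly remove the vertex of currently highest degree (recomputed
    after each removal) until a given fraction of the nodes is gone."""
    alive = {u: set(vs) for u, vs in adj.items()}   # surviving adjacency
    n_remove = int(target_fraction * len(adj))
    removed = []
    heap = [(-len(vs), u) for u, vs in alive.items()]
    heapq.heapify(heap)
    while len(removed) < n_remove and heap:
        neg_deg, u = heapq.heappop(heap)
        if u not in alive or -neg_deg != len(alive[u]):
            continue                                 # stale entry: degree changed
        removed.append(u)
        for v in alive.pop(u):
            alive[v].discard(u)
            heapq.heappush(heap, (-len(alive[v]), v))  # lazy re-insertion
    return removed

# Example: remove 20% of the nodes of the star graph used earlier.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(adaptive_degree_dismantling(star, 0.2))   # [0]: the hub goes first
```

Combining its output with a component-size check such as is_c_dismantling above gives the kind of degree-based curve shown in Fig. 1; the EC and CI variants differ only in the score that is recomputed after each removal.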
We see from Fig. 1 that the Min-Sum algorithm outperforms the others by a considerable margin: it dismantles the graph using a markedly smaller fraction of removed nodes (about 17.8%, compared with more than 20% for the best of the other heuristics, as quoted above).
In Fig. 2, we zoom in on the results of the second stage of our algorithm and perform a finite-size scaling analysis, increasing the size of the dismantled graphs.
Fig. 2. Fraction of nodes in the largest component (finite-size scaling analysis of the second stage).
Combinatorial optimization problems typically exhibit a very large degeneracy of their (quasi)optimal solutions. We performed a detailed statistical analysis of the quasioptimal dismantling sets constructed by our algorithm, exploiting the fact that the Min-Sum algorithm finds different decycling sets for different realizations of the random tie-breaking noise.
For a given ER random graph of average degree 3.5, the frequencies with which individual vertices appear in these different decycling sets are shown in Fig. 3.
Fig. 3. Frequencies with which vertices appear in different decycling sets on an ER graph of average degree 3.5.
An important question to ask about dismantling sets is whether they can be thought of as a collection of nodes that are in some sense good spreaders or whether they are the result of a highly correlated optimization. We use the result of the previous experiment and remove the nodes that appeared most often (i.e., have the highest frequencies in Fig. 3). If the nature of dismantling were additive rather than collective, then such a choice should further decrease the size of the discovered dismantling set. This scenario is not what happens; with this strategy, we need to remove a considerably larger fraction of nodes than the size of the dismantling sets found directly by the Min-Sum algorithm.
We also studied the degree histogram of nodes that the Min-Sum algorithm includes in the dismantling sets and saw that, as expected, most of the high-degree nodes belong to most of the dismantling sets. Each of the dismantling sets also included some nodes of relatively low degree; for instance, for the ER random graphs of average degree 3.5 considered above, such low-degree nodes are present in every set found, but each of them can be replaced by other nodes without significantly increasing the size of the dismantling set.
More General Graphs.
Up to this point, our study of dismantling relies crucially on the relation to decycling. For light-tailed random graphs, these two problems are essentially asymptotically equivalent. However, for arbitrary graphs that contain many small cycles, the decycling number can be much larger than the dismantling one. We argue that, from the algorithmic point of view, decycling still provides a very good basis for dismantling. For instance, consider a portion of a regular two-dimensional lattice: destroying all of its short cycles requires removing a finite fraction of the vertices, whereas a vanishing fraction suffices to break it into components of size at most C when C is large.
Fig. 4. Fraction of nodes in the largest component as a function of the fraction of removed nodes for the Twitter network.
In a network that contains many short cycles, decycling removes a large proportion of nodes expressly to destroy these short cycles. Many of these nodes can be put back without increasing the size of the largest component. For this reason, we introduce a reverse greedy (RG) procedure in which, starting from a dismantled graph with dismantling set S, maximum component size C, and a chosen target value C′ for the maximum allowed component size, nodes of S are greedily reinserted into the graph, at each step preferring reinsertions that increase the size of the largest component as little as possible and never letting it exceed C′.
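A minimal sketch of such a reinsertion loop is given below, using a union-find structure to track component sizes. It is our own simplified version (it accepts any admissible reinsertion rather than searching for the best one at each step) and not the exact RG procedure of SI Appendix.

```python
class DisjointSet:
    """Union-find with component sizes."""
    def __init__(self, nodes):
        self.parent = {u: u for u in nodes}
        self.size = {u: 1 for u in nodes}

    def find(self, u):
        while self.parent[u] != u:
            self.parent[u] = self.parent[self.parent[u]]   # path halving
            u = self.parent[u]
        return u

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]

def reverse_greedy(adj, dismantling_set, C_target):
    """Greedily reinsert removed nodes as long as the component created by
    each reinsertion stays at most C_target."""
    present = set(adj) - set(dismantling_set)
    dsu = DisjointSet(adj)
    for u in present:
        for v in adj[u]:
            if v in present:
                dsu.union(u, v)
    remaining = set(dismantling_set)
    improved = True
    while improved:
        improved = False
        for u in list(remaining):
            # size of the component that would form if u were put back
            roots = {dsu.find(v) for v in adj[u] if v in present}
            new_size = 1 + sum(dsu.size[r] for r in roots)
            if new_size <= C_target:
                present.add(u)
                remaining.discard(u)
                for v in adj[u]:
                    if v in present:
                        dsu.union(u, v)
                improved = True
    return remaining          # the reduced dismantling set
```

Each candidate reinsertion only needs the sizes of the components adjacent to the node, so one sweep over the removed nodes costs roughly one union-find operation per edge.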
In graphs where decycling is an optimal strategy for dismantling, such as the random graphs discussed above, only a vanishing fraction of nodes can be reinserted by the RG procedure before the size of the largest component starts to grow steeply. For real-world networks, the RG procedure reinserts a considerable number of nodes while negligibly altering the size of the largest component. For the Twitter network in Fig. 4, the improvement obtained by applying the RG procedure is impressive: a large part of the nodes removed during the decycling stage can be put back, bringing the final dismantling set down to the 3.4% of the nodes quoted above.
The RG procedure is introduced as a heuristic that provides a considerable improvement for the examples that we treated. The theoretical results of this paper are valid only for classes of graphs that do not contain many small cycles; hence, our theory does not provide a principled derivation or analysis of the RG procedure. A more detailed study, both theoretical and algorithmic, of the dismantling of networks for which decycling is not a reasonable starting point is an important direction for future work.
Acknowledgments
A.B. and L.D. acknowledge support by Fondazione Cassa di Risparmio di Torino (CRT) under the initiative La Ricerca dei Talenti. L.D. acknowledges the European Research Council Grant 267915.
Footnotes
- 1. To whom correspondence should be addressed. Email: lenka.zdeborova@cea.fr.
Author contributions: A.B., L.D., G.S., and L.Z. designed research, performed research, contributed new reagents/analytic tools, analyzed data, and wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1605083113/-/DCSupplemental.
Freely available online through the PNAS open access option.
References
1.
2. Barrat A, Barthelemy M, Vespignani A
3.
4.
5.
6. Altarelli F, Braunstein A, Dall’Asta L, Wakeling JR, Zecchina R
7. Kempe D, Kleinberg J, Tardos E
8. Chen N
9.
10. Janson S, Thomason A
11.
12.
13.
14.
15.
16.
17.
18.
19. Karp RM
20.
21. Altarelli F, Braunstein A, Dall’Asta L, Zecchina R
22.
23.
24.
25. Mézard M, Parisi G
26. Mézard M, Montanari A
27. Leskovec J, Krevl A