Statistical Mechanics of the Minimum Dominating Set Problem
- First Online:
- 21 February 2015
- Received:
- Accepted:
DOI: 10.1007/s10955-015-1220-2
- Cite this article as:
- Zhao, JH., Habibulla, Y. & Zhou, HJ. J Stat Phys (2015) 159: 1154. doi:10.1007/s10955-015-1220-2
- 5 Citations
- 209 Downloads
Abstract
The minimum dominating set (MDS) problem has wide applications in network science and related fields. It aims at constructing a node set of smallest size such that any node of the network is either in this set or is adjacent to at least one node of this set. Although this optimization problem is generally very difficult, we show it can be exactly solved by a generalized leaf-removal (GLR) process if the network contains no core. We present a percolation theory to describe the GLR process on random networks, and solve a spin glass model by mean field method to estimate the MDS size. We also implement a message-passing algorithm and a local heuristic algorithm that combines GLR with greedy node-removal to obtain near-optimal solutions for single random networks. Our algorithms also perform well on real-world network instances.
Keywords
Dominating setSpin glassCore percolationLeaf removal Network coarse-grainingBelief propagation1 Introduction
An example of minimum dominating set. a a small network with nodes and links. bBlue (dark gray) indicates a node being occupied, while cyan (light gray) indicates a node being empty but observed. The three occupied nodes form a MDS for this network. c A coarse-grained representation of the network based on the MDS of (b) (Color figure online)
The MDS problem has wide practical applications, such as monitoring large-scale power grids and other transportation systems [2], controlling the spreading of infectious diseases and other network dynamical processes [6–9], efficient routing in wireless networks [10], and network public goods games (e.g., resource allocation) [11]. Another application is to build a coarse-grained representation for a complex network starting from a MDS. Such an idea has already been applied to multi-document summarization in the field of information extraction [12]. Each node of the MDS can be regarded as a representative node for a local domain of the network. We can take the subnetwork induced by node and all its adjacent nodes (except those in the MDS) as a coarse-grained node, and set up an coarse-grained link between two coarse-grained nodes if the two corresponding subnetworks share at least one node or are connected by at least one link in the original network (see Fig. 1c for an example). If such a coarse-graining process is iterated we will then obtain a hierarchical representation for the original network, which may be very useful for understanding the organization of a complex system and for searching and information transmission in such a system.
Exactly solving the MDS problem, however, is extremely difficult in general, since it is a nondeterministic polynomial-complete (NP-complete) optimization problem [1]. Even the task of approximately solving the MDS problem is very hard. For a general network of nodes, so far the best polynomial algorithms can only guarantee to get dominating sets with sizes not exceeding times of the minimum size [13, 14]. Many local-search algorithms have been proposed to solve the MDS problem heuristically (see review [1] and [2, 6, 7, 9, 15, 16]), but theoretical results on the MDS sizes of random network ensembles are still very rare.
In this work we bring several new theoretical and algorithmic contributions. We show in Sect. 2 that a generalized leaf-removal (GLR) process may cause a core percolation transition, and propose a quantitative theory to describe this percolation. If the network contains no core, GLR reaches an exact MDS; if an extensive core exists, we combine GLR with a local greedy process in Sect. 3 to get an upper bound to the MDS size. We then introduce a spin glass model in Sect. 4 and estimate the MDS size by a replica-symmetric (RS) mean field theory, and implement a message-passing algorithm in Sect. 5 to get near-optimal dominating sets for single random network instances. Our algorithms also perform well on real-world network instances. This work shall be useful both for network scientists who are interested in applying the MDS concept to practical problems, and for applied mathematicians who seek better theoretical understanding on the random MDS problem.
2 Generalized Leaf-Removal and Core Percolation
Consider a simple network formed by nodes and undirected links, each link connecting between two different nodes. Each node with index is either empty (indicated by the occupation state ) or occupied by sensors (). A node is regarded as observed if it is occupied or it is empty but adjacent to one or more occupied nodes, otherwise it is regarded as unobserved. We need to occupy a set of nodes to make all the nodes be observed, and the objective is to make the dominating set as small as possible, i.e., to construct a minimum dominating set. It is easy to verify that the three occupied nodes of Fig. 1b form a MDS for that small network. Notice a network may have more than one MDS.
2.1 The GLR Process
Here we extend the leaf-removal idea of [17] (see also more recent work [6, 18, 19]) and consider a generalized leaf-removal process. This dynamics is based on the following two considerations: first, as pointed out in [6, 17], if node is an unobserved leaf node (which has only a single neighbor, say ), then occupying but leaving empty must be an optimal strategy; second, we notice that if is an empty but observed node and at most one of its adjacent nodes is unobserved, then it must be an optimal strategy not to occupy . This second point was not considered in the conventional leaf-removal process [6].
- (0)
As long as there is an isolated node in network , fix its occupation state to and delete it from . All such fixed nodes must also belong to .
- (1)
As long as there is a leaf node in network which is not yet observed, fix the occupation state of its unique neighbor to and fix that of to so that and all its adjacent nodes (including ) are now observed, see Fig. 2 (left panel). We then delete node and all its connected links from . If belongs to then node must not belong to it, because otherwise could not have been a MDS. On the other hand, if node does not belong to then node must belong to it, and in this latter case we modify by adding to it and deleting from it.
- (2)
Then as long as there is a node which is itself observed and which has only a single unobserved neighbor , delete the link from network , see Fig. 2 (right panel). We do not modify if node does not belong to it. If node does belong to then node must not belong to it, and in this latter case we add to and delete from it.
- (3)
Then as long as there is an observed node which is not connected to any unobserved node, fix its occupation state to and delete it and all its attached links from . Such a node must not belong to , for otherwise could not have been a MDS.
- (4)
If the resulting network is empty or it contains no isolated node nor leaf node, the GLR process stops. If still contains at least one isolated or leaf node, then we increase the evolution step from to and initialize the network as identical to . A node of is regarded as observed if and only if it is observed in network . We then repeat the above-mentioned operations (1)–(3).
The two basic operations of the generalized leaf-removal process. White circles denote empty and unobserved nodes, cyan (light gray) circles denote empty but observed nodes, and blue (dark gray) circles denote occupied nodes. Left panel the unique adjacent node of an unobserved leaf node is occupied, and all the neighbors of are observed. Right panel an empty observed node has only a single unobserved neighbor , then the link between and is deleted (Color figure online)
If the original network has no core, then the set of occupied nodes by the GLR process must be identical to the final , which is a MDS modified from the original MDS. We have therefore proven that GLR constructs a MDS for a network if this network contains no core. (All the above-mentioned modification operations on are ignored in the actual implementation of the GLR process. They are introduced here solely for proving that GLR is able to construct a MDS if there is no core.) Furthermore, we notice that if the GLR process finishes with some nodes remaining to be unobserved, the set of nodes occupied during this process must be a MDS for the subnetwork of induced by all the observed nodes. This is because all these occupied nodes also belong to the modified MDS , while all those nodes fixed to be unoccupied are outside of .
Generalized leaf-removal on Erdös–Rényi random networks. and are the fractions of occupied and unobserved nodes, respectively. Cross symbols are results obtained by running GLR on a single ER network of size and mean degree ; solid lines are the predictions of the percolation theory for
Generalized leaf-removal on scale-free random networks of decay exponent (from left to right) generated through the static model [22]. and are the fractions of occupied and unobserved nodes, respectively. Red dash-dotted lines are results obtained by running GLR on a single network instance of size and mean degree , while blue dashed lines are results obtained by the core percolation theory using the degree profile of this network instance as input. Black solid lines are the predictions of the percolation theory for (Color figure online)
Generalized leaf-removal on pure scale-free random networks with minimum degree . is the fraction of occupied nodes. The fraction of unobserved nodes is simply . Red triangle symbols are results obtained by running GLR on a single pure SF network of size and decay exponent , while blue cross symbols are results obtained by the percolation theory using the degree profile of this network instance as input. The black solid line is obtained by the percolation theory at (Color figure online)
Notice the core percolation transition resulting from the GLR optimization process is qualitatively different from the simpler observability transition discussed in [2], which considers the appearance of a giant connected component of observed nodes resulting from an initial set of randomly chosen occupied nodes. We now develop a percolation theory to thoroughly understand the GLR dynamics on random networks.
2.2 The Core Percolation Theory
If node is adjacent to one or more nodes that are occupied at the evolution step and all its other adjacent nodes (except ) are observed at this evolution step, then at the end of this evolution step is unoccupied but observed and it is not adjacent to any unobserved node (except ). This then leads to the expression (5) for . Notice such an observed but unoccupied node will be deleted at the end of the evolution step. After all such nodes are deleted, some unobserved nodes in the remaining network may become isolated or be connected to only a single node. If this is the case, these isolated or leaf nodes will trigger the next () evolution step.
2.3 Results on Erdös–Rényi Random Networks
2.4 Results on Scale-Free Random Networks Generated Through the Static Model
Core percolation phase transition in infinite () random scale-free networks generated through the static model [22]. Horizontal axis is the degree decay exponent , while vertical axis is the value of the mean node degree at the phase transition point. Cross symbols are predictions of the core-percolation theory, while the solid line is just a guide for the eye
For random SF networks generated through the static model with nodes, the core percolation transition value of mean node degree is very sensitive to the decay exponent in the region of , and it diverges as approaches 2.0 from above (Fig. 6). At the other limit of , the mean node degree at the phase transition approaches the value of , which is just the core percolation phase transition point of an infinite ER random network.
2.5 Results on Pure Scale-Free Random Networks
We also generate a set of pure SF random networks of finite size following the same procedure as mentioned in [27] (see also the supplementary information of [23]). The minimum node degree of such a SF network is , while the maximum node degree is [27]. When we apply both the GLR process and the core percolation theory on these finite network instances, we find the simulation results on the fraction of nodes in the core and the fraction of occupied nodes are in perfect agreement with the corresponding theoretical results (see Fig. 5). All these finite SF networks contain no core (), and the MDS relative size is an increasing function of .
Figure 5 also demonstrates strong finite-size effect for pure SF random networks of . This finite-size effect is again mainly caused by the cutoff of the maximum node degree of finite networks, which makes the mean node degree of a finite network be smaller than that of an infinite network. For example, at the mean node degree of an infinite network is , while that of a finite network of size is reduced to .
3 Hybrid Local Algorithm
There is a very simple greedy algorithm in the literature to solve the MDS problem approximately, which is based on the concept of node impact [1, 7, 16]). The impact of an unoccupied node equals to the number of nodes that will be observed by occupying . For example, if node has 3 unobserved neighbors, its impact is 4 if is itself unobserved and is 3 if is already adjacent to one or more occupied nodes. Starting from an input network with all the nodes unobserved, the greedy algorithm selects uniformly at random a node from the subset of nodes with the highest impact and fix its occupation state to . All the adjacent nodes of are then observed. If there are still unobserved nodes in the network, the impact value for each of the unoccupied nodes is updated and the greedy occupying process is repeated until all the nodes are observed. This pure greedy algorithm is very easy to implement and very fast, but we find that it usually fails to reach a true MDS even when the input network contains no core.
Here we introduce an improved local algorithm by combining the GLR process with the impact-based greedy process. We call this new algorithm the GLR-Impact hybrid algorithm. Given an input network with all the nodes unobserved, we first carry out the GLR process to simplify as far as possible. If all the nodes are observed during this initial GLR process, a MDS of network is then constructed. For the nontrivial case of some nodes being left unobserved after this GLR process, we first occupy a randomly chosen node from the subset of highest-impact nodes and then perform the GLR process again to further simplify the network as far as possible. We keep repeating such a occupying-followed-by-GLR process until there is no unobserved node left in the network.
Constructing dominating sets for Erdös–Rényi networks (left panel) and regular random networks (right panel). The relative sizes of dominating sets obtained by a single running of the pure greedy, the hybrid, and the BPD algorithm with on 96 ER or RR network instances of and (mean) degree are compared (fluctuations are of order and are not shown). The ensemble-averaged MDS relative sizes obtained by the replica-symmetric mean field theory are also shown (RS)
Constructing dominating sets for scale-free random networks generated through the static model [22]. The relative sizes of dominating sets obtained by a single running of the pure greedy, the hybrid, and the BPD algorithm with on 96 SF network instances of and (mean) degree are compared (fluctuations are of order and are not shown). The degree decay exponent is , 3.0, 2.667, and 2.5, respectively
Real-world networks are often very heterogenous, with a small fraction of highly connected nodes [21]. As a test of the algorithms introduced in this work, we apply the GLR process, the pure greedy algorithm, the hybrid algorithm, and the BPD algorithm to a set of twelve real-world networks. Among these network instances, five are infrastructure networks: European express road network (RoadEU [28]), road network of Texas (RoadTX [29]), power grid of western US states (Grid [30]), and two Internet networks at the autonomous systems level (IntNet1 and IntNet2 [31]); three are information networks: Google webpage network (WebPage [29]), European email network (Email [32]), and research citation network (Citation [31]); three are social contact networks: collaboration network of condensed-matter authors (Author [32]), peer-to-peer interaction network (P2P [33]),and on-line friendship network (Friend [34]); the remaining one is the biological network of protein-protein interactions (PPI [35]).
Results on twelve real-world network instances
Network | Core | Greedy | Hybrid | BPD | |||
---|---|---|---|---|---|---|---|
RoadEU | 1177 | 1417 | 10 | 306 | 428 | 389 | 387 |
PPI | 2361 | 6646 | 64 | 17 | 550 | 539 | 539 |
Grid | 4941 | 6594 | 19 | 603 | 1564 | 1485 | 1481 |
IntNet1 | 6474 | 12,572 | 1458 | 8 | 660 | 656 | 656 |
Author | 23,133 | 93,439 | 279 | 9052 | 3686 | 3612 | 3604 |
Citation | 34,546 | 420,877 | 846 | 11,178 | 3335 | 3168 | 3095 |
P2P | 62,586 | 147,892 | 95 | 35 | 12,710 | 12,582 | 12,582 |
Friend | 196,591 | 950,327 | 14,730 | 6097 | 42,536 | 41,633 | 41,672 |
265,214 | 364,481 | 7636 | 470 | 18,183 | 18,181 | 18,181 | |
WebPage | 875,713 | 4,322,051 | 6332 | 162,439 | 81,288 | 79,928 | 80,769 |
RoadTX | 1,379,917 | 1,921,660 | 12 | 560,582 | 477,729 | 437,503 | 425,774 |
IntNet2 | 1,696,415 | 11,095,298 | 35,455 | 211,244 | 187,592 | 183,516 | 183,248 |
4 Spin Glass Model and Replica-Symmetric Mean Field Theory
4.1 Replica-Symmetric Mean Field Theory
4.2 Belief-Propagation Iterations
Replica-symmetric (RS) mean field theory and belief-propagation (BP) results for ER random networks of mean degree . The RS results are obtained by population dynamics simulations, while the BP results are obtained on a single ER network instance of nodes. The BP iteration converges to a fixed point only for . a Occupation fraction . b Free energy density . c Entropy density . d Entropy density as a function of occupation fraction
For ER networks with mean degree and regular random networks with integer degree , we find that when the re-weighting parameter is larger than certain threshold value, BP iteration is unable to converge to a fixed point. Such a non-convergence phenomenon indicates that, when the random network system has an extensive core, it will be in a spin glass phase at sufficiently large values of . Systematic theoretical investigations on this spin glass phase will be reported in another publication.
4.3 Ensemble-Averaged Properties
A random network ensemble is characterized by a degree distribution . We perform population dynamics simulations using Eqs. (25), (24) and (26) to obtain ensemble-averaged results. First, we create a long array of (e.g., ) elements to store a set of messages, each of which represents a probability distribution in the form of a three-dimensional vector satisfying Eq. (29): . We then repeatedly update elements of this array by the following procedure: (1) generate a random integer according to the degree probability distribution ; (2) draw elements from array uniformly at random, and then use these elements as input messages to Eq. (25) to compute a new message ; (3) replace a randomly chosen element of array with this new message. The message array is expected to reach a steady state after it is updated a sufficient number of times (e.g., after each element of this array is updated times on average).
For ER random networks with mean degree , we compare in Fig. 9 the results obtained by this RS population dynamics with the results obtained by BP iteration on a single network instance. The ensemble-averaged results are in perfect agreement with the BP iteration results (provided the BP iteration is able to converge).
The entropy density as a function of the mean occupation fraction can be obtained from these RS population dynamics results (see for example Fig. 9d). In some random network systems, the entropy density become negative if decreases below certain threshold value , indicating that there is no dominating set with relative size below . We therefore take the value as the ensemble-averaged MDS relative size. For ER networks of , we obtain from Fig. 9 that (the corresponding value of is ). In some other random network systems (e.g., ER random networks with , before the core percolation transition), the entropy density approaches a non-negative limiting value as approaches a limiting value from above. For these latter cases, we simply take as the ensemble-averaged MDS relative size.
The ensemble-averaged results on the MDS sizes of ER and RR networks are shown in Fig. 7. For ER networks with mean node degree (before the core-percolation transition), the RS mean field results coincide with the results predicted by the core percolation theory. When the random network contains an extensive core, the results obtained by the pure greedy algorithm and the GLR-Impact algorithm are higher than the RS mean field predictions, but the results obtained by the BPD algorithm of the next section are very close to the RS mean field predictions.
5 Belief-Propagation-Guided Decimation Algorithm
For a given network , the RS mean field theory gives an estimate for the occupation probability of each node , see Eq. (24). Such information is exploited in a BPD algorithm to construct a near-optimal dominating set. (Such an algorithm and its extensions have already been successfully applied to many other combinatorial optimization problems, e.g., the -satisfiability problem [39, 40] and the vertex-cover problem [5]). At each round of the BPD process, unoccupied nodes with the highest estimated occupation probabilities are added to the dominating set, and the occupation probabilities for the remaining unoccupied nodes are then updated.
- (0)
Input the network , set all the nodes to be empty and unobserved and set all the probability distributions to be the uniform distribution. Set the re-weighting parameter to a sufficiently large value (e.g., ). Then perform the BP iteration a number of rounds (e.g., ). After these iterations we compute the occupation probability of each node using Eq. (24).
- (1)
Then occupy a small fraction (e.g., ) of the unoccupied nodes that having the highest estimated occupation probabilities.
- (2)
Then simplify network by first deleting all the links between observed nodes, and then deleting all the isolated observed nodes.
- (3)
If the resulting network still contains unobserved nodes, we perform BP iteration for a number of rounds (e.g., ). The output message of an node is updated either according to Eq. (24) or according to Eq. (33), depending on whether is unobserved or observed. We then repeat operations (1)–(3) until all the nodes are observed.
The results of the BPD algorithm for random networks and for real-world networks are compared with the results obtained by the local heuristic algorithms in Figs. 7, 8, and Table 1. For ER and RR random networks, the BPD algorithm considerably beats both the pure greedy algorithm and the GLR-Impact hybrid algorithm; for very heterogeneous (e.g., scale-free) networks, the BPD algorithm only slightly outperforms the GLR-Impact algorithm.
6 Discussions
In this work, we proposed two heuristic algorithms (a GLR-Impact local algorithm and a BPD message-passing algorithm) and presented a core percolation theory and a replica-symmetric mean field theory for solving the network dominating set problem algorithmically and theoretically. We found that the GLR process may lead to a core percolation transition in the network (see Figs. 3 and 4). Our numerical results shown in Figs. 7, 8 and Table 1 suggested that the GLR-Impact algorithm and the BPD algorithm can construct near-optimal dominating sets for random networks and real-world networks.
There are many theoretical issues remaining to be investigated. An easy extension of the core percolation theory is to consider GLR with a subset of initially occupied nodes. By optimizing this initial subset (e.g., following the methods of [41–43]), we may reach an improved lower-bound to the MDS size. Core percolation on degree-correlated random networks [44] and in the more general lattice glass problem [3] are also very interesting. When the random network has an extensive core, we observed that the belief-propagation Eq. (25) fails to converge at large values of the re-weighting parameter (see Fig. 9), indicating a spin glass phase transition. A systematic study of the spin glass phase will be carried out using the first-step replica-symmetry-breaking mean field theory [40, 45, 46], which may in addition offer an improved estimate on the ensemble-averaged MDS size. The possible deep connections between core percolation and the complexity of the random MDS problem will also be addressed by adapting the long-range frustration theory [24, 25].
The methods of this work can be readily extended to the MDS problem of directed networks. Our theoretical and algorithmic results on the directed MDS problem will soon be reported in an accompanying paper [47]. A more challenging problem is the connected dominating set problem [48] which has the additional constraint that the nodes in the dominating set should induce a connected subnetwork. Our present work may stimulate further theoretical studies on this hard problem.
Acknowledgments
Part of this work was done when H.-J. Zhou was participating in the “Collective Dynamics in Information Systems 2014” Program of the Kavli Institute for Theoretical Physics China (KITPC). H.-J. Zhou thanks Chuang Wang for a helpful discussion, and Alfredo Braunstein, Yang-Yu Liu, Federico Ricci-Tersenghi, and Yi-Fan Sun for helpful comments on the manuscript; J.-H. Zhao and H.-J. Zhou thank Prof. Zhong-Can Ou-Yang for support. Research partially supported by the National Basic Research Program of China (grant number 2013CB932804) and by the National Natural Science Foundation of China (grand numbers 11121403 and 11225526).