Predicting epidemic thresholds on complex networks: Limitations of mean-field approaches
Abstract
Over the last decade considerable research effort has been invested in an attempt to understand the dynamics of viruses as they spread through complex networks, be they the networks in human population, computers or otherwise. The efforts have contributed to an understanding of epidemic behavior in random networks, but were generally unable to accommodate specific nonrandom features of the network's actual topology. Recently, though still in the context of the mean field theory, Chakrabarti et al. (2008) proposed a model that intended to take into account the graph's specific topology and solve a longstanding problem regarding epidemic thresholds in both random and nonrandom networks. Here we review previous theoretical work dealing with this problem (usually based on mean field approximations) and show with several relevant and concrete counter examples that results to date breakdown for nonrandom topologies.
Highlights
► We review previous mean field approximations theoretical work on epidemic thresholds. ► We examine the theoretical thresholds in both random and nonrandom networks. ► We show that results to date breakdown for nonrandom topologies.
Keywords
- SIS model;
- Markov chain;
- Poisson process;
- Eigenvalue
1. Introduction
Under what conditions will a virus or other infectious agent spread in a complex population network? This question has vexed epidemiologists, mathematicians and computer scientists alike for many decades (Anderson and May, 1991, May and Lloyd, 2001, Pastor-Satorras and Vespignani, 2001, Madar et al., 2004, Aparacio and Pascual, 2007 and Berchenko et al., 2009). An early result arising from epidemic modeling is based on the so-called reproductive number R0, the number of secondary infections a typical infected individual is able to generate ( Anderson and May, 1991). If a typical infected individual is able to infect on average more than one other member of the population then R0>1. In that case the virus is able to reproduce itself and trigger an epidemic in the population, allowing it to persist in time for an extensive period. In contrast, if the reproductive number is below unity then R0<1, and the disease will rapidly die out in the population and an infection free equilibrium will be reached. This threshold result assumes that the population is homogeneous and randomly mixing, whereby an infected individual is equally likely to come into contact and infect any susceptible present, an assumption that has many limitations.
This result has been generalized to heterogeneous populations in which some individuals have more contacts than others. Historically, most notable are the studies of Dietz (1980) and May and Anderson, 1988 and May and Anderson, 1984. For heterogeneous networks, the relative fraction of nodes having different degrees is referred to as the degree distribution. It is just the probability Pk that a randomly chosen node in the network has degree k, i.e, . The mean degree of the network is 〈k〉=∑ kPk and variance of the degree distribution is var(k)=2σ=〈2k〉−2〈k〉. Consider a large population and let dk be the number of individuals in the population that have k contacts, with ∑kdk=L. Then (dk/L) estimates the degree-distribution Pk.
The degree of heterogeneity in the population's contact structure may be gauged by the Coefficient of Variation CV=(σ/〈k〉). Equivalently one can write C2V=(〈2k〉/2〈k〉)−1. The population is assumed to be randomly mixed subjected to the constraint so that the degree-distribution is always preserved. For such a heterogeneous population, it has been shown that (Anderson and May, 1991, Dietz,, May and Anderson, 1988 and May and Anderson, 1984):

Eq. (1) will be referred to as the Dietz–May formula since both authors (May and Anderson, 1988 and May and Anderson, 1984) were responsible for developing the formulation and applying it in practice. As before R0>1 implies that an epidemic will ensue, while R0<1 implies that the virus rapidly dies out and an infection free equilibrium is attained. These concepts have proved useful for studying contact networks with power-law distributions that might typify computer networks and in some cases be appropriate for studying the transmission of sexually distributed diseases ( Pastor-Satorras and Vespignani, 2001 and Lloyd and May, 2001). Measuring R0 is often quite complicated, especially in complex networks as Aparacio and Pascual (2007) have discussed. Newman, 2002 and Cohen et al. (2002) have shown how percolation ideas and generating function methods can be used to provide exact solutions of epidemic models on simple networks and on bi-partite graphs. Their key epidemic threshold result is nevertheless the same as Eq. (1) obtained by the different methods.
Considerable work has been invested in exploring these issues in the biology, physics and mathematics literature. Concepts taken from the percolation theory continue to play a major role in current epidemic network research (Madar et al., 2004, Cohen et al., 2002, Kenah and Robins, 2007 and Parshani et al., 2010). Moreover, there are many challenging open problems (see the interesting inaugural article of Durrett, 2010).
In recent years, there has been considerable interest in understanding the way in which the detailed network structure of the population, or its “topology,” might affect the persistence threshold. That is, does the exact network structure, not just its degree distribution, give extra information from which it is possible to learn more about the spread of an epidemic? Since many real networks are non-random and sometimes highly clustered, the motivation to explore beyond random models is quite justified.
Chakrabarti, Wang, Faloutsos (Chakrabarti et al., 2008) introduced a new model, referred to here as the CWF model, which intended to identify exactly how a population's network structure controls the epidemic threshold. A very general epidemic threshold condition for any arbitrary network was derived. This condition is based on the network's topology as a mean field approximation will be elaborated shortly.
In this paper we show that many of the previous studies contribute to our understanding of epidemic thresholds for random networks, however for nonrandom network topologies (even regular graphs) accurate predictions of the epidemic threshold are hard to come by. We explore the mean field approximation formulated by CWF and show that its predictions often break down for nonrandom networks. This is because mean-field approximations fail to take into account the correlations in the state of indirect neighbors. Moreover, by mapping one model to another, we are able to retrieve known theoretical literature results (based on percolation theory) that contradict the CWF general threshold condition.
2. The CWF model
CWF (Chakrabarti et al., 2008) assume that the population is divided into two classes: individuals that are Susceptible (S) and individuals that are Infected (I). The model has the classical SIS structure whereby susceptible individuals may become infected upon contact with an infected individual. After contracting the disease an individual recovers after some fixed time period and becomes susceptible once again, thereby closing the SIS loop.
As each individual can be in one of the two states, for a complex network of N individuals, there are 2N possible different states the population may be found in. It is appropriate to formulate the model in terms of a Markov chain, but this requires information specifying the probabilities between each of the possible states. In this formulation states correspond to particular configurations of the population network, with the configuration at each time step dependent on the former time step only. However, it is not a simple matter to determine the probabilities of the 2N×2N transition matrix, which in any case is impractical to work with even when N is modestly large, let alone of the order of millions of individuals as is appropriate for large cities. As such, CWF developed a method to approximate the Markov chain model.
In more detail they consider a population network, and define an individual's neighbors as all members of the population he or she can directly contact and transmit the disease. Set β as the probability that an infected individual/node will infect a susceptible neighboring node in the network, and let the probability that node-i is infected at time t be given by pi,t. Over one time-step, the probability that node-i will not receive any infections from its neighbors is, according to CWF, given by
where Mi is the set of all neighbors of node-i. Note Eq. (2) is exact only when it is assumed that the nodes pj,t−1 are independent of each other. This “independence assumption” is of great importance and will be dealt with in detail in what follows. Thus, according to Chakrabarti et al. (2008) the probability that node-i is healthy at time t is given by