Memory in network flows and its effects on spreading dynamics and community detection

Journal name:
Nature Communications
Volume:
5,
Article number:
4630
DOI:
doi:10.1038/ncomms5630
Received
Accepted
Published

Abstract

Random walks on networks is the standard tool for modelling spreading processes in social and biological systems. This first-order Markov approach is used in conventional community detection, ranking and spreading analysis, although it ignores a potentially important feature of the dynamics: where flow moves to may depend on where it comes from. Here we analyse pathways from different systems, and although we only observe marginal consequences for disease spreading, we show that ignoring the effects of second-order Markov dynamics has important consequences for community detection, ranking and information spreading. For example, capturing dynamics with a second-order Markov model allows us to reveal actual travel patterns in air traffic and to uncover multidisciplinary journals in scientific communication. These findings were achieved only by using more available data and making no additional assumptions, and therefore suggest that accounting for higher-order memory in network flows can help us better understand how real systems are organized and function.

At a glance

Figures

left
  1. First-order Markov dynamics distort real constraints on flow.
    Figure 1: First-order Markov dynamics distort real constraints on flow.

    (a) In a first-order Markov approach, we model passengers’ travel to a city to be proportional to the observed volume of traffic to that city, and irrespective of where the passengers come from. (b) In a second-order Markov model, passengers’ travel to a city is still proportional to the traffic volume, but also dependent on where the passengers come from. In this example, out-and-back traffic to Chicago only dominates overtransfer traffic when second-order Markov dynamics are taken into account. (c,d) Journal citation flow shows the same memory effect. Citation flow from four different journals to PNAS is mostly shown to return to the same journal or continue to a related journal only when second-order Markov dynamics are taken into account. The percentages represent the relative return flow.

  2. Significant second-order Markov constraints on flow.
    Figure 2: Significant second-order Markov constraints on flow.

    (af) First- and second-order conditional entropy for all nodes of the six analysed networks. Blue nodes show a significant memory effect, because the null hypothesis that the data are generated from a first-order Markov model can be rejected. Red nodes do not show a significant effect. The memory effect is the difference in entropy between a first- and second-order Markov model. Las Vegas, among all cities, shows the strongest memory effect. Traffic is dominated by visitors who return to the city from which they came. In the other extreme, nodes that we could not significantly distinguish from a first-order Markov model typically have low connectivity and relatively small entropies.

  3. Memory affects modular overlap in air traffic between US cities.
    Figure 3: Memory affects modular overlap in air traffic between US cities.

    Major modules of Las Vegas and Atlanta with first-order Markov dynamics in a and second-order Markov dynamics in b. Link colours represent modules and link thicknesses represent passenger volume.

  4. Memory affects ranking of nodes.
    Figure 4: Memory affects ranking of nodes.

    (a) Comparing changes in flow from a first- to a second-order Markov model (M1 to M2). Three kinds of flow were tracked for all journals in both M1 and M2: (1) quality flow from the top ten journals (green) or all other journals (yellow), (2) internal flow coming from journals without crossing community boundaries (dark blue) versus external flow that does cross community boundaries (light blue) and (3) return flow after two steps on the network (dark red) versus lost flow after two steps on the network (light red). M1 is always the top bar and M2 is the bottom bar (see legend insert). The error bars indicate the 10th and 90th bootstrap percentiles. The error bars at the intersection of the stacked bars represent the variation in the quality, internal and return flows, respectively. The error bars at the end of the stacked bars represent the total variation in flow. M1 bars for quality and return flows do not show error bars, as the networks are exactly the same. The journals selected for this figure were chosen because they are mentioned in the primary text. (b) Largest gainers and losers in the top 100 journals when including effects of second-order Markov dynamics. The upper portion shows the journals that gain the most in flow and the lower portion shows the journals that lose the most in flow. Dark blue indicates a gain/loss in flow coming from journals without crossing community boundaries (internal flow). The light blue indicates a gain/loss in the flow that does cross community boundaries (external flow). The dark red shows the net gain and orange shows the net loss for each journal listed. The error bars indicate 10th and 90th bootstrap percentiles.

  5. Memory and spreading processes.
    Figure 5: Memory and spreading processes.

    (a) Fraction of infected individuals (prevalence) as a function of time measured in days from the beginning of the process. The two curves represent a first- (M1) and a second-order (M2) Markov process. We seeded the outbreak with 100 infected individuals in Anchorage (top) and Los Angeles (bottom). (b) Prevalence as a function of time in a modified data set where people only travel to a city and come back. (c) Fraction of individuals that have received the rumour as a function of scaled time from the beginning of the process. The three curves show different level of mixing between the first- and second-order Markov model. The shaded area gives the 25 and 75 percentiles, and the solid curve is the median.

  6. From pathway data to networks with and without memory.
    Figure 6: From pathway data to networks with and without memory.

    (a) Itineraries weighted by passenger number. (b) Aggregated bigrams for links between physical nodes. (c) Aggregated trigrams for links between memory nodes. (d) Network without memory. (e) Network with memory. Corresponding dynamics in Fig. 1a,b.

  7. Second-order memory dynamics reveal overlapping modules.
    Figure 7: Second-order memory dynamics reveal overlapping modules.

    Pathway data between San Francisco, Las Vegas and New York, represented with memory nodes capturing first-order (a,b) and second-order (c,d) memory dynamics. With first-order memory, the characteristic out-and-back travel of Las Vegas is lost and the dynamics are best described as movements in one module; describing the dynamics with two overlapping modules requires 0.63 more bits. With second-order memory, the out-and-back travel is evident and the dynamics are best described as movements in two overlapping modules, as movements between the modules are very rare. See Supplementary Fig. 4 for a detailed derivation of the description lengths.

  8. Performance tests on benchmark networks.
    Figure 8: Performance tests on benchmark networks.

    Performance of Infomap. The blue and red curves refer to M1 and M2 structural information, respectively, whereas the yellow curve was obtained by running standard (undirected) Infomap on a network in which each memory node is treated as a physical node. Lines show median values and shaded areas cover 25 and 75 percentiles.

right

References

  1. Watts, D. & Strogatz, S. Collective dynamics of ‘small-world’ networks. Nature 393, 440442 (1998).
  2. Barabási, A. & Albert, R. Emergence of scaling in random networks. Science 286, 509512 (1999).
  3. Barrat, A., Barthelemy, M., Pastor-Satorras, R. & Vespignani, A. The architecture of complex weighted networks. Proc. Natl Acad. Sci. USA 101, 3747 (2004).
  4. Boccaletti, S., Latora, V., Moreno, Y., Chavez, M. & Hwang, D. Complex networks: structure and dynamics. Phys. Rep. 424, 175308 (2006).
  5. Granovetter, M. The strength of weak ties: a network theory revisited. Sociol. Theory 1, 201233 (1983).
  6. Brin, S. & Page, L. The anatomy of a large-scale hypertextual web search engine. Comput. Networks ISDN 30, 107117 (1998).
  7. Brockmann, D., Hufnagel, L. & Geisel, T. The scaling laws of human travel. Nature 439, 462465 (2006).
  8. Balcan, D. et al. Multiscale mobility networks and the spatial spreading of infectious diseases. Proc. Natl Acad. Sci. USA 106, 2148421489 (2009).
  9. Vespignani, A. Modelling dynamical processes in complex socio-technical systems. Nat. Phys. 8, 3239 (2012).
  10. Shannon, C. E. A mathematical theory of communication. Bell Syst. Tech. J. 27, 379423 (1948).
  11. Box, G. E., Jenkins, G. M. & Reinsel, G. C. Time Series Analysis: Forecasting and Control John Wiley & Sons (2013).
  12. Kareiva, P. & Shigesada, N. Analyzing insect movement as a correlated random walk. Oecologia 56, 234238 (1983).
  13. Robins, G., Snijders, T., Wang, P., Handcock, M. & Pattison, P. Recent developments in exponential random graph (p*) models for social networks. Soc. Networks 29, 192215 (2007).
  14. Meiss, M. R., Menczer, F., Fortunato, S., Flammini, A. & Vespignani, A. inProc. Internat. Conf. on Web Search and Web Data Mining 6576ACM (2008).
  15. Chierichetti, F., Kumar, R., Raghavan, P. & Sarlós, T. inProc. 21st Internat. Conf. on World Wide Web 609618ACM (2012).
  16. Asztalos, A. & Toroczkai, Z. Network discovery by generalized random walks. Europhys. Lett. 92, 50008 (2010).
  17. Backstrom, L. & Leskovec, J. inProc. fourth ACM Internat. Conf. on Web Search and Data Mining 635644ACM (2011).
  18. Singer, P., Helic, D., Taraghi, B. & Strohmaier, M. Memory and structure in human navigation patterns. Preprint at http://arXiv.org/abs/1402.0790 (2014).
  19. Iribarren, J. L. & Moro, E. Impact of human activity patterns on the dynamics of information diffusion. Phys. Rev. Lett. 103, 038702 (2009).
  20. Takaguchi, T., Nakamura, M., Sato, N., Yano, K. & Masuda, N. Predictability of conversation partners. Phys. Rev. X 1, 011008 (2011).
  21. Holme, P. & Saramäki, J. Temporal networks. Phys. Rep. 519, 97125 (2012).
  22. Lentz, H. H., Selhorst, T. & Sokolov, I. M. Unfolding accessibility provides a macroscopic approach to temporal networks. Phys. Rev. Lett. 110, 118701 (2013).
  23. Pfitzner, R., Scholtes, I., Garas, A., Tessone, C. J. & Schweitzer, F. Betweenness preference: quantifying correlations in the topological dynamics of temporal networks. Phys. Rev. Lett. 110, 198701 (2013).
  24. Gonzalez, M., Hidalgo, C. & Barabási, A. Understanding individual human mobility patterns. Nature 453, 779782 (2008).
  25. Heath, M., Vernon, M. & Webb, C. Construction of networks with intrinsic temporal structure from UK cattle movement data. BMC Vet. Res. 4, 11 (2008).
  26. Song, C., Qu, Z., Blumm, N. & Barabási, A. Limits of predictability in human mobility. Science 327, 10181021 (2010).
  27. Balcan, D. & Vespignani, A. Phase transitions in contagion processes mediated by recurrent mobility patterns. Nat. Phys. 7, 581586 (2011).
  28. Belik, V., Geisel, T. & Brockmann, D. Natural human mobility patterns and spatial spread of infectious diseases. Phys. Rev. X 1, 011001 (2011).
  29. Poletto, C., Tizzoni, M. & Colizza, V. Human mobility and time spent at destination: Impact on spatial epidemic spreading. J. Theor. Biol. 338, 4158 (2013).
  30. Rosvall, M. & Bergstrom, C. Maps of random walks on complex networks reveal community structure. Proc. Natl Acad. Sci. USA 105, 1118 (2008).
  31. Delvenne, J., Yaliraki, S. & Barahona, M. Stability of graph communities across time scales. Proc. Natl Acad. Sci. USA 107, 1275512760 (2010).
  32. May, R. & Lloyd, A. Infection dynamics on scale-free networks. Phys. Rev. E 64, 066112 (2001).
  33. Pastor-Satorras, R. & Vespignani, A. Epidemic spreading in scale-free networks. Phys. Rev. Lett. 86, 32003203 (2001).
  34. Nekovee, M., Moreno, Y., Bianconi, G. & Marsili, M. Theory of rumour spreading in complex social networks. Phys. A 374, 457470 (2007).
  35. Bergstrom, C., West, J. & Wiseman, M. The eigenfactor metrics. J. Neurosci. 28, 1143311434 (2008).
  36. Parry, W. Intrinsic markov chains. Trans. Am. Math. Soc. 112, 5566 (1964).
  37. Sinatra, R., Gómez-Gardeñes, J., Lambiotte, R., Nicosia, V. & Latora, V. Maximal-entropy random walks in complex networks with limited information. Phys. Rev. E 83, 030103 (2011).
  38. Van der Heyden, M., Diks, C., Hoekstra, B. & DeGoede, J. Testing the order of discrete Markov chains using surrogate data. Phys. D 117, 299313 (1998).
  39. Esquivel, A. & Rosvall, M. Compression of flow can reveal overlapping-module organization in networks. Phys. Rev. X 1, 021025 (2011).
  40. Scholtes, I. et al. Slow-down vs. speed-up of information diffusion in non-markovian temporal networks. Preprint at http://arXiv.org/abs/1307.4030 (2013).
  41. Lambiotte, R., Salnikov, V. & Rosvall, M. Effect of memory on the dynamics of random walks on networks. Preprint at http://arXiv.org/abs/1401.0447 (2014).
  42. Newman, M. Modularity and community structure in networks. Proc. Natl Acad. Sci. USA 103, 85778582 (2006).
  43. Lancichinetti, A., Radicchi, F., Ramasco, J. & Fortunato, S. Finding statistically significant communities in networks. PLoS ONE 6, e18961 (2011).
  44. Ahn, Y., Bagrow, J. & Lehmann, S. Link communities reveal multiscale complexity in networks. Nature 466, 761764 (2010).
  45. Evans, T. & Lambiotte, R. Line graphs, link partitions, and overlapping communities. Phys. Rev. E 80, 016105 (2009).
  46. Yang, J. & Leskovec, J. inProc. ACM SIGKDD Workshop on Mining Data Semantics 3, ACM (2012).
  47. Palla, G., Derényi, I., Farkas, I. & Vicsek, T. Uncovering the overlapping community structure of complex networks in nature and society. Nature 435, 814818 (2005).
  48. Bohlin, L., Esquivel, A. V., Lancichinetti, A. & Rosvall, M. Robustness of journal rankings by network flows with different amounts of memory. Preprint at http://arXiv.org/abs/1405.7832 (2014).
  49. West, J. D., Bergstrom, T. C. & Bergstrom, C. T. The Eigenfactor metrics: A network approach to assessing scholarly journals. Coll. Res. Libr. 71, 236244 (2010).
  50. Garfield, E. The history and meaning of the journal impact factor. JAMA 295, 9093 (2006).
  51. Monastersky, R. The number that’s devouring science. Chron. High. Educ. 52, A12 (2005).
  52. Rocha, L. E., Liljeros, F. & Holme, P. Simulated epidemics in an empirical spatiotemporal network of 50,185 sexual contacts. PLoS Comput. Biol. 7, e1001109 (2011).
  53. Poletto, C., Tizzoni, M. & Colizza, V. Heterogeneous length of stay of hosts' movements and spatial epidemic spread. Sci. Rep. 2, 476 (2012).
  54. Keeling, M., Danon, L., Vernon, M. & House, T. Individual identity and movement networks for disease metapopulations. Proc. Natl Acad. Sci. USA 107, 88668870 (2010).
  55. Lessler, J., Kaufman, J. H., Ford, D. A. & Douglas, J. V. The cost of simplifying air travel when modeling disease spread. PLoS ONE 4, e4403 (2009).
  56. Colizza, V., Pastor-Satorras, R. & Vespignani, A. Reaction–diffusion processes and metapopulation models in heterogeneous networks. Nat. Phys. 3, 276282 (2007).
  57. Girvan, M. & Newman, M. E. Community structure in social and biological networks. Proc. Natl Acad. Sci. USA 99, 78217826 (2002).
  58. Danon, L., Daz-Guilera, A., Duch, J. & Arenas, A. Comparing community structure identification. J. Stat. Mech. Theor. Exp. 2005, P09008 (2005).
  59. Lancichinetti, A., Fortunato, S. & Kertész, J. Detecting the overlapping and hierarchical community structure in complex networks. New J. Phys. 11, 033015 (2009).
  60. Langville, A. & Meyer, C. Deeper inside PageRank. Int. Math. 1, 335380 (2004).
  61. Lambiotte, R. & Rosvall, M. Ranking and clustering of nodes in networks with smart teleportation. Phys. Rev. E 85, 056107 (2012).
  62. Gregory, S. Finding overlapping communities in networks by label propagation. New J. Phys. 12, 103018 (2010).
  63. McDaid, A. & Hurley, N. in:Advances in Social Networks Analysis and Mining (ASONAM), 2010 International Conference 112119IEEE (2010).

Download references

Author information

Affiliations

  1. Integrated Science Lab, Department of Physics, Umeå University, Linnaeus va¨g 24 , SE-901 87 Umeå, Sweden

    • Martin Rosvall,
    • Alcides V. Esquivel,
    • Andrea Lancichinetti &
    • Jevin D. West
  2. Department of Chemical and Biological Engineering, Howard Hughes Medical Institute (HHMI), Northwestern University, Evanston, Illinois 60208, USA

    • Andrea Lancichinetti
  3. Information School, University of Washington, Seattle, Washington 98195, USA

    • Jevin D. West
  4. Department of Mathematics and Naxys, University of Namur, 5000 Namur, Belgium

    • Renaud Lambiotte

Contributions

M.R., A.V.E. and R.L. conceived the project. M.R. developed the community-detection algorithm. A.V.E. assembled the data and carried out the significance analysis. A.L. developed the epidemic and memory models. J.D.W. performed the ranking analysis. R.L. derived the analytical results. M.R. wrote the manuscript and all authors wrote the Supplementary Information. All authors were involved in interpreting the results and commented on the manuscript at all stages.

Competing financial interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to:

Author details

Supplementary information

PDF files

  1. Supplementary Information (770 KB)

    Supplementary Figures 1-9, Supplementary Tables 1-4, Supplementary Notes 1-4 and Supplementary References

Additional data