Thresholding normally distributed data creates complex networks

George T. Cantwell, Yanchen Liu, Benjamin F. Maier, Alice C. Schwarze, Carlos A. Serván, Jordan Snyder, and Guillaume St-Onge
Phys. Rev. E 101, 062302 – Published 1 June 2020

Abstract

Network data sets are often constructed by some kind of thresholding procedure. The resulting networks frequently possess properties such as heavy-tailed degree distributions, clustering, large connected components, and short average shortest path lengths. These properties are considered typical of complex networks and appear in many contexts, prompting consideration of their universality. Here we introduce a simple model for correlated relational data and study the network ensemble obtained by thresholding it. We find that some, but not all, of the properties associated with complex networks can be seen after thresholding the correlated data, even though the underlying data are not “complex.” In particular, we observe heavy-tailed degree distributions, a large numbers of triangles, and short path lengths, while we do not observe nonvanishing clustering or community structure.

  • Figure
  • Figure
  • Figure
  • Figure
  • Figure
  • Figure
  • Figure
2 More
  • Received 6 March 2019
  • Revised 6 May 2020
  • Accepted 7 May 2020

DOI:https://doi.org/10.1103/PhysRevE.101.062302

©2020 American Physical Society

Physics Subject Headings (PhySH)

NetworksInterdisciplinary Physics

Authors & Affiliations

George T. Cantwell1,*, Yanchen Liu2, Benjamin F. Maier3,4, Alice C. Schwarze5, Carlos A. Serván6, Jordan Snyder7,8, and Guillaume St-Onge9,10

  • 1Department of Physics, University of Michigan, Ann Arbor, Michigan 48109, USA
  • 2Center for Complex Network Research, Northeastern University, Boston, Massachusetts 02115, USA
  • 3Robert Koch Institute, Nordufer 20, D-13353 Berlin, Germany
  • 4Department of Physics, Humboldt-University of Berlin, Newtonstraße 15, D-12489 Berlin, Germany
  • 5Wolfson Centre for Mathematical Biology, Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
  • 6Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA
  • 7Department of Mathematics, University of California, Davis, California 95616, USA
  • 8Complexity Sciences Center, University of California, Davis, California 95616, USA
  • 9Département de Physique, de Génie Physique, et d'Optique, Université Laval, Québec (Québec), Canada G1V 0A6
  • 10Centre Interdisciplinaire de Modélisation Mathématique de l'Université Laval, Québec (Québec), Canada G1V 0A6

  • *gcant@umich.edu

Article Text (Subscription Required)

Click to Expand

References (Subscription Required)

Click to Expand
Issue

Vol. 101, Iss. 6 — June 2020

Reuse & Permissions
Access Options
Author publication services for translation and copyediting assistance advertisement
  • Thresholding relational data to obtain networks. Panel (a) shows a general procedure to obtain unweighted networks from edge weights. Each edge weight is hypothesized to have been drawn from a specific distribution, generating an undirected weighted network. An unweighted network is then produced by assigning an edge whenever an edge weight is greater than a threshold . In panel (b) we show how edge weights are correlated in the model of Sec.  by covariance matrix [Eq. ()]. Edge weights for edges that connect through a node have covariance , while edge weights not connected by a node have zero covariance.

  • Clustering and triangles per node as computed in Sec. . Clustering decreases with increasing number of nodes, however the number of triangles per node increases with a growing number of nodes for large values of . Clustering increases both with increasing mean degree and local edge weight correlation . In panels (a) and (c) we chose and in panels (b) and (d) we fixed .

  • The variance of degree, Eq. (), increases with , the local edge weight correlation. With increasing mean degree , even small correlations produce networks of significantly broader degree distribution than the random graph .

  • We show degree distributions computed using Eq. () for and for increasing local edge weight correlation in log-log (a) and linear scales (b). We also compare them to the asymptotic approximation Eq. (). Note that large values of produce broad degree distributions, which could be easily mistaken for log-normal or power-law distributions.

  • Simulations with , mean degree . 1000 samples were taken for each of the parameter combinations. Panel (a) shows the points of transitions for an increasing number of nodes . To the left of the line, the network does not possess a giant component, while to the right it does. The transition point was computed using the mean size of the second largest component as a susceptibility parameter. Panel (b) shows an example of the susceptibility parameter for .

  • Panel (a) shows the scaling of the average shortest path in the largest connected component with the number of nodes, . We fix the mean degree , and each point is averaged over 200 samples. For we recover the result for the random graph , where . For nonzero correlation, the average shortest path length increases slower than logarithmically. In panel (b) we show the average shortest path length for different mean degrees and values of for networks with , again sampled 200 times for each parameter combination.

  • Simulations for the disparity filter [] applied to . Networks with 2,000 nodes were created by first sampling matrices from a normal distribution and then applying the disparity filter to . Three different values for were used and was set using Eq. () so that the final network would have a mean degree of 4.

Sign up to receive regular email alerts from Physical Review E