Silence on the relevant literature and errors in implementation

To the Editor:

In the August 2013 issue of this journal, Barzel & Barabási reported a method for reconstructing network topologies1. Here we show that the Barzel & Barabási method is a variant of a previously published method, modular response analysis (MRA)2. We also demonstrate that the implementation of their algorithm using statistical similarity measures as a proxy for global network responses to perturbations is erroneous and its performance is overestimated.

The reconstruction of network connections from data remains a fundamental problem in biology. It is not immediately obvious how to capture direct links between individual network nodes from experimental data because a perturbation to a component propagates through a network, causing widespread (global) changes, thereby masking direct (local) connections between nodes. This question has been previously studied in >100 publications, collectively representing MRA (reviewed in refs. 3,4,5,6,7). MRA quantifies direct interactions between network nodes (i and j) using the local response coefficients (also known as connection coefficients), which describe direct effects of a small change in node j on node i, while keeping the remaining nodes unchanged to prevent the spread of the perturbation. The local responses cannot be directly assessed, whereas the global responses can be measured; when following a perturbation to node j, the entire network relaxes to a new steady state. In this new state, nodes that are directly affected by node j have changed, but other indirectly affected nodes also change because the initial perturbation has propagated through the network. MRA has solved the problem of finding local, direct links between components through the global responses for networks of any size and complexity2. The development and application of methods that are conceptually similar to MRA (e.g., regulatory strength analysis8 and maximum likelihood-based MRA9,10) has reinforced the validity of using MRA-type methods to reconstruct network connections11,12,13,14,15,16.

The Barzel & Barabási study1 uses the same concept and strikingly similar terminologies to reconstruct networks by deriving the local connection coefficients from the global response coefficients. Key equations (3) and (4) in their silencing method1 express the local coefficients in terms of the global coefficients and are a subset of the published MRA equations2,9,10,17,18,19 with a formal replacement of the diagonal elements of the local response matrix by zeros instead of minus ones (Supplementary Note 1). Another formal difference is that the variant of the global response matrix used by Barzel & Barabási1 considers the global change in each node that results from the change in every other node of the network, whereas MRA more generally considers the global changes in network nodes that result from changes in any parameter that might affect several nodes simultaneously18,19. Barzel & Barabási claim as one of the main outcomes of their study an approximate solution to equations (3) and (4) (equation (5))1, whereas MRA offers an exact solution2,18,19. Both the approximate1 and the exact MRA2 solutions require the inversion of the global response matrix. Consequently, Barzel & Barabási's approximation1 does not decrease the computational complexity of the exact solution2,18,19. In fact, the proposed approximate iterative method (equations S12 and S13)1 needs repetitive matrix calculations, making it slower than the existing MRA algorithms that provide the exact solution.

The Barzel & Barabási approximate solution relies on the assumption that “typically, perturbations decay rapidly as they propagate through the network, so that the response observed between two nodes is dominated by the shortest path between them”1. This assumption disregards well-documented biological evidence, such as the sensitivity amplification in signaling cascades20,21 and the existence of positive feedback loops in biological networks, which amplify initial signals as they propagate through a network22. The common occurrence of positive or double-negative feedback loops invalidates the Barzel & Barabási assumption for many known signaling pathways, including the restriction point pathway23, cell cycle signaling24,25,26, mitogen-activated protein kinase (MAPK) cascades27,28 that are evolutionary conserved from yeast to mammals, as well as transcription regulation networks29. In these regulatory networks, global responses of the neighboring nodes outside of positive feedback loops are typically smaller than the response of a node that lies inside a positive feedback loop to an upstream node outside the loop (Supplementary Note 1, Supplementary Note 2 and Supplementary Fig. 1). Sensing and processing of stimuli is the normal function of most, if not all, regulatory biological pathways, whose common feature is to increase the response of a 'target' node to a 'source' node with the distance between them30. For example, in an experimental study, Bastiaens and colleagues31 used MRA to unravel the direct linkage topology and strengths of connections in the MAPK cascade in PC12 cells that were stimulated with epidermal growth factor (EGF) versus nerve growth factor (NGF). They calculated the local response coefficients from experimentally measured global responses and found that EGF elicits negative feedback, whereas NGF induces positive feedback, imposed on the backbone of the same MAPK pathway that propagates signals from both growth factors. The experimentally measured response between immediate neighbors, such as the global response of MEK to small interfering RNA (siRNA) against RAF, was much smaller than the response of the more distant neighbor ERK to RAF siRNA under both EGF and NGF stimulation, regardless of the growth factor–specific difference in network wiring31. In contrast to Barzel & Barabási's assumption, real biological networks do not feature a rapid decay of specific perturbations because these pathways have evolved to sense and respond to external cues by processing, amplifying and integrating the signals. Thus, reconstruction of these pathways using the 'average perturbation decay' hypothesis misses key functional features of biological pathways.

Network reconstruction methods that exploit the global network responses require systematic perturbation measurements. Clearly, many high-throughput approaches do not provide such perturbation data, whereas statistical similarity measures can be calculated from omics data. In the absence of perturbation data, Barzel & Barabási reconstructed the network by substitution of the global response coefficients with statistical similarity measures, such as the Pearson and Spearman correlation coefficients, and mutual information1. However, this substitution yields a symmetric correlation matrix with very different mathematical properties from the global response matrix. Inference based on these symmetric measures (using equation (5) and equations S11–S13 of ref. 1) results in networks where the local response matrix is always symmetrizable (Supplementary Note 3). In these inferred networks, the absence of a direct connection from node j to node i (Sij = 0) inevitably implies that there is no direct connection in the reverse direction (node i to node j, Sji = 0). Thus, the inferred networks cannot have a one-way connection between any two nodes. This approach violates the reality of cellular networks where one-way connections dominate. For example, ubiquitous post-translational protein modifications are typically one-way connections. When a kinase phosphorylates a substrate, the substrate usually does not phosphorylate the kinase. Equally important, the symmetrizable local response matrix implies that the overall signal amplification or attenuation along a circular path (for instance, formed by a feedback loop) is exactly the same as in the reverse direction along this path, which is also biologically unrealistic. Therefore, the local response coefficients inferred from the correlation or mutual information matrices1 instead of the global response matrices do not represent real cellular networks and predict erroneous network connections (Supplementary Note 3).

There are many established methods that use statistical similarity measures to identify connections between network nodes, including lasso regression-based methods32,33,34, the partial correlation method35,36, Gaussian graphical models37 and elastic nets38, which rank the predicted edges on the basis of correlations between experimental observations or regression coefficients, rather than incorrectly using MRA equations to express the local connections through the correlation or mutual information matrices.

Barzel & Barabási claim that their method is robust against noise1. However, neither the Barzel & Barabási algorithm nor the standard MRA2 take explicit precautions against extrinsic or intrinsic noise in the data (Supplementary Note 4). Several statistical reformulations of MRA have been developed to allow robust inference of network topology in the presence of noise. For example, the statistical adaptations of MRA based on the Monte Carlo31 and maximum likelihood9,10 methods were successfully applied to infer signaling pathway topologies from noisy perturbation data in mammalian cells. A recent Bayesian MRA reformulation is also capable of inferring networks from noisy and even incomplete data sets39.

We are also unconvinced by the claim that the Barzel & Barabási method “improves upon the top-performing inference methods”1 in the DREAM5 network inference challenge (Network 3, http://wiki.c2b2.columbia.edu/dream/data/scripts/DREAM5/files/DREAM5_NetworkInference_Evaluation.zip). The DREAM5 data set contains the expression levels of 4,511 Escherichia coli genes, including 334 known transcription factors, and contestants were asked to rank the likelihood of interactions between transcription factors and target genes. Performances of the contenders were then estimated against a gold standard network, which involved only 141 out of 334 transcription factors. After we were unable to reproduce the results of Barzel & Barabási, the authors provided us with the source code for their algorithm. We found three issues in their analysis that rendered the comparison of their algorithm performance with the performances of the DREAM5 contestants invalid (Supplementary Note 5). First, although the identity of the 141 transcription factors of the gold standard network on which the performance was evaluated was unknown to the contestants, Barzel & Barabási exploited this information to zero out the correlations involving the other 193 transcription factors in their correlation matrix (used as a proxy for matrix G). Second, Barzel & Barabási incorrectly calculated the receiver operating characteristics (ROC) curves that estimated the performance of their algorithm (Supplementary Note 5). The DREAM5 challenge gold standard network contains only 141 transcription factors and 1,080 out of 4,511 potential target genes, whereas the interactions involving the remaining 3,431 genes are neither included in the gold standard network nor considered for the performance evaluation in the DREAM5 challenge (http://wiki.c2b2.columbia.edu/dream/data/scripts/DREAM5/files/DREAM5_NetworkInference_Evaluation.zip). Even so, Barzel & Barabási erroneously count these omitted interactions (between the 141 transcription factors and the remaining 3,431 genes) as true negatives, resulting in the inflated area under the ROC curve (AUROC) estimate (Supplementary Note 5). Finally, to evaluate performance of their algorithm, Barzel & Barabási disregarded the precision recall score (AUPR) and used only the AUROC score, which is known to be insufficient and can be misleading when the numbers of true positive (the interactions present in the gold standard network) and true negatives (the interactions absent in the gold standard network) differ significantly40. In the E. coli network, true negatives are about 100-fold more abundant than true positives. Consequently in the DREAM challenge, the performances were estimated using scores that combined both AUROC and AUPR. After we corrected these errors (Supplementary Note 5) and properly recalculated AUROC and AUPR using the evaluation script of the DREAM5 challenge (http://wiki.c2b2.columbia.edu/dream/data/scripts/DREAM5/files/DREAM5_NetworkInference_Evaluation.zip), the Barzel & Barabási inference algorithm performed poorly compared with the best performers in the DREAM5 competition (using either AUROC or AUPR as criteria; Supplementary Note 5; supplementary materials are also available on GitHub and figshare: http://figshare.com/articles/NBT_correspondence/1356170 GitHub: https://github.com/SBIUCD/NBT_correspondence.git). Specifically, the performance of the Barzel & Barabási algorithm estimated by AUPR ranks between 20th and 28th, and using AUROC it ranks between 3rd and 14th of the 29 participants, depending on whether the algorithm was applied to the Spearman correlation, Pearson correlation or Mutual Information. Predictions, in which we used merely 'raw' statistical similarity measures as substitutes of the local connections, ranked better (AUPR ranked between 7th and 13th, and AUROC ranks between 2nd and 10th depending on the statistical similarity measure used) than their algorithm (Fig. 1).

Figure 1: The DREAM5 challenge performances of the Barzel & Barabási 'silencer' method (dark blue) and the raw statistical similarity measures, the Pearson (red) and Spearman (light blue) correlations and mutual information (MI, green).
figure1

As in the original DREAM5 challenge, the performance was estimated using two scores: (a–c) AUROC. (d–f) AUPR. TPR, true-positive rate; FPR, false-positive rate; 'Precision' indicates the fraction of correctly inferred true interactions; 'Recall' equals TPR. Insets show the AUROC and AUPR scores for the Barzel & Barabási algorithm, the Pearson and Spearman correlations, and mutual information. In all three cases, the performance of the Barzel & Barabási algorithm was lower than the performance of the raw similarity measures alone (note that in Fig. 3 of the original publication1, the Barzel & Barabási algorithm was inappropriately applied and scored).

In summary, the concerns raised in this Correspondence cast doubt both on the level of conceptual advance and the practical usefulness of the silencing method proposed in the Barzel & Barabási study1. Ironically, many of the practical issues could have been remedied by consulting the extensive published literature describing MRA and MRA-based methods, which Barzel & Barabási unfortunately seem to have overlooked.

References

  1. 1

    Barzel, B. & Barabási, A.L. Nat. Biotechnol. 31, 720–725 (2013).

  2. 2

    Kholodenko, B.N. et al. Proc. Natl. Acad. Sci. USA 99, 12841–12846 (2002).

  3. 3

    Stark, J., Callard, R. & Hubank, M. Trends Biotechnol. 21, 290–293 (2003).

  4. 4

    Kholodenko, B.N. Nat. Cell Biol. 9, 247–249 (2007).

  5. 5

    Sontag, E.D. Essays Biochem. 45, 161–176 (2008).

  6. 6

    Selvarajoo, K., Tomita, M. & Tsuchiya, M. J. Bioinform. Comput. Biol. 7, 243–268 (2009).

  7. 7

    Kholodenko, B., Yaffe, M.B. & Kolch, W. Sci. Signal. 5, re1 (2012).

  8. 8

    de la Fuente, A., Brazhnik, P. & Mendes, P. in Metabolic Engineering in the Post Genomic Era (eds. Kholodenko, B.N. & Westerhoff, H.V.) (Horizon Bioscience, 2004).

  9. 9

    Stelniec-Klotz, I. et al. Mol. Syst. Biol. 8, 601 (2012).

  10. 10

    Klinger, B. et al. Mol. Syst. Biol. 9, 673 (2013).

  11. 11

    de la Fuente, A., Brazhnik, P. & Mendes, P. Trends Genet. 18, 395–398 (2002).

  12. 12

    Gardner, T.S., di Bernardo, D., Lorenz, D. & Collins, J.J. Science 301, 102–105 (2003).

  13. 13

    de la Fuente, A. & Makhecha, D.P. Syst. Biol. (Stevenage) 153, 257–262 (2006).

  14. 14

    Bansal, M., Belcastro, V., Ambesi-Impiombato, A. & di Bernardo, D. Mol. Syst. Biol. 3, 78 (2007).

  15. 15

    Scheinine, A. et al. Ann. NY Acad. Sci. 1158, 287–301 (2009).

  16. 16

    Shimoni, Y., Fink, M.Y., Choi, S.G. & Sealfon, S.C. PLoS Comput. Biol. 6, e1000828 (2010).

  17. 17

    Kholodenko, B.N. & Sontag, E.D. Determination of functional network structure from local parameter dependence data. arXiv: physics/0205003 (2002).

  18. 18

    Sontag, E., Kiyatkin, A. & Kholodenko, B.N. Bioinformatics 20, 1877–1886 (2004).

  19. 19

    Andrec, M., Kholodenko, B.N., Levy, R.M. & Sontag, E. J. Theor. Biol. 232, 427–441 (2005).

  20. 20

    Brown, G.C., Hoek, J.B. & Kholodenko, B.N. Trends Biochem. Sci. 22, 288 (1997).

  21. 21

    Ferrell, J.E. Jr. Trends Biochem. Sci. 22, 288–289 (1997).

  22. 22

    Kholodenko, B.N., Hoek, J.B., Westerhoff, H.V. & Brown, G.C. FEBS Lett. 414, 430–434 (1997).

  23. 23

    Soucek, T., Pusch, O., Hengstschlager-Ottnad, E., Adams, P.D. & Hengstschlager, M. Oncogene 14, 2251–2257 (1997).

  24. 24

    Pomerening, J.R., Sontag, E.D. & Ferrell, J.E. Nat. Cell Biol. 5, 346–351 (2003).

  25. 25

    Holt, L.J., Krutchinsky, A.N. & Morgan, D.O. Nature 454, 353–357 (2008).

  26. 26

    Trunnell, N.B., Poon, A.C., Kim, S.Y. & Ferrell, J.E. Jr. Mol. Cell 41, 263–274 (2011).

  27. 27

    Bagowski, C.P., Besser, J., Frey, C.R. & Ferrell, J.E. Curr. Biol. 13, 315–320 (2003).

  28. 28

    Schachter, K.A., Du, Y., Lin, A. & Gallo, K.A. J. Biol. Chem. 281, 19134–19144 (2006).

  29. 29

    Alon, U. Nat. Rev. Genet. 8, 450–461 (2007).

  30. 30

    Kholodenko, B.N. Nat. Rev. Mol. Cell Biol. 7, 165–176 (2006).

  31. 31

    Santos, S.D., Verveer, P.J. & Bastiaens, P.I. Nat. Cell Biol. 9, 324–330 (2007).

  32. 32

    Tibshirani, R. J. R. Stat. Soc. Series B Stat. Methodol. 58, 267–288 (1996).

  33. 33

    Shimamura, T., Imoto, S., Yamaguchi, R. & Miyano, S. Genome Inform. 19, 142–153 (2007).

  34. 34

    Friedman, J., Hastie, T. & Tibshirani, R. Biostatistics 9, 432–441 (2008).

  35. 35

    de la Fuente, A., Bing, N., Hoeschele, I. & Mendes, P. Bioinformatics 20, 3565–3574 (2004).

  36. 36

    Veiga, D.F., Vicente, F.F., Grivet, M., de la Fuente, A. & Vasconcelos, A.T. Genet. Mol. Res. 6, 730–742 (2007).

  37. 37

    Friedman, N. Science 303, 799–805 (2004).

  38. 38

    Zou, H. & Hastie, T. J. R. Stat. Soc. Series B Stat. Methodol. 67, 301–320 (2005).

  39. 39

    Santra, T., Kolch, W. & Kholodenko, B.N. BMC Syst. Biol. 7, 57 (2013).

  40. 40

    Davis, J. & Goadrich, M. in Proceedings of the 23rd International Conference on Machine Learning, 233–240 (ACM, Pittsburgh, Pennsylvania, 2006).

Download references

Author information

Correspondence to Boris N Kholodenko.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Bastiaens, P., Birtwistle, M., Blüthgen, N. et al. Silence on the relevant literature and errors in implementation. Nat Biotechnol 33, 336–339 (2015). https://doi.org/10.1038/nbt.3185

Download citation

Share this article

Anyone you share the following link with will be able to read this content:

Further reading