Elsevier

Automatica

Volume 47, Issue 6, June 2011, Pages 1230-1235
Automatica

Brief paper
Robust dynamical network structure reconstruction

https://doi.org/10.1016/j.automatica.2011.03.008Get rights and content

Abstract

This paper addresses the problem of network reconstruction from data. Previous work identified necessary and sufficient conditions for network reconstruction of LTI systems, assuming perfect measurements (no noise) and perfect system identification. This paper assumes that the conditions for network reconstruction have been met but here we additionally take into account noise and unmodelled dynamics (including nonlinearities). In order to identify the network structure that generated the data, we compute the smallest distances between the measured data and the data that would have been generated by particular network structures. We conclude with biologically inspired network reconstruction examples which include noise and nonlinearities.

Introduction

One of the fundamental interests in systems biology is the discovery of the specific biochemical mechanisms that explain the observed behaviour of a particular biological system. In particular, we consider the problem of reconstructing the network structure from input and partially measured output data of a dynamical system, and in turn uncovering the underlying mechanisms responsible for the observed behaviour. The biological network reconstruction problem challenges come from the necessity to deal with noisy and partial measurements (in particular, the number of hidden/unobservable nodes and their position in the network is unknown) taken from a nonlinear and stochastic dynamical network.

There are several tools in the literature to infer causal network structures. These tools are mainly rooted in three fields: Bayesian inference (Dojer et al., 2006, Yu et al., 2004), information theory (ARACNe Basso et al., 2005, Butte and Kohane, 2000, Faith et al., 2007) and ODE methods (inferelator Bansal and di Bernardo, 2007, Bonneau et al., 2006, di Bernardo et al., 2005, Gardner et al., 2003, Sontag, 2008). Details on these and other methods can be found in several reviews of the field such as Bansal, Belcastro, Ambesi-Impiombato, and Bernardo (2007), Cantone et al. (2009), De Smet and Marchal (2010), Hecker, Lambeck, Toepfer, Someren, and Guthke (2009) and Koyuturk (2010). The vast majority of network reconstruction methods produce estimates of network structure regardless of the informativity of the underlying data. In particular, most methods produce estimates of network structure even in cases with data from only a few experiments. Such data may not contain enough information to enable the accurate reconstruction of the actual network, thus the obtained network estimates can be arbitrarily different from the true network structure (Cantone et al., 2009). To compensate for the lack of information in data, most methods have heuristics that try to “guess” at the remaining information, either by specifying prior distributions or by appealing to a priori beliefs about the nature of real biological networks, such as looking for the sparsest network. Nevertheless, these heuristics bias the results and lead to incorrect estimates of the network structure.

In contrast, our approach has been to identify the conditions when data is sufficiently informative to enable accurate network reconstruction. The results indicate that even in an ideal situation, when the underlying network is linear and time-invariant (LTI) and the measurements are noise-free, network reconstruction is impossible without additional information (Gonçalves & Warnick, 2008). Surprisingly, this information gap is not due to a lack of data, or a deficiency in the number of experiments, but rather it occurs because system states are only partially observed; the information gap is present in all data sets except those that satisfy certain experimental conditions. Our analysis identified a particular experimental protocol that satisfies these necessary conditions to ensure that data will be sufficiently informative to enable network reconstruction. This protocol suggests the following.

1.

A network composed of p measured species demands p experiments.

2.

Each experiment requires a distinct input that independently controls a measured species, i.e. experimental input i must affect measured species i and no other measured species except, possibly, indirectly through measured species i.

If data acquisition experiments are not performed in this (or an equivalent) way, the network cannot be reconstructed. Moreover, the resulting information gap is catastrophic, meaning that any internal network structure explains the data equally well (i.e. fully decoupled, fully connected, and everything in between). On the other hand, if some information about the network is available a priori, as is usually the case, then these conditions can be relaxed as explained in Gonçalves and Warnick (2008).

The work in Gonçalves and Warnick (2008), however, did not take into account the realistic scenario that typically systems are nonlinear and data are noisy. This paper extends and details earlier results in Gonçalves and Warnick (2009) by developing an effective method to reconstruct networks in the presence of noise and nonlinearities, assuming that the conditions for network reconstruction presented above in (1) and (2) have been met. Steady-state (resp. time-series) data can be used to reconstruct the Boolean (resp. dynamical) network structure of the system.

The paper is organised as follows. After a motivating example showing that input–output data alone does not enable network reconstruction, Section 2 reviews dynamical structure functions and gives fundamental results concerning their usefulness in the network reconstruction problem. Section 3 presents the main results of the paper regarding robust network reconstruction from input–output data subject to noise and nonlinearities. Finally, we conclude the paper with biologically inspired examples in Section 4.

Notation. For a matrix ACM×N,AijC denotes the element in the ith row and jth column while AjCM×1 denotes its jth column. For a column vector α,α[i] denotes its ith element. We define erT=[0,,0,1rth,0,,0]R1×N. I denotes the identity matrix. When it is clear from the context, we omit the explicit dependence of transfer functions on the Laplace variable s, e.g. we write G instead of G(s).

Motivating example. Consider the transfer function G(s)=1s+3[1s+11s+2] obtained from data (partial observations) using system identification tools. For simplicity, assume that G(s) accurately represents the input–output relation of the original system. This transfer function is consistent with two state-space realisations ẋ=Ax+Bu,y=Cx given by A1=[101021003],A2=[211131011],B1=B2=[001]T, and C1=C2=[I0]R2×3 (i.e., the third state is hidden/non-observable). Note that both realisations are minimal and correspond to very different network structures as seen in Fig. 1. This demonstrates that even in the idealised setting (LTI system, no noise and perfect system identification), network reconstruction in the presence of hidden/unobservable states is not possible without additional information about the system.

Section snippets

Dynamical structure functions and network reconstruction

In Gonçalves and Warnick (2008) we introduced the notion of dynamical structure functions and showed how they can be used to obtain necessary and sufficient conditions for network reconstruction. For the sake of clarity and completeness, we state these previously obtained results here without proofs. We refer the interested reader to Gonçalves and Warnick (2008) and Yuan, Stan, Warnick, and Goncalves (2009) for the corresponding proofs.

Consider a nonlinear system x̄̇=f(x̄,ū,w1),ȳ=h(x̄,w2)

Robust network structure reconstruction

In this section, we consider the problem of robustly reconstructing dynamical network structures. Data are obtained from input–output measurements of a noisy nonlinear system. From this type of data we aim to find the internal network structure Q associated with the linearised system (2). To average out the noise, data-collection experiments are repeated N times. For simplicity of exposition, we assume that no a priori information on the internal network structure Q is available. The results

Biologically inspired examples

This section illustrates with two examples the theoretical results presented in the previous section. The corresponding sets of ordinary differential equation describing the dynamics of the considered networks are used to generate noisy data, which are then fed to our reconstruction algorithm in order to assess its ability to recover the correct network structure.

Conclusion and discussion

This paper proposes a new network reconstruction method in the presence of noise and nonlinearities based on dynamical structure functions. The key idea is to find minimal distances between the existent data and the data required to obtain particular network structures. The method was illustrated with two biologically oriented examples. They showed that even in the presence of nonlinearities and considerable noise network reconstruction was possible. Eventually, when the signal-to-noise ratio

Ye Yuan was born on October 1986. He received his B.Eng. degree (Valedictorian) from the Department of Automation, Shanghai Jiao Tong University in 2008, M. Phil. from the Department of Engineering, Cambridge University in 2009. He is currently a second-year Ph.D. student in Control Group, Department of Engineering, University of Cambridge. Ye was a visiting student at University of New Mexico, the Hong Kong University of Science and Technology, and Luxembourg Centre for Systems Biomedicine. He

References (27)

  • I. Cantone et al.

    A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches

    Cell

    (2009)
  • K. Zhou et al.

    Robust and optimal control

    (1996)
  • Yuan, Y., Stan, G., Warnick, S., & Goncalves, J. (2010). Robust dynamical network reconstruction from noisy data. In...
  • Yuan, Y., Stan, G., Warnick, S., & Goncalves, J. (2009). Minimal dynamical structure realisations with application to...
  • J. Yu et al.

    Advances to Bayesian network inference for generating causal networks from observational biological data

    Bioinformatics

    (2004)
  • N. Young

    An introduction to Hilbert space

    (1988)
  • G. Wadhams et al.

    Making sense of it all: bacterial chemotaxis

    Nature Reviews Molecular Cell Biology

    (2004)
  • E. Sontag

    Network reconstruction based on steady-state data

    Essays in Biochemistry

    (2008)
  • M. Roberts et al.

    A model invalidation-based approach for elucidating biological signalling pathways, applied to the chemotaxis pathway in R. sphaeroides

    BMC Systems Biology

    (2009)
  • T. Nordling et al.

    Interampatteness—a generic property of biochemical networks

    IET Systems Biology

    (2009)

Cited by (113)

View all citing articles on Scopus

Ye Yuan was born on October 1986. He received his B.Eng. degree (Valedictorian) from the Department of Automation, Shanghai Jiao Tong University in 2008, M. Phil. from the Department of Engineering, Cambridge University in 2009. He is currently a second-year Ph.D. student in Control Group, Department of Engineering, University of Cambridge. Ye was a visiting student at University of New Mexico, the Hong Kong University of Science and Technology, and Luxembourg Centre for Systems Biomedicine. He is now a visiting student at CDS, Caltech. His research interest lies in the mathematical control theory with applications to network and biology. He is the recipient of Dorothy Hodgkin Postgraduate Awards (Microsoft Research Ph.D. Scholarship), Cambridge Overseas Scholarship and Henry Lester Scholarship.

Guy-Bart Stan was born in Liège, Belgium, in 1977. He received his Ph.D. degree in Applied Sciences (Analysis and Control of Nonlinear Dynamical Systems) from the University of Liège, Belgium in 2005. In 2005, Dr. Stan worked as a Senior Digital Signal Processing Engineer at Philips Applied Technologies, Leuven, Belgium. From 2006 until 2009, he worked as Research Associate in the Control Group of the Department of Engineering at the University of Cambridge, UK, being supported by an EU-FP6 IEF Marie-Curie Fellowship and the UK EPSRC, successively. In 2008, he was a Visiting Scientist at the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, USA. Since 2009 Dr. Stan is a University Lecturer in Engineering Design for Synthetic Biology Systems in the Department of Bioengineering and the EPSRC-funded Centre for Synthetic Biology and Innovation at Imperial College London. His current research interests include the mathematical modelling, analysis and control of complex biological systems/networks occurring in synthetic biology, systems biology, and technological systems.

Sean Warnick received the B.S.E. degree from Arizona State University in 1993, and the S.M. and Ph.D. degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 1995 and 2003, respectively. He attended ASU on scholarship from the Flinn Foundation, graduated summa cum laude, and was named the Outstanding Graduate of the College of Engineering and Applied Sciences. Since 2003 Dr. Warnick has been with the Computer Science Department at Brigham Young University, where he is currently an Associate Professor and Director of the interdisciplinary Information and Decision Algorithms Laboratories. Dr. Warnick has also held visiting appointments at Cambridge University (Summer 2006), the University of Maryland (Summer 2008), and the National Security Agency, where he was named the Distinguished Visiting Professor for three consecutive years (2008–2010). Dr. Warnick’s research interests focus on complex networks of uncertain dynamical systems, where he considers issues of representation, identification and reconstruction, control, security and robustness, and verification and validation.

Jorge Goncalves received his Licenciatura (5-year S.B.) degree from the University of Porto, Portugal, and the M.S. and Ph.D. degrees from the Massachusetts Institute of Technology, Cambridge, MA, all in Electrical Engineering and Computer Science, in 1993, 1995, and 2000, respectively. He then held two postdoctoral positions, first at the Massachusetts Institute of Technology for seven months, and from May 2001 to March 2004 at the California Institute of Technology with the Control and Dynamical Systems division. Since April 2004 he has been a Lecturer in the Information Engineering Division of the Department of Engineering at the University of Cambridge. Since 2005 he has been also a Fellow of Pembroke College, Cambridge. He was a visiting researcher at the University of Luxembourg from June to December 2010. Also he is a visiting researcher at California Institute of Technology (January–September 2011). He was the recipient of the Best Student Paper Award at the Automatic Control Conference, Chicago, IL, June 2000.

His research interests include modelling, analysis and control of complex and hybrid systems. In particular, modelling and analysis in systems and synthetic biology, collaborating with biologists in different areas such as circadian rhythms and gene regulatory networks.

This work was supported in part by EPSRC grant numbers EP/G066477/1 and EP/I029753/1, AFRL FA8750-09-2-0219 and Microsoft Research through the Ph.D. Scholarship Program. The material in this paper was partially presented at the 49th IEEE Conference on Decision and Control, December 15–17, 2010, Atlanta, Georgia, USA. This paper was recommended for publication in revised form by Associate Editor Elling Jacobsen, under the direction of Guest Editor Francis J. Doyle III.

View full text