Global alignment of multiple protein interaction networks with application to functional orthology detection
See allHide authors and affiliations
-
Communicated by Silvio Micali, Massachusetts Institute of Technology, Cambridge, MA, July 16, 2008 (received for review March 12, 2008)
Abstract
Protein–protein interactions (PPIs) and their networks play a central role in all biological processes. Akin to the complete sequencing of genomes and their comparative analysis, complete descriptions of interactomes and their comparative analysis is fundamental to a deeper understanding of biological processes. A first step in such an analysis is to align two or more PPI networks. Here, we introduce an algorithm, IsoRank, for global alignment of multiple PPI networks. The guiding intuition here is that a protein in one PPI network is a good match for a protein in another network if their respective sequences and neighborhood topologies are a good match. We encode this intuition as an eigenvalue problem in a manner analogous to Google's PageRank method. Using IsoRank, we compute a global alignment of the Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Mus musculus, and Homo sapiens PPI networks. We demonstrate that incorporating PPI data in ortholog prediction results in improvements over existing sequence-only approaches and over predictions from local alignments of the yeast and fly networks. Previous methods have been effective at identifying conserved, localized network patterns across pairs of networks. This work takes the further step of performing a global alignment of multiple PPI networks. It simultaneously uses sequence similarity and network data and, unlike previous approaches, explicitly models the tradeoff inherent in combining them. We expect IsoRank—with its simultaneous handling of node similarity and network similarity—to be applicable across many scientific domains.
Footnotes
- §To whom correspondence should be addressed. E-mail: bab@mit.edu
-
Preliminary versions of this work were presented at the RECOMB 2007, PSB 2008, and SODA 2008 Conferences.
-
Author contributions: R.S. and B.B. designed research; R.S. and B.B. performed research; R.S. and J.X. contributed new reagents/analytic tools; R.S. analyzed data; and R.S., J.X., and B.B. wrote the paper.
-
The authors declare no conflict of interest.
-
This article is a PNAS Direct Submission.
-
This article contains supporting information online at www.pnas.org/cgi/content/full/0806627105/DCSupplemental.
-
¶ In the absence of any sequence similarity information, the optimum mapping will correspond to a maximum common subgraph (MCS) between G1 and G2 (i.e., the largest graph that is isomorphic to subgraphs of both) and the corresponding node-mapping such that each node is mapped to at most one node in the other network. Nodes not mapped to any other node are referred to as gap nodes. MCS is an NP-complete problem and thus approximate solutions, especially for the large-sized PPI networks, are essential. Also, when incorporating sequence data, the global alignment problem is no longer a pure MCS problem.
-
↵‖ Flannick J, et al., Twelfth Annual International Conference on Research in Computational Molecular Biology, March 30–April 2, 2008, Singapore.
-
↵** Maxim Kalaev M, Bafna V, Sharan R, Twelfth Annual International Conference on Research in Computational Molecular Biology, March 30–April 2, 2008, Singapore.
-
Freely available online through the PNAS open access option.
- © 2008 by The National Academy of Sciences of the USA