    New Results

    Deep Generative Models of Protein Structure Uncover Distant Relationships Across a Continuous Fold Space

    Eli J. Draizen, Stella Veretnik, Cameron Mura, Philip E. Bourne
    doi: https://doi.org/10.1101/2022.07.29.501943
    This article is a preprint and has not been certified by peer review.

    Abstract

    Unresolved questions about the discrete/continuous dichotomy of protein fold space permeate structural and evolutionary biology. From protein structure comparison and classification to evolutionary analyses and function prediction, our views of fold space implicitly rest upon many assumptions that impact how we analyze, interpret and come to understand biological systems. Discrete views of fold space categorize similar folds into separate groups; unfortunately, such a ‘binning’ process inherently fails to capture many remote relationships. While hierarchical databases such as CATH, SCOP, and ECOD represent major steps forward in protein classification, we believe that a scalable, objective and conceptually flexible method that is less reliant upon assumptions and heuristics could enable a more systematic and thorough exploration of fold space and evolutionarily distant relationships. Here, we develop a structure-guided, comparative analysis of proteins, leveraging embeddings derived from deep generative models, which represent a highly compressed, lower-dimensional space of a given protein and its sequence, structure and biophysical properties. Building upon a recent ‘Urfold’ model of protein structure, the deep generative approach developed here, termed ‘DeepUrfold’, suggests a new, mostly continuous view of fold space, one that extends beyond simple 3D structural/geometric similarity towards the realm of integrated sequence↔structure↔function properties. We find that such an approach can quantitatively represent and detect evolutionarily remote relationships that are not captured by existing methods.

    Competing Interest Statement

    The authors have declared no competing interest.

    Copyright 
    The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
    Posted August 01, 2022.

    Subject Area

    • Bioinformatics