Download:
File size:
1831 kb
Format:
application/pdf
Author:
Ekeberg, Magnus (KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics)
Title:
Detecting contacts in protein folds by solving the inverse Potts problem - a pseudolikelihood approach
Department:
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics
Publication type:
Student thesis
Language:
English
Level:
Independent thesis Advanced level (degree of Master (One Year)), 20 credits / 30 HE credits
Educational program:
Master of Science in Engineering - Engineering Physics
Undergraduate subject:
Mathematical Statistics
Uppsok:
Physics, Chemistry, Mathematics
Pages:
57
Series:
Trita-MAT, ISSN 1401-2286; 14
Year of publ.:
2012
URI:
urn:nbn:se:kth:diva-99181
Permanent link:
http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-99181
Subject category:
Probability Theory and Statistics
Abstract (en):

Spatially proximate amino acid positions in a protein tend to co-evolve, so a protein's 3D structure leaves an echo of correlations in the evolutionary record. Reverse-engineering 3D structures from such correlations is an open problem in structural biology, pursued with increasing vigor as new protein sequences continue to fill the data banks. Within this task lies a statistical stumbling block. Correlation between two amino acid positions can arise from direct interaction, but it can also be propagated through the network via intermediate positions, so observed correlation alone does not guarantee proximity. The remedy, and the focus of this thesis, is to mathematically untangle this crisscross of correlations and extract the direct interactions, which gives a clean picture of co-evolution among the positions.
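The direct-versus-indirect distinction can be made concrete with a toy Gaussian analogue. This is not the Potts model itself. Three variables form a chain 0 - 1 - 2 in which only neighbors are directly coupled. The couplings live in a hypothetical precision (inverse covariance) matrix K with K[0][2] = 0, yet the covariance C = K^-1 that one would observe has a nonzero C[0][2]. Inverting the observed covariance recovers the zero and so separates direct from network-propagated correlation. Mean-field treatments of the inverse Potts problem likewise estimate couplings from an inverted matrix of connected correlations. The helper `invert` and the coupling value 0.4 are illustrative assumptions.

```python
def invert(M):
    """Invert a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    # Augment M with the identity matrix: [M | I].
    A = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))  # pivot row
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [x / piv for x in A[c]]
        for r in range(n):
            if r != c:
                factor = A[r][c]
                A[r] = [x - factor * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]  # right half now holds M^-1


a = 0.4  # hypothetical direct coupling strength between chain neighbors
# Precision matrix of a Gaussian chain 0 - 1 - 2: no direct 0-2 coupling.
K = [[1.0, -a, 0.0],
     [-a, 1.0, -a],
     [0.0, -a, 1.0]]
C = invert(K)       # covariance: the correlations one would observe
K_back = invert(C)  # inverting the observed covariance
print(f"observed C[0][2]       = {C[0][2]:.4f}")
print(f"recovered K_back[0][2] = {K_back[0][2]:.2e}")
```

Here C[0][2] equals 0.16/0.68 (about 0.235) even though positions 0 and 2 are not directly coupled, while K_back[0][2] is zero up to rounding error.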

Recently, analysts have used maximum-entropy modeling to recast this cause-and-effect puzzle as parameter learning in a Potts model (a kind of Markov random field). Unfortunately, the model's partition function, a sum over every possible sequence of the protein's length, is computationally expensive, which puts this out of reach of straightforward maximum-likelihood estimation. Mean-field approximations have been used, but an arsenal of other approximate schemes exists. In this work, we re-implement an existing contact-detection procedure and replace its mean-field calculations with pseudolikelihood maximization. We then feed both routines real protein data and highlight differences between their respective outputs. Our new program appears to offer a systematic improvement in contact-detection accuracy.
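The pseudolikelihood idea can be sketched as follows. Instead of the full likelihood, which needs the partition function, one maximizes the product over positions r of the conditional probabilities P(sigma_r | all other positions). Each of these is normalized by a sum over only q states. The code below is a minimal pure-Python sketch under stated assumptions, not the thesis's implementation. It uses an integer-encoded toy alignment and symmetric couplings J_ij stored once per pair. L2 penalties have hypothetical strengths `lam_h` and `lam_J`, and the optimizer is plain gradient descent. A practical program would use a quasi-Newton optimizer and q = 21 states (20 amino acids plus a gap). All function names are hypothetical. Scoring each pair by the Frobenius norm of its zero-sum-centered coupling matrix is one common choice and is assumed here.

```python
import math
from itertools import combinations


def pair(J, r, i, a, b):
    """Coupling J_ri(a, b); each pair is stored once as J[(i, j)] with i < j."""
    return J[(r, i)][a][b] if r < i else J[(i, r)][b][a]


def conditional(h, J, seq, r, q):
    """Return P(sigma_r = a | rest) for all a, and log P of the observed state."""
    f = [h[r][a] + sum(pair(J, r, i, a, seq[i]) for i in range(len(seq)) if i != r)
         for a in range(q)]
    m = max(f)  # shift by the max for numerical stability
    w = [math.exp(x - m) for x in f]
    z = sum(w)
    return [x / z for x in w], f[seq[r]] - m - math.log(z)


def objective_and_grad(h, J, msa, q, lam_h, lam_J):
    """L2-regularized negative log-pseudolikelihood (averaged over sequences) and gradient."""
    L, N = len(msa[0]), len(msa)
    obj = lam_h * sum(x * x for row in h for x in row)
    obj += lam_J * sum(x * x for mat in J.values() for row in mat for x in row)
    gh = [[2 * lam_h * x for x in row] for row in h]
    gJ = {k: [[2 * lam_J * x for x in row] for row in mat] for k, mat in J.items()}
    for seq in msa:
        P = []
        for r in range(L):
            p, logp = conditional(h, J, seq, r, q)
            P.append(p)
            obj -= logp / N
            for a in range(q):
                gh[r][a] -= (float(a == seq[r]) - p[a]) / N
        for (i, j), g in gJ.items():
            si, sj = seq[i], seq[j]
            for a in range(q):  # J_ij enters the conditional of i with b = s_j
                g[a][sj] -= (float(a == si) - P[i][a]) / N
            for b in range(q):  # and the conditional of j with a = s_i
                g[si][b] -= (float(b == sj) - P[j][b]) / N
    return obj, gh, gJ


def fit_plm(msa, q, steps=300, lr=0.2, lam_h=0.01, lam_J=0.01):
    """Minimize the objective by plain gradient descent from zero parameters."""
    L = len(msa[0])
    h = [[0.0] * q for _ in range(L)]
    J = {(i, j): [[0.0] * q for _ in range(q)] for i, j in combinations(range(L), 2)}
    history = []
    for _ in range(steps):
        obj, gh, gJ = objective_and_grad(h, J, msa, q, lam_h, lam_J)
        history.append(obj)
        for r in range(L):
            for a in range(q):
                h[r][a] -= lr * gh[r][a]
        for k, mat in J.items():
            for a in range(q):
                for b in range(q):
                    mat[a][b] -= lr * gJ[k][a][b]
    return h, J, history


def frobenius_score(Jij):
    """Frobenius norm of a coupling matrix after zero-sum (row/column) centering."""
    q = len(Jij)
    rows = [sum(row) / q for row in Jij]
    cols = [sum(Jij[a][b] for a in range(q)) / q for b in range(q)]
    mean = sum(rows) / q
    return math.sqrt(sum((Jij[a][b] - rows[a] - cols[b] + mean) ** 2
                         for a in range(q) for b in range(q)))


# Hypothetical toy alignment: positions 0 and 1 always agree, position 2 varies freely.
msa = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
h, J, history = fit_plm(msa, q=2)
s01 = frobenius_score(J[(0, 1)])
s02 = frobenius_score(J[(0, 2)])
print(f"score(0,1) = {s01:.3f}, score(0,2) = {s02:.3f}")
```

In this toy alignment, position 2 is statistically independent of positions 0 and 1. The centered couplings for the pair (0, 2) therefore stay at zero, while the pair (0, 1) receives a clearly positive score. No sum over all q^L sequences is ever needed.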

Supervisor:
Aurell, Erik, Professor (KTH, School of Computer Science and Communication (CSC), Computational Biology, CB)
Examiner:
Koski, Timo (KTH, School of Engineering Sciences (SCI), Mathematics (Dept.))
Available from:
2012-09-21
Created:
2012-07-17
Last updated:
2012-09-21
Type:
fulltext