Uncovering Protein Structure from Genomic Data

Date
2022-04-26
Abstract

Reliable three-dimensional structures of proteins are a necessary starting point for a mechanistic understanding of how those proteins function in living systems. Yet only a small fraction of known proteins have experimentally determined structures, which motivates the development of structure prediction algorithms, especially when few or no experimentally determined structures are available. In particular, it has been shown that information encoded in amino acid sequence data can be used directly to infer structural contacts in a folded protein or protein complex that have been preserved through natural selection. Throughout evolution, favorable random amino acid mutations selected in sequence space have shaped each protein's functional structure, linking the sequence and folding landscapes through a process called amino acid coevolution. Statistical methods such as Direct Coupling Analysis (DCA) quantify the amount of coevolution between residue pairs, allowing the spatial proximity of these pairs to be inferred. Here, we refine the DCA-based structure-prediction approach for the prediction of dimer interfaces and higher-order protein-protein interactions. We develop a measure of statistical significance for DCA predictions based on the Z-score, which allows high-quality predictions to be distinguished from noisy ones. We also explore how many protein sequences are needed to make accurate predictions of spatial contacts in a folded protein, i.e., how much sequence information is necessary to make reliable predictions using DCA. Finally, we conclude with a discussion of applications of DCA to specific systems, such as predicting the structure of the actin fiber.
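The abstract describes a two-step workflow: score every residue pair by its coevolution, then use a Z-score to separate strong signals from noise. The sketch below illustrates that workflow under stated assumptions. It substitutes plain mutual information between alignment columns for the DCA coupling score (true DCA fits a Potts model to separate direct from indirect correlations, which this sketch does not do), and it assumes a pair's Z-score is measured against the mean and standard deviation of all pair scores; the thesis's exact Z-score definition may differ. The function names (`column_mi`, `pair_scores`, `z_scores`) and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def column_mi(msa, i, j):
    """Mutual information between alignment columns i and j.

    Stand-in for a DCA coupling score: no pseudocounts, no sequence
    reweighting, no average-product correction.
    """
    n = len(msa)
    fi = Counter(seq[i] for seq in msa)                 # single-site counts, column i
    fj = Counter(seq[j] for seq in msa)                 # single-site counts, column j
    fij = Counter((seq[i], seq[j]) for seq in msa)      # pair counts
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi


def pair_scores(msa, min_sep=1):
    """Score every column pair (i, j) with j - i >= min_sep."""
    length = len(msa[0])
    return {
        (i, j): column_mi(msa, i, j)
        for i, j in combinations(range(length), 2)
        if j - i >= min_sep
    }


def z_scores(scores):
    """Z-score of each pair relative to the distribution of all pair scores
    (assumed definition, for illustration only)."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {pair: 0.0 for pair in scores}
    return {pair: (s - mu) / sigma for pair, s in scores.items()}


if __name__ == "__main__":
    # Toy alignment: columns 0 and 3 covary perfectly, column 1 is
    # conserved, and column 2 varies independently of column 0.
    msa = ["AKDL", "GKDV", "AKEL", "GKEV"]
    z = z_scores(pair_scores(msa))
    print(max(z, key=z.get))  # (0, 3)
```

On this toy alignment only the (0, 3) pair carries any signal, so it ranks first with Z = √5 ≈ 2.24 while every other pair sits at about -0.45. With real alignments, the sequence-count question raised in the abstract matters: with few sequences the pair statistics are noisy, and the Z-score of true contacts shrinks toward that of background pairs.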

Degree
Doctor of Philosophy
Type
Thesis
Keywords
protein folding, sequence analysis, protein structure prediction
Citation

Mehrabiani, Kareem M. "Uncovering Protein Structure from Genomic Data." (2022) Diss., Rice University. https://hdl.handle.net/1911/113494.

Rights
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.