Uncovering Protein Structure from Genomic Data
Abstract
Reliable three-dimensional structures of proteins are a necessary starting point for a mechanistic understanding of how those proteins function in living systems. Yet only a small fraction of known proteins have experimentally determined structures, which motivates the development of structure prediction algorithms, especially for cases where few or no experimentally determined structures are available. It has been shown that information encoded in amino acid sequence data can be used directly to infer structural contacts in a folded protein or protein complex that have been preserved under natural selection. Throughout evolution, favorable random amino acid mutations that were selected for in sequence space have shaped the protein’s functional structure, connecting the sequence and folding landscapes through a process called amino acid coevolution. In particular, statistical methodologies such as Direct Coupling Analysis (DCA) have been used to quantify the amount of amino acid coevolution between residue pairs, allowing the spatial proximity of these pairs to be inferred. Here, we refine the DCA-based structure-prediction approach for the prediction of dimer interfaces and higher-order protein-protein interactions. We develop a measure of statistical significance for DCA predictions based on the Z-score, allowing high-quality predictions to be distinguished from noisy ones. We also explore the number of protein sequences necessary to make accurate predictions of spatial contacts in a folded protein, i.e., how much sequence information is needed to make reliable predictions using DCA. Finally, we conclude with a discussion of some applications of DCA to specific systems, such as the prediction of the actin fiber.
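As an illustration of the Z-score idea mentioned in the abstract, the sketch below standardizes a set of pairwise coupling scores and keeps the pairs that stand out from the rest. It is a generic textbook Z-score, not the significance measure developed in the dissertation, whose exact definition is not given here. The function names (`z_scores`, `significant_pairs`), the threshold value, and the toy scores are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Return (score - mean) / std for each residue pair.

    `scores` maps a residue pair (i, j) to a coupling score.
    The mean and population standard deviation are taken over all pairs.
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all scores are identical; Z-scores are undefined")
    return {pair: (s - mu) / sigma for pair, s in scores.items()}


def significant_pairs(scores, threshold=1.5):
    """Return pairs whose Z-score is at least `threshold`, strongest first.

    The default threshold is an arbitrary illustrative value, not a
    recommendation from the source.
    """
    z = z_scores(scores)
    selected = [pair for pair, value in z.items() if value >= threshold]
    return sorted(selected, key=lambda pair: z[pair], reverse=True)


# Hypothetical toy scores: four weak couplings and one strong coupling.
toy_scores = {
    (1, 5): 0.1,
    (2, 9): 0.1,
    (3, 7): 0.1,
    (4, 8): 0.1,
    (6, 10): 0.9,
}

if __name__ == "__main__":
    for pair in significant_pairs(toy_scores):
        print(pair, round(z_scores(toy_scores)[pair], 3))
```

In this toy setting only the single strongly coupled pair clears the threshold; with real DCA output the score distribution, the choice of background, and the cutoff would all need to be justified separately.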
Citation
Mehrabiani, Kareem M. "Uncovering Protein Structure from Genomic Data." (2022) Diss., Rice University. https://hdl.handle.net/1911/113494.