Uncovering Protein Structure from Genomic Data

DC field | Value | Language
dc.contributor.advisor | Onuchic, José N | en_US
dc.creator | Mehrabiani, Kareem M | en_US
dc.date.accessioned | 2022-10-05T18:26:05Z | en_US
dc.date.available | 2022-10-05T18:26:05Z | en_US
dc.date.created | 2022-05 | en_US
dc.date.issued | 2022-04-26 | en_US
dc.date.submitted | May 2022 | en_US
dc.date.updated | 2022-10-05T18:26:05Z | en_US
dc.description.abstract | Reliable three-dimensional structures of proteins are a necessary starting point for a mechanistic understanding of how those proteins function in living systems. Yet only a small fraction of known proteins have experimentally determined structures, which motivates the development of structure prediction algorithms, especially for proteins with few or no experimentally determined structures available. It has been shown that information encoded in amino acid sequence data can be used directly to infer the structural contacts, preserved by natural selection, within a folded protein or protein complex. Throughout evolution, favorable random amino acid mutations selected for in sequence space have shaped each protein's functional structure, linking the sequence and folding landscapes through a process called amino acid coevolution. Statistical methods such as Direct Coupling Analysis (DCA) quantify the coevolution between residue pairs, allowing the spatial proximity of those pairs to be inferred. Here, we refine the DCA-based structure-prediction approach for predicting dimer interfaces and higher-order protein-protein interactions. We develop a measure of statistical significance for DCA predictions based on the Z-score, allowing high-quality predictions to be distinguished from noisy ones. We also explore how many protein sequences are needed to accurately predict spatial contacts in a folded protein, that is, how much sequence information is necessary for DCA to make reliable predictions. Finally, we discuss applications of DCA to specific systems, such as the prediction of the actin fiber. | en_US
dc.format.mimetype | application/pdf | en_US
dc.identifier.citation | Mehrabiani, Kareem M. "Uncovering Protein Structure from Genomic Data." (2022) Diss., Rice University. https://hdl.handle.net/1911/113494. | en_US
dc.identifier.uri | https://hdl.handle.net/1911/113494 | en_US
dc.language.iso | eng | en_US
dc.rights | Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. | en_US
dc.subject | protein folding | en_US
dc.subject | sequence analysis | en_US
dc.subject | protein structure prediction | en_US
dc.title | Uncovering Protein Structure from Genomic Data | en_US
dc.type | Thesis | en_US
dc.type.material | Text | en_US
thesis.degree.department | Systems, Synthetic and Physical Biology | en_US
thesis.degree.discipline | Natural Sciences | en_US
thesis.degree.grantor | Rice University | en_US
thesis.degree.level | Doctoral | en_US
thesis.degree.name | Doctor of Philosophy | en_US
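The abstract states that a Z-score is used to separate high-quality DCA contact predictions from noisy ones, but it does not give the exact definition. A minimal sketch, assuming the Z-score of a residue pair is its raw DCA coupling score standardized against the mean and population standard deviation of all long-range pair scores, might look like the following. The function names `zscore_pairs` and `significant_contacts`, the `z_cut` threshold, and the `min_sep` sequence-separation filter are hypothetical choices for illustration, not the method used in the thesis.

```python
from statistics import mean, pstdev


def zscore_pairs(scores):
    """Standardize raw coupling scores {(i, j): s_ij} into Z-scores.

    Assumption (hypothetical definition): Z_ij = (s_ij - mu) / sigma, where mu
    and sigma are the mean and population standard deviation over all
    residue pairs passed in.
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all scores are identical; Z-scores are undefined")
    return {pair: (s - mu) / sigma for pair, s in scores.items()}


def significant_contacts(scores, z_cut=3.0, min_sep=5):
    """Return (pair, Z) for long-range pairs with Z > z_cut, strongest first.

    Pairs closer than min_sep positions in sequence are dropped before the
    background distribution is computed (hypothetical filter and defaults).
    """
    long_range = {p: s for p, s in scores.items() if abs(p[0] - p[1]) >= min_sep}
    z = zscore_pairs(long_range)
    hits = [(pair, zz) for pair, zz in z.items() if zz > z_cut]
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy scores for a 20-residue chain: flat background, two strong couplings,
    # and one short-range pair that the separation filter should discard.
    toy = {(i, j): 0.1 for i in range(20) for j in range(i + 1, 20)}
    toy[(2, 15)] = 2.0
    toy[(4, 12)] = 1.0
    toy[(3, 5)] = 5.0
    for pair, z in significant_contacts(toy):
        print(pair, round(z, 2))
```

On the toy input, only the two long-range couplings clear the threshold; the short-range pair (3, 5) is excluded despite its large raw score. Standardizing against the long-range background keeps trivially close sequence neighbours from inflating the mean and variance; whether the thesis normalizes this way is not stated in the abstract.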
Files
Original bundle
Name: MEHRABIANI-DOCUMENT-2022.pdf | Size: 2.94 MB | Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt | Size: 5.83 KB | Format: Plain Text
Name: LICENSE.txt | Size: 2.6 KB | Format: Plain Text