Evolutionary Fitness of Non-Coding Genetic Elements

Date
2024-04-19
Abstract

Proper protein structure and function are integral to cellular homeostasis. The wide array of known natural protein sequences is the product of millions of years of evolutionary pressure to maintain physical stability and biological function. Evolution proceeds through random structural and sequence variations in the genome. Although such variations are typically neutral in protein-coding genes, they can cause a protein-coding gene to lose its function or give rise to a novel protein-coding gene. A former protein-coding gene can act as a reservoir for novel protein-coding genes or for variants of known proteins. This dissertation examines the evolutionary fitness of two classes of genetic elements that may encode functional amino acid sequences: pseudogenes and exons.

In Chapter 1 of this dissertation, we introduce the concepts and tools employed in later chapters. We provide a conceptual overview of pseudogenes and exons and review past work examining the physical stability of the amino acid sequences they encode. We also discuss energy landscape theory and a physical energy function informed by its principles, the Associative Memory, Water Mediated, Structure and Energy Model (AWSEM). Finally, we discuss the Direct Coupling Analysis (DCA) model, which, when used alongside the AWSEM Hamiltonian, provides information on the physical stability and biological function of a protein sequence.
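The DCA model mentioned above scores a sequence with a Potts Hamiltonian built from single-site fields and pairwise couplings. The sketch below shows only how such an energy is evaluated for one aligned sequence. It assumes that the fields `h` and couplings `J` have already been inferred from a multiple sequence alignment (inference is not shown), that the alphabet has 21 states (a gap plus the 20 amino acids) in the ordering given in the code, and that the sign convention is E = -Σ h_i(a_i) - Σ_{i<j} J_ij(a_i, a_j). Conventions differ between implementations. The function and constant names are hypothetical, and this is not the dissertation's own code.

```python
import numpy as np

# Assumed 21-state alphabet: gap symbol followed by the 20 standard amino acids.
AMINO_ACIDS = "-ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}


def dca_energy(sequence, h, J):
    """Potts-model (DCA-style) energy of one aligned sequence.

    h: array of shape (L, q) holding single-site fields h_i(a).
    J: array of shape (L, L, q, q) holding couplings J_ij(a, b); only i < j is summed.
    Under this sign convention, lower energy means the sequence agrees better
    with the statistics of the protein family the parameters were fit to.
    """
    seq = np.array([AA_INDEX[aa] for aa in sequence])
    L = len(seq)
    if h.shape[0] != L or J.shape[0] != L:
        raise ValueError("sequence length does not match the model length")

    # Field term: -sum_i h_i(a_i)
    energy = -h[np.arange(L), seq].sum()

    # Coupling term: -sum_{i<j} J_ij(a_i, a_j), one vectorized row at a time
    for i in range(L - 1):
        js = np.arange(i + 1, L)
        energy -= J[i, js, seq[i], seq[i + 1:]].sum()

    return float(energy)
```

In a pseudogene analysis of the kind described in Chapter 2, one could compare `dca_energy` of a translated pseudogene with that of its parent protein under the same family parameters. That comparison is offered only as an illustration of how the score might be used.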

In Chapter 2 of this dissertation, we present work characterizing the physical and evolutionary energy landscapes of pseudogenes: former protein-coding genes, found in many eukaryotes, that cannot be translated because of debilitating mutations. Because these genetic elements were previously under selection pressure to fold, pseudogenes offer an intriguing example of protein devolution. We systematically studied pseudogenes associated with proteins spanning a range of biological functions and sizes. We found that, if translated, pseudogene sequences are typically destabilized relative to their former native state as a function of evolutionary time. Conversely, pseudogene sequences whose mutations make them more physically stable have diminished or altered functional abilities that may result in pathological conditions.

In Chapter 3 of this dissertation, we present work that evaluates the physical energy landscapes of exons, the genetic elements that encode amino acid sequences in eukaryotic genes. If exons encode independently foldable structural units, naturally occurring or engineered exon shuffling could rapidly produce novel protein-coding genes. Using publicly available databases of annotated protein sequences and gene structures, we identify conserved exons in multiple protein families. We find that conserved exons tend to be minimally frustrated and that their boundaries coincide with the boundaries of secondary structural elements. Our findings support previous work suggesting that exons can encode physically stable protein segments.
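One simple way to quantify whether exon boundaries coincide with secondary structural element boundaries is to count how many exon boundaries fall within a few residues of a change in secondary-structure label. The result can then be compared with randomly placed boundaries. The sketch below is a hypothetical illustration of that idea and not the method used in the dissertation. It assumes a per-residue secondary-structure string (for example, DSSP-style letters collapsed to H, E and C), exon boundaries given as residue indices in the same protein numbering, and a tolerance window chosen by the user. All function names are hypothetical.

```python
import random


def sse_boundaries(ss):
    """Indices i where the secondary-structure label changes between residues i-1 and i."""
    return [i for i in range(1, len(ss)) if ss[i] != ss[i - 1]]


def fraction_coinciding(exon_boundaries, ss, tolerance=2):
    """Fraction of exon boundaries lying within `tolerance` residues of an SSE boundary."""
    if not exon_boundaries:
        return 0.0
    sse = sse_boundaries(ss)
    hits = sum(
        1 for b in exon_boundaries if any(abs(b - s) <= tolerance for s in sse)
    )
    return hits / len(exon_boundaries)


def null_fraction(n_boundaries, ss, tolerance=2, trials=1000, seed=0):
    """Mean coinciding fraction when the same number of boundaries is placed at random."""
    rng = random.Random(seed)
    positions = range(1, len(ss))
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(positions, n_boundaries)
        total += fraction_coinciding(sample, ss, tolerance)
    return total / trials


if __name__ == "__main__":
    ss = "CCHHHHHHCCCEEEEECC"  # toy secondary-structure string
    exons = [9, 14]            # toy exon boundary positions
    print(sse_boundaries(ss))                           # [2, 8, 11, 16]
    print(fraction_coinciding(exons, ss, tolerance=1))  # 0.5
    print(null_fraction(len(exons), ss, tolerance=1))
```

An observed fraction that is consistently above the random-placement baseline across a protein family would be consistent with the boundary coincidence described above. A real analysis would also need to map genomic exon coordinates onto residue positions, which this sketch does not attempt.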

Degree
Doctor of Philosophy
Type
Thesis
Keywords
Pseudogenes, Energy Landscapes, Exons, Foldons, Frustration, Information Theory
Citation

Jaafari, Hana. Evolutionary Fitness of Non-Coding Genetic Elements. (2024). PhD diss., Rice University. https://hdl.handle.net/1911/116174

Rights
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.