Browsing by Author "Nakhleh, Luay K"

Now showing 1 - 8 of 8
  • Algorithms for Scalable Structural Analysis of Class I Peptide-MHC Systems
    (2020-04-22) Abella, Jayvee Ralph; Kavraki, Lydia E; Nakhleh, Luay K
    Peptide-MHC (pMHC) complexes are central components of the immune system, and understanding the mechanism behind stable pMHC binding will aid the development of immunotherapies. Stable pMHC binding can be assessed through an analysis of structures, which contain information on the atomic interactions present between peptide and MHC. However, a large-scale analysis of pMHCs is difficult to perform, due to the lack of available structures as well as the fact that pMHCs are large molecular systems with slow timescales. This thesis presents a set of approaches developed to deliver scalable structural analysis of Class I pMHC systems. First, we present APE-Gen, a fast method for generating ensembles of bound pMHC conformations. Next, we present a structure-based classifier using random forests for predicting stable pMHC binding. Finally, we present a simulation framework for generating a Markov state model of the full binding dynamics for a given pMHC system using a combination of umbrella and adaptive sampling. This work pushes the capability of computational methods for the structural analysis of pMHCs, leading to structural insight that can guide the understanding of pMHC binding.
  • Evolution of Altruism and Eusociality: Toward a Cost/Benefit Analysis of Fitness and Genetic Relatedness
    (2014-06-13) Liao, Xiaoyun; Kohn, Michael H; Nakhleh, Luay K; Kimmel, Marek; Putnam, Nicholas H
    Altruism is a behavior that benefits others at a cost to one’s own ability of survival and/or reproduction; that is, individual fitness. Thus, altruism poses great challenges to Darwin’s theory of evolution by natural selection on individual fitness. Altruistic behaviors are commonly performed in eusocial animals, such as nearly all hymenoptera (including bees, wasps, and ants), termites, ambrosia beetles, and so on. Inclusive fitness theory predicts that altruistic behavior can evolve when sufficient fitness benefits are given to relatives even though individual fitness is reduced. A different modeling approach has led to a challenge to this theory. The modelers claim that relatedness is not causal, that eusocial behavior is very hard to evolve requiring more workers before the queen increased fitness, and that there is no conflict involved. Here I showed that, even within the terms of this modeling framework, inclusive fitness thinking leads to insights that completely change these conclusions. I showed that relatedness and inclusive fitness indeed are causal and that eusociality does evolve more readily. With regard to the latter this means eusociality can be favored under a lower benefits threshold. I concluded that multiple modeling approaches are useful and that efforts to synthesize them are better than asserting that one is universally better than the other. Moreover, either greenbeard effects or genetic kin recognition requires genetic polymorphisms as cues on which recognition is based. Previous models showed that selection eliminates rare cue alleles and a common allele gets fixed, i.e. altruism cannot persist. So it is unclear how genetic recognition for altruism persists under a Darwinian selection framework. Here, I designed a novel model with three types of genetic components (production, perception, and action). I analyzed my recognition model theoretically toward a cost/benefit analysis of fitness and genetic relatedness. 
I predicted the stability of recognition for altruism based on my model. Furthermore, I tested my recognition model through various computational and biological simulations. My simulation results consistently showed that altruism could maintain multiple recognition cues and be evolutionarily stable, given the assumptions of my model. I concluded that cost/benefit of fitness and genetic relatedness both play critical roles in the evolution of altruism and eusociality, and therefore can maintain the stability of recognition for altruism.
  • Integrated Likelihood for Phylogenomics under a No-Common-Mechanism Model
    (2019-04-18) Tidwell, Hunter; Nakhleh, Luay K
    The availability of genome-wide sequence data from many species and individuals within species has ushered in the era of phylogenomics. In this era, species phylogeny inference is based on models of sequence evolution on gene trees and models of gene tree evolution within species phylogenies. All existing inference methods, except parsimony, assume a common mechanism across loci as represented by unvarying branch lengths of the species phylogeny. In this thesis, we propose a "no common mechanism" (NCM) model, in which the parameters of the species phylogeny may vary between loci. We derive an analytically integrated likelihood of species networks given gene trees from multiple loci under an NCM model. We demonstrate the performance of inference under this integrated likelihood on simulated and biological data. The model presented here will afford opportunities for exploring connections among various methods for estimating species phylogenies from multiple, independent loci.
  • Methods for Elucidating and Utilizing Local Phylogenies in Phylogenomics
    (2019-04-18) Elworth, Ryan Anthony Leo; Nakhleh, Luay K
    Understanding the evolutionary history of life on earth tells us about the origins of all life as well as giving insights into the underpinnings of human disease. While we continue to gather the DNA sequence data necessary to infer past evolutionary histories, the signal in this genomic data can be difficult to fully take advantage of. From a mathematical modeling perspective, the interplay of complex processes such as recombination, incomplete lineage sorting (ILS), and gene flow quickly complicates the generative process by which DNA sequences arise as a result of evolution. Computationally, the rapid growth in the generation of large amounts of sequence data necessitates efficient algorithms to infer past evolutionary histories. In particular, this thesis addresses the added challenges introduced by recombination breaking up genomes into localized regions whose evolutionary histories can disagree with one another. These localized evolutionary histories, known as local genealogies or local phylogenies, are interspersed throughout the genome in between regions affected by past recombination events. Local phylogenies can be difficult to infer in their own right. The signal for where recombination occurs and how the evolutionary histories of individual regions agree or disagree can be subtle. This same signal, however, can be used to infer important past events such as gene flow between species or even how genetic links to disease have evolved. In this thesis, I contribute to addressing these problems in the following ways. First, I introduce a new method for inferring local phylogenies at scale. This method leverages current state-of-the-art tree building software to scan across a multiple sequence alignment and infer the localized evolutionary histories while simultaneously handling complications from recombination and low amounts of signal. Second, I introduce a new method for detecting when localized evolutionary histories were affected by past gene flow. 
For this work, I extend the theoretical framework of the D-statistic, used ubiquitously to scan for localized regions whose evolution was affected by gene flow, to handle arbitrarily complex gene flow scenarios with any number of sequences. Finally, I have disseminated the Automated Local Phylogenomic Analyses (ALPHA) toolkit with open source implementations of these methods as well as additional functionalities useful to biologists.
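As a rough illustration of the statistic this abstract builds on, the sketch below computes the classical four-taxon D-statistic, D = (ABBA - BABA) / (ABBA + BABA), from aligned sequences. It covers only the standard case with topology ((P1, P2), P3) and an outgroup O. It is not the generalized framework developed in the thesis, nor the ALPHA toolkit's code, and the function name `d_statistic` and the toy sequences are hypothetical.

```python
from collections import Counter


def d_statistic(p1, p2, p3, outgroup):
    """Classical four-taxon D-statistic for the topology ((P1, P2), P3), O.

    Returns (abba, baba, d); d is None when no informative site exists.
    Illustrative sketch only, not the ALPHA implementation.
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")

    counts = Counter()
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        # Skip gaps and ambiguous bases.
        if any(base in "-N" for base in (a, b, c, o)):
            continue
        # The outgroup defines the ancestral allele "A"; P3 must carry
        # a different (derived) allele "B" for the site to be informative.
        if c == o:
            continue
        if a == o and b == c:
            counts["ABBA"] += 1  # P2 shares the derived allele with P3
        elif a == c and b == o:
            counts["BABA"] += 1  # P1 shares the derived allele with P3

    abba, baba = counts["ABBA"], counts["BABA"]
    total = abba + baba
    d = (abba - baba) / total if total else None
    return abba, baba, d


# Toy alignment: two ABBA sites, one BABA site, one uninformative site.
abba, baba, d = d_statistic("AGAA", "CTCA", "CGCA", "ATAA")
print(abba, baba, d)
```

Under incomplete lineage sorting alone, ABBA and BABA sites are expected in equal numbers, so D near zero is consistent with no gene flow, while a significant excess of either pattern suggests gene flow involving P3. Assessing significance (for example by block resampling) is left out of this sketch.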
  • Phylogeny Inference in the Presence of Incomplete Lineage Sorting, Gene Duplication and Loss and Hybridization
    (2019-04-10) Du, Peng; Nakhleh, Luay K
    A species phylogeny captures how a set of extant species split and diverged from their most recent common ancestral species. A gene tree captures the evolutionary history of an individual gene or, more generally, non-recombining genomic region. A very complex relationship exists between the phylogeny of a set of species and the trees of genes in the genomes of those species. The complexity arises because of processes such as incomplete lineage sorting (ILS), gene duplication and loss (GDL), and hybridization, all of which can give rise to gene trees whose topologies disagree with each other as well as with that of the species phylogeny. Species phylogeny inference in the post-genomic era, also known as phylogenomic inference, requires developing models and methods that account for these processes in order to relate how individual loci (genomic regions) evolve within and across the branches of species phylogenies. For example, the multispecies coalescent (MSC) has been introduced to model ILS, and statistical species tree inference methods based on it have been developed. This model was later extended to allow for reticulation events (e.g., hybridization), and statistical methods for inferring phylogenetic networks were developed. Birth-death models of gene evolution have also been introduced to capture gene duplications and losses, and species tree inference methods that utilize them have been developed. In this thesis, I address two computational problems that arise in this domain. The first problem concerns the inference of species trees from multiple loci assuming that only ILS and GDL are at play, but not reticulation. The second problem concerns the inference of species (phylogenetic) networks from multiple loci when all three processes ILS, GDL, and reticulation are at play. My contribution for the first problem is twofold. 
First, I developed and implemented a heuristic for maximum a posteriori (MAP) estimate of the species tree from the sequence alignments of multiple independent loci. Second, based on a study of the accuracy of MSC-based inference methods on data where GDL is at play, I proposed a method for efficient inference of the topology of a species tree in the presence of both ILS and GDL. My contribution for the second problem is twofold as well. I first developed the first three-piece model of phylogenetic network / locus network / gene tree, which accurately captures the three aforementioned processes and yields a generative model of genomic sequence data from a phylogenetic network. I then developed a heuristic for inferring phylogenetic networks from multi-locus data under this generative model. I studied the accuracy of all methods on both simulated and biological data sets. The contributions of my thesis provide further advances in the field of phylogenomics by providing methods that incorporate more of the biological complexity in evolution than existing methods do. Consequently, my methods allow for utilizing more of the genomic data (and signal) for a more accurate inference of not only the species phylogeny, but also the processes that acted upon the individual loci within the genomes of those species.
  • Prediction Oriented Marker Selection (PROMISE) for High Dimensional Regression with Application to Personalized Medicine
    (2015-10-27) Kim, Soyeon; Scott, David W.; Lee, J.Jack; Baladandayuthapani, Veerabhadran; Ensor, Katherine B; Nakhleh, Luay K
    In personalized medicine, biomarkers are used to select therapies with the highest likelihood of success based on a patient's individual biomarker profile. Two important goals of biomarker selection are to choose a small number of important biomarkers that are associated with treatment outcomes and to maintain a high level of prediction accuracy. These goals are challenging because the number of candidate biomarkers can be large compared to the sample size. Established methods for variable selection based on penalized regression methods such as the lasso and the elastic net have yielded promising results. However, selecting the right amount of penalization is critical to maintain the desired properties for both variable selection and prediction accuracy. To select the regularization parameter, cross-validation (CV) is most commonly used. It tends to provide high prediction accuracy as well as a high true positive rate, at the cost of a high false positive rate. Resampling methods such as stability selection (SS) conversely maintain good control of the false positive rate, but at the cost of yielding too few true positives. We propose prediction oriented marker selection (PROMISE), which combines SS with CV to include the advantages of both methods. We applied PROMISE to (1) the lasso and (2) the elastic net for individual marker selection, (3) the group lasso for pathway selection, and (4) the combination of the group lasso with the lasso for individual marker selection within the selected pathways. Data analyses show that PROMISE produces a sparser solution than CV, reducing the false positives compared to CV, while giving similar prediction accuracy and true positives. In our simulation and real data analysis, SS does not work well for variable selection and prediction. PROMISE can be applied in many fields to select regularization parameters when the goals are to minimize both type I and type II errors and to maximize prediction accuracy.
  • Segmenting Genetic Sequences Based on Common Ancestry
    (2018-04-18) Chen, Lee H; Nakhleh, Luay K
    In this work, I developed an algorithm that segments genetic sequences from multiple species and clusters them into groups of subsequences that are likely to have evolved from a single tree. Traditionally, the evolutionary history of a set of taxa is inferred with the assumption that the majority of genetic changes are passed on from ancestors to descendants. However, this assumption is violated when analyzing the evolution of organisms where genetic materials are often exchanged between unrelated individuals. These exchanges can result in genes with different ancestries combining and giving rise to genomes that do not fit a single tree. To better understand the evolution of these organisms, it is imperative to develop methods that delineate regions of single ancestries in the genomes and cluster subsequences into groups from which trees could be built. As of the time of writing this thesis, methods have already been developed for clustering genetic sequences based on function and for identifying gene fusion events. However, to the best of our knowledge, there are no existing techniques that cluster genetic sequences into groups based exclusively on ancestry. In this work, I designed an algorithm that generates a sequence similarity graph from a user-specified collection of genetic sequences. The algorithm then optimizes the clustering of the graph based on sequence alignment techniques, resulting in groups of sequences that are highly likely to have similar ancestries. This method was tested on simulated sequences and produced clusters of sequences that have high probabilities of evolving from a single tree. With this method, we will be able to more accurately infer the evolution of organisms with frequent gene transfers between unrelated individuals.
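To make the idea of a sequence similarity graph concrete, here is a minimal sketch under loudly stated assumptions. The abstract does not say how similarity is measured or how the clustering is optimized, so this example substitutes a simple k-mer Jaccard similarity and takes connected components of the thresholded graph as clusters. It is a stand-in technique, not the thesis's alignment-based algorithm, and every name and parameter (`kmers`, `similarity_graph`, `connected_components`, `k`, `threshold`) is hypothetical.

```python
from itertools import combinations


def kmers(seq, k=3):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity_graph(seqs, k=3, threshold=0.5):
    """Adjacency sets linking sequences whose k-mer Jaccard similarity
    meets the threshold (an assumed measure, not the thesis's)."""
    profiles = {name: kmers(s, k) for name, s in seqs.items()}
    graph = {name: set() for name in seqs}
    for u, v in combinations(seqs, 2):
        union = profiles[u] | profiles[v]
        jaccard = len(profiles[u] & profiles[v]) / len(union) if union else 0.0
        if jaccard >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def connected_components(graph):
    """Clusters as connected components, found by depth-first search."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Toy input: "a" and "b" share most 3-mers; "c" shares none with either.
seqs = {"a": "ACGTACGT", "b": "ACGTACGA", "c": "TTTTGGGG"}
clusters = connected_components(similarity_graph(seqs))
print(sorted(sorted(c) for c in clusters))
```

Connected components are the crudest possible clustering of such a graph; they are used here only to keep the example short.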
  • Towards Accurate and Scalable Phylogenetic Inference Under Complex Evolutionary Models Using Neural Networks
    (2023-04-21) Yakici, Berk Alp; Nakhleh, Luay K; Ogilvie, Huw A
    Classical phylogenetic tree inference methods assume sequences evolve under a stationary, reversible, and homogeneous (SRH) model. This assumption is often violated in real data, sometimes severely, depending on the studied biological system. Furthermore, the inference of species trees also needs to account for population-level processes, such as recombination, which poses unique challenges. With a deluge of whole-genome data sets becoming increasingly available, accurately inferring species tree topologies under complex evolutionary models remains an open problem. In this thesis, I introduce a supervised learning (SL) method that uses multi-layer perceptrons (MLPs) for accurately inferring phylogenetic tree topologies from genome-scale data under complex evolutionary processes. We train our model with sequences simulated under the multispecies coalescent model with recombination (MSC-R), and we vary both clock rate and base frequency content across lineages. This enables our model to account for the effects and interactions of multiple complex processes. Utilizing a divide-and-conquer and supertree construction approach, we demonstrate that the inference scales beyond five taxa while remaining accurate. Using a simulation study, we show that our model can outperform classical supermatrix methods, such as neighbor-joining, maximum parsimony, and maximum likelihood, when the SRH assumption of sequence evolution is violated. Additionally, we demonstrate that the amalgamation of quintets is more accurate than that of quartets. Further, we re-analyze a whole-genome alignment of 33 avian species using our MLP model, estimating a species tree that supports the hypothesis of a single origin of non-galliform waterbirds. The accuracy and scalability of our model demonstrate that supervised learning methods can be an important tool for phylogenetic analyses.

Physical Address:

6100 Main Street, Houston, Texas 77005

Mailing Address:

MS-44, P.O. Box 1892, Houston, Texas 77251-1892