Browsing by Author "Du, Peng"
Gene Duplicability-Connectivity-Complexity across Organisms and a Neutral Evolutionary Explanation
(Public Library of Science, 2012) Zhu, Yun; Du, Peng; Nakhleh, Luay
Gene duplication has long been acknowledged by biologists as a major evolutionary force shaping genomic architectures and characteristics across the Tree of Life. Much research has been conducted on elucidating the fate of duplicated genes in a variety of organisms, as well as the factors that affect a gene's duplicability, that is, the tendency of certain genes to retain more duplicates than others. In particular, two studies have looked at the correlation between a gene's duplicability and its degree in a protein-protein interaction network in yeast, mouse, and human, and another has looked at the correlation between a gene's duplicability and its complexity (length, number of domains, etc.) in yeast. In this paper, we extend these studies to six species, and two trends emerge. First, the duplicability-connectivity correlation increases in a way that agrees with the increase in genome size as well as with the phylogenetic relationships of the species. Second, the duplicability-complexity correlation seems to be constant across the species. We argue that the observed correlations can be explained by neutral evolutionary forces acting on the genomic regions containing the genes. For the duplicability-connectivity correlation, we show through simulations that an increasing trend can be obtained by adjusting parameters to approximate the genomic characteristics of the respective species. Our results call for more research into the factors, adaptive and non-adaptive alike, that determine a gene's duplicability.

Phylogeny Inference in the Presence of Incomplete Lineage Sorting, Gene Duplication and Loss and Hybridization
(2019-04-10) Du, Peng; Nakhleh, Luay K
A species phylogeny captures how a set of extant species split and diverged from their most recent common ancestral species.
A gene tree captures the evolutionary history of an individual gene or, more generally, a non-recombining genomic region. A complex relationship exists between the phylogeny of a set of species and the trees of genes in the genomes of those species. The complexity arises because of processes such as incomplete lineage sorting (ILS), gene duplication and loss (GDL), and hybridization, all of which can give rise to gene trees whose topologies disagree with each other as well as with that of the species phylogeny. Species phylogeny inference in the post-genomic era, also known as phylogenomic inference, requires developing models and methods that account for these processes in order to describe how individual loci (genomic regions) evolve within and across the branches of species phylogenies. For example, the multispecies coalescent (MSC) has been introduced to model ILS, and statistical species tree inference methods based on it have been developed. This model was later extended to allow for reticulation events (e.g., hybridization), and statistical methods for inferring phylogenetic networks were developed. Birth-death models of gene evolution have also been introduced to capture gene duplications and losses, and species tree inference methods that utilize them have been developed. In this thesis, I address two computational problems that arise in this domain. The first problem concerns the inference of species trees from multiple loci assuming that only ILS and GDL are at play, but not reticulation. The second problem concerns the inference of species (phylogenetic) networks from multiple loci when all three processes (ILS, GDL, and reticulation) are at play. My contribution to the first problem is twofold. First, I developed and implemented a heuristic for maximum a posteriori (MAP) estimation of the species tree from the sequence alignments of multiple independent loci.
Second, based on a study of the accuracy of MSC-based inference methods on data where GDL is at play, I proposed a method for efficient inference of the topology of a species tree in the presence of both ILS and GDL. My contribution to the second problem is twofold as well. First, I developed a three-piece model of phylogenetic network / locus network / gene tree, the first of its kind, which accurately captures the three aforementioned processes and yields a generative model of genomic sequence data from a phylogenetic network. Second, I developed a heuristic for inferring phylogenetic networks from multi-locus data under this generative model. I studied the accuracy of all methods on both simulated and biological data sets. The contributions of my thesis advance the field of phylogenomics by providing methods that incorporate more of the biological complexity of evolution than existing methods do. Consequently, my methods allow more of the genomic data (and signal) to be used for more accurate inference of not only the species phylogeny, but also the processes that acted upon the individual loci within the genomes of those species.
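
As a small illustration of how ILS alone can make gene trees disagree with the species tree, the sketch below evaluates the standard multispecies coalescent result for a rooted three-species tree ((A,B),C) with one sampled lineage per species. It is not taken from the thesis or its software. The function name, the taxon labels, and the parameter `t` (the length of the internal branch in coalescent units) are assumptions made for this example. GDL and hybridization are not modeled here.

```python
import math


def three_taxon_gene_tree_probs(t):
    """Gene tree topology probabilities under the multispecies coalescent.

    Hypothetical helper for illustration only. Assumes the species tree
    ((A,B),C), one sampled lineage per species, and an internal branch of
    length t in coalescent units. Only ILS is modeled.
    """
    if t < 0:
        raise ValueError("branch length t must be non-negative")
    # Probability that the A and B lineages fail to coalesce on the
    # internal branch; in that case all three topologies are equally likely.
    p_no_coalescence = math.exp(-t)
    discordant = p_no_coalescence / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # matches the species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


if __name__ == "__main__":
    for t in (0.1, 1.0, 5.0):
        probs = three_taxon_gene_tree_probs(t)
        summary = ", ".join(f"{topo}: {p:.3f}" for topo, p in probs.items())
        print(f"t = {t}: {summary}")
```

Short internal branches push all three topologies toward equal probability, while long branches make the concordant topology dominate, which is why methods that ignore ILS can be misled when speciation events are closely spaced.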