Browsing by Author "Yu, Yun"
Item: A maximum pseudo-likelihood approach for phylogenetic networks (BioMed Central, 2015)
Yu, Yun; Nakhleh, Luay K.
Abstract. Background: Several phylogenomic analyses have recently demonstrated the need to account simultaneously for incomplete lineage sorting (ILS) and hybridization when inferring a species phylogeny. A maximum likelihood approach was introduced recently for inferring species phylogenies in the presence of both processes, and showed very good results. However, computing the likelihood of a model in this case is computationally infeasible except for very small data sets. Results: Inspired by recent work on the pseudo-likelihood of species trees based on rooted triples, we introduce the pseudo-likelihood of a phylogenetic network, which, when combined with a search heuristic, provides a statistical method for phylogenetic network inference in the presence of ILS. Unlike trees, networks are not always uniquely encoded by a set of rooted triples. Therefore, even when given sufficient data, the method might converge to a network that is equivalent under rooted triples to the true one, but not the true one itself. The method is computationally efficient and has produced very good results on the data sets we analyzed. The method is implemented in PhyloNet, which is publicly available in open source. Conclusions: Maximum pseudo-likelihood allows for inferring species phylogenies in the presence of hybridization and ILS, while scaling to much larger data sets than is currently feasible under full maximum likelihood.
The nonuniqueness of phylogenetic networks encoded by a system of rooted triples notwithstanding, the proposed method infers the correct network under certain scenarios, and provides candidates for further exploration under other criteria and/or data in other scenarios.

Item: Bayesian inference of phylogenetic networks from bi-allelic genetic markers (Public Library of Science, 2018)
Zhu, Jiafan; Wen, Dingqiao; Yu, Yun; Meudt, Heidi M.; Nakhleh, Luay K.
Phylogenetic networks are rooted, directed, acyclic graphs that model reticulate evolutionary histories. Recently, statistical methods were devised for inferring such networks from either gene tree estimates or the sequence alignments of multiple unlinked loci. Bi-allelic markers, most notably single nucleotide polymorphisms (SNPs) and amplified fragment length polymorphisms (AFLPs), provide a powerful source of genome-wide data. In a recent paper, a method called SNAPP was introduced for statistical inference of species trees from unlinked bi-allelic markers. The generative process assumed by the method combined both a model of evolution for the bi-allelic markers, as well as the multispecies coalescent. A novel component of the method was a polynomial-time algorithm for exact computation of the likelihood of a fixed species tree via integration over all possible gene trees for a given marker. Here we report on a method for Bayesian inference of phylogenetic networks from bi-allelic markers. Our method significantly extends the algorithm for exact computation of phylogenetic network likelihood via integration over all possible gene trees. Unlike the case of species trees, the algorithm is no longer polynomial-time on all instances of phylogenetic networks. Furthermore, the method utilizes a reversible-jump MCMC technique to sample the posterior of phylogenetic networks given bi-allelic marker data.
Our method has a very good performance in terms of accuracy and robustness as we demonstrate on simulated data, as well as a data set of multiple New Zealand species of the plant genus Ourisia (Plantaginaceae). We implemented the method in the publicly available, open-source PhyloNet software package.

Item: Bayesian Inference of Reticulate Phylogenies under the Multispecies Network Coalescent (Public Library of Science, 2016)
Wen, Dingqiao; Yu, Yun; Nakhleh, Luay K.
The multispecies coalescent (MSC) is a statistical framework that models how gene genealogies grow within the branches of a species tree. The field of computational phylogenetics has witnessed an explosion in the development of methods for species tree inference under MSC, owing mainly to the accumulating evidence of incomplete lineage sorting in phylogenomic analyses. However, the evolutionary history of a set of genomes, or species, could be reticulate due to the occurrence of evolutionary processes such as hybridization or horizontal gene transfer. We report on a novel method for Bayesian inference of genome and species phylogenies under the multispecies network coalescent (MSNC). This framework models gene evolution within the branches of a phylogenetic network, thus incorporating reticulate evolutionary processes, such as hybridization, in addition to incomplete lineage sorting. As phylogenetic networks with different numbers of reticulation events correspond to points of different dimensions in the space of models, we devise a reversible-jump Markov chain Monte Carlo (RJMCMC) technique for sampling the posterior distribution of phylogenetic networks under MSNC. We implemented the methods in the publicly available, open-source software package PhyloNet and studied their performance on simulated and biological data.
The work extends the reach of Bayesian inference to phylogenetic networks and enables new evolutionary analyses that account for reticulation.

Item: Exploring phylogenetic hypotheses via Gibbs sampling on evolutionary networks (BioMed Central, 2016)
Yu, Yun; Jermaine, Christopher; Nakhleh, Luay K.
Abstract. Background: Phylogenetic networks are leaf-labeled graphs used to model and display complex evolutionary relationships that do not fit a single tree. There are two classes of phylogenetic networks: Data-display networks and evolutionary networks. While data-display networks are very commonly used to explore data, they are not amenable to incorporating probabilistic models of gene and genome evolution. Evolutionary networks, on the other hand, can accommodate such probabilistic models, but they are not commonly used for exploration. Results: In this work, we show how to turn evolutionary networks into a tool for statistical exploration of phylogenetic hypotheses via a novel application of Gibbs sampling. We demonstrate the utility of our work on two recently available genomic data sets, one from a group of mosquitos and the other from a group of modern birds. We demonstrate that our method allows the use of evolutionary networks not only for explicit modeling of reticulate evolutionary histories, but also for exploring conflicting treelike hypotheses. We further demonstrate the performance of the method on simulated data sets, where the true evolutionary histories are known. Conclusion: We introduce an approach to explore phylogenetic hypotheses over evolutionary phylogenetic networks using Gibbs sampling.
The hypotheses could involve reticulate and non-reticulate evolutionary processes simultaneously as we illustrate on mosquito and modern bird genomic data sets.

Item: From Gene Trees to Species Trees: Algorithms for Parsimonious Reconciliation (2012)
Yu, Yun; Nakhleh, Luay K.
One of the criteria for inferring a species tree from a collection of gene trees, when gene tree incongruence is assumed to be due to incomplete lineage sorting (ILS), is minimize deep coalescence, or MDC. Exact algorithms for inferring the species tree from rooted, binary trees under MDC were recently introduced. Nevertheless, in phylogenetic analyses of biological data sets, estimated gene trees may differ from true gene trees, be incompletely resolved, and not necessarily rooted. Further, the MDC criterion considers only the topologies of the gene trees. So the contributions of my work are three-fold:
1. We propose new MDC formulations for the cases in which the gene trees are unrooted/binary, rooted/non-binary, and unrooted/non-binary, and prove structural theorems that allow me to extend the algorithms for the rooted/binary gene tree case to these cases in a straightforward manner.
2. We propose an algorithm for inferring a species tree from a collection of gene trees with coalescence times that takes into account not only the topology of the gene trees but also the coalescence times.
3. We devise MDC-based algorithms for cases in which multiple alleles per species may be sampled.
We have implemented all of the algorithms in the PhyloNet software package and studied their performance in coalescent-based simulation studies in comparison with other methods including democratic vote, greedy consensus, STEM, and GLASS.

Item: In the light of deep coalescence: revisiting trees within networks (BioMed Central, 2016)
Zhu, Jiafan; Yu, Yun; Nakhleh, Luay K.
Abstract. Background: Phylogenetic networks model reticulate evolutionary histories.
The last two decades have seen an increased interest in establishing mathematical results and developing computational methods for inferring and analyzing these networks. A salient concept underlying a great majority of these developments has been the notion that a network displays a set of trees and those trees can be used to infer, analyze, and study the network. Results: In this paper, we show that in the presence of coalescence effects, the set of displayed trees is not sufficient to capture the network. We formally define the set of parental trees of a network and make three contributions based on this definition. First, we extend the notion of anomaly zone to phylogenetic networks and report on anomaly results for different networks. Second, we demonstrate how coalescence events could negatively affect the ability to infer a species tree that could be augmented into the correct network. Third, we demonstrate how a phylogenetic network can be viewed as a mixture model that lends itself to a novel inference approach via gene tree clustering. Conclusions: Our results demonstrate the limitations of focusing on the set of trees displayed by a network when analyzing and inferring the network. Our findings can form the basis for achieving higher accuracy when inferring phylogenetic networks and open up new venues for research in this area, including new problem formulations based on the notion of a network’s parental trees.

Item: Models and Methods for Evolutionary Histories Involving Hybridization and Incomplete Lineage Sorting (2014-04-09)
Yu, Yun; Nakhleh, Luay K.; Jermaine, Christopher M.; Kohn, Michael H.; Kavraki, Lydia E.
Hybridization plays an important evolutionary role in several groups of organisms. A phylogenetic approach to detecting hybridization entails sequencing multiple loci across the genomes of a group of species of interest, reconstructing their gene trees, and exploiting their differences as a signal of hybridization.
However, methods that follow this approach mostly ignore population effects, such as incomplete lineage sorting (ILS). Given that hybridization occurs between closely related organisms, ILS may very well be at play and, hence, must be accounted for in the analysis framework. Methods that account for both hybridization and ILS currently exist for only very limited cases. The contributions of my work are two-fold:
• I devised the first parsimony criterion for the inference of phylogenetic networks (topologies alone) in the presence of ILS, along with new algorithms for the inference.
• I devised the first likelihood criterion for the inference of phylogenetic networks (topologies, branch lengths, and inheritance probabilities) in the presence of ILS, along with new algorithms for the inference.
I have implemented all the algorithms in our open-source, publicly available PhyloNet software package, and studied their performance in extensive simulation studies. Both the parsimony and likelihood approaches show very good performance in terms of identifying the location of hybridization events, as well as estimating the proportions of genes that underwent hybridization. Also, the parsimony approach shows good performance in terms of efficiency on handling large data sets in the experiments. Further, I analyzed two biological data sets (a data set of yeast genomes and another of house mouse genomes) and found support for hybridization in both. My work will allow, for the first time, systematic phylogenomic analyses of data sets where hybridization is suspected. Thus, biologists will now be able to revisit existing analyses and conduct new ones with richer evolutionary models and inference methods.
Further, the computational techniques presented here can be extended to other reticulate evolutionary events, such as horizontal gene transfer, which are believed to be ubiquitous in bacteria.

Item: Parsimonious Inference of Hybridization in the Presence of Incomplete Lineage Sorting (Oxford University Press, on behalf of the Society of Systematic Biologists, 2013)
Yu, Yun; Barnett, R. Matthew; Nakhleh, Luay
Hybridization plays an important evolutionary role in several groups of organisms. A phylogenetic approach to detect hybridization entails sequencing multiple loci across the genomes of a group of species of interest, reconstructing their gene trees, and taking their differences as indicators of hybridization. However, methods that follow this approach mostly ignore population effects, such as incomplete lineage sorting (ILS). Given that hybridization occurs between closely related organisms, ILS may very well be at play and, hence, must be accounted for in the analysis framework. To address this issue, we present a parsimony criterion for reconciling gene trees within the branches of a phylogenetic network, and a local search heuristic for inferring phylogenetic networks from collections of gene-tree topologies under this criterion. This framework enables phylogenetic analyses while accounting for both hybridization and ILS. Further, we propose two techniques for incorporating information about uncertainty in gene-tree estimates. Our simulation studies demonstrate the good performance of our framework in terms of identifying the location of hybridization events, as well as estimating the proportions of genes that underwent hybridization. Also, our framework shows good performance in terms of efficiency on handling large data sets in our experiments. Further, in analyzing a yeast data set, we demonstrate issues that arise when analyzing real data sets.
While a probabilistic approach was recently introduced for this problem, and while parsimonious reconciliations have accuracy issues under certain settings, our parsimony framework provides a much more computationally efficient technique for this type of analysis. Our framework now allows for genome-wide scans for hybridization, while also accounting for ILS.

Item: Reticulate evolutionary history and extensive introgression in mosquito species revealed by phylogenetic network analysis (Wiley, 2016)
Wen, Dingqiao; Yu, Yun; Hahn, Matthew W.; Nakhleh, Luay K.
The role of hybridization and subsequent introgression has been demonstrated in an increasing number of species. Recently, Fontaine et al. (Science, 347, 2015, 1258524) conducted a phylogenomic analysis of six members of the Anopheles gambiae species complex. Their analysis revealed a reticulate evolutionary history and pointed to extensive introgression on all four autosomal arms. The study further highlighted the complex evolutionary signals that the co-occurrence of incomplete lineage sorting (ILS) and introgression can give rise to in phylogenomic analyses. While tree-based methodologies were used in the study, phylogenetic networks provide a more natural model to capture reticulate evolutionary histories. In this work, we reanalyse the Anopheles data using a recently devised framework that combines the multispecies coalescent with phylogenetic networks. This framework allows us to capture ILS and introgression simultaneously, and forms the basis for statistical methods for inferring reticulate evolutionary histories. The new analysis reveals a phylogenetic network with multiple hybridization events, some of which differ from those reported in the original study. To elucidate the extent and patterns of introgression across the genome, we devise a new method that quantifies the use of reticulation branches in the phylogenetic network by each genomic region.
Applying the method to the mosquito data set reveals the evolutionary history of all the chromosomes. This study highlights the utility of ‘network thinking’ and the new insights it can uncover, in particular in phylogenomic analyses of large data sets with extensive gene tree incongruence.

Item: The Probability of a Gene Tree Topology within a Phylogenetic Network with Applications to Hybridization Detection (Public Library of Science, 2012)
Yu, Yun; Degnan, James H.; Nakhleh, Luay