Browsing by Author "Yan, Zhi"
Now showing 1 - 4 of 4
Item Current progress and open challenges for applying deep learning across the biosciences (Springer Nature, 2022)
Sapoval, Nicolae; Aghazadeh, Amirali; Nute, Michael G.; Antunes, Dinler A.; Balaji, Advait; Baraniuk, Richard; Barberan, C.J.; Dannenfelser, Ruth; Dun, Chen; Edrisi, Mohammadamin; Elworth, R.A. Leo; Kille, Bryce; Kyrillidis, Anastasios; Nakhleh, Luay; Wolfe, Cameron R.; Yan, Zhi; Yao, Vicky; Treangen, Todd J.; Bioengineering; Computer Science
Deep Learning (DL) has recently enabled unprecedented advances in one of the grand challenges in computational biology: the half-century-old problem of protein structure prediction. In this paper we discuss recent advances, limitations, and future perspectives of DL on five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts. To conclude, we provide a summary of the subject-specific and general challenges for DL across the biosciences.

Item Evaluation of Existing Methods and Development of New Ones for Phylogenomic Analyses (2024-04-17)
Yan, Zhi; Nakhleh, Luay
Despite the revolution brought by phylogenomics, accurately reconstructing the Tree of Life remains a challenge due to discordance between gene and species histories. These incongruences arise from biological processes like incomplete lineage sorting (ILS). The multispecies coalescent (MSC) model, a cornerstone for species tree inference, accounts for ILS but assumes strict orthology, no recombination within loci, and free recombination between loci. The multispecies network coalescent (MSNC) extends the MSC to accommodate diploid hybridization, but real biological complexities often require further refinement.
This thesis addresses these limitations by investigating the impact of MSC assumption violations on phylogenetic inference. We explore three key areas: (1) the potential of utilizing paralogs for species tree reconstruction, (2) the influence of recombination on population parameter estimation, and (3) the effectiveness of existing gene tree correction methods. We then introduce two novel methods, MPAllopp and Polyphest, specifically designed to infer phylogenetic networks that account for both ILS and polyploidy, a prevalent phenomenon in evolution. These methods are validated through extensive simulations and real data analyses. Overall, this thesis contributes to enhancing the accuracy of phylogenetic inference by critically evaluating existing methods and developing novel approaches that can handle the complexities of real-world data.

Item Maximum Parsimony Inference of Phylogenetic Networks in the Presence of Polyploid Complexes (Oxford University Press, 2022)
Yan, Zhi; Cao, Zhen; Liu, Yushu; Ogilvie, Huw A.; Nakhleh, Luay
Phylogenetic networks provide a powerful framework for modeling and analyzing reticulate evolutionary histories. While polyploidy has been shown to be prevalent not only in plants but also in other groups of eukaryotic species, most work done thus far on phylogenetic network inference assumes diploid hybridization. These inference methods have been applied, with varying degrees of success, to data sets with polyploid species, even though polyploidy violates the mathematical assumptions underlying these methods. Statistical methods were developed recently for handling specific types of polyploids and so were parsimony methods that could handle polyploidy more generally yet while excluding processes such as incomplete lineage sorting. In this article, we introduce a new method for inferring most parsimonious phylogenetic networks on data that include polyploid species.
Taking gene tree topologies as input, the method seeks a phylogenetic network that minimizes deep coalescences while accounting for polyploidy. We demonstrate the performance of the method on both simulated and biological data. The inference method as well as a method for evaluating evolutionary hypotheses in the form of phylogenetic networks are implemented and publicly available in the PhyloNet software package. [Incomplete lineage sorting; minimizing deep coalescences; multilabeled trees; multispecies network coalescent; phylogenetic networks; polyploidy.]

Item Polyphest: fast polyploid phylogeny estimation (Oxford University Press, 2024)
Yan, Zhi; Cao, Zhen; Nakhleh, Luay
Despite the widespread occurrence of polyploids across the Tree of Life, especially in the plant kingdom, very few computational methods have been developed to handle the specific complexities introduced by polyploids in phylogeny estimation. Furthermore, methods that are designed to account for polyploidy often disregard incomplete lineage sorting (ILS), a major source of heterogeneous gene histories, or are computationally very demanding. Therefore, there is a great need for efficient and robust methods to accurately reconstruct polyploid phylogenies.
We introduce Polyphest (POLYploid PHylogeny ESTimation), a new method for efficiently and accurately inferring species phylogenies in the presence of both polyploidy and ILS. Polyphest bypasses the need for extensive network space searches by first generating a multilabeled tree based on gene trees, which is then converted into a (uniquely labeled) species phylogeny. We compare the performance of Polyphest to that of two polyploid phylogeny estimation methods, one of which does not account for ILS, namely PADRE, and another that accounts for ILS, namely MPAllopp. Polyphest is more accurate than PADRE and achieves comparable accuracy to MPAllopp, while being significantly faster.
We also demonstrate the application of Polyphest to empirical data from the hexaploid bread wheat and confirm the allopolyploid origin of bread wheat along with the closest relatives for each of its subgenomes.
Polyphest is available at https://github.com/NakhlehLab/Polyphest.
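
Illustrative note on the "minimizing deep coalescences" criterion named in the listing above: the sketch below counts extra lineages when a single gene tree is reconciled with a species tree. It is a deliberately simplified illustration, not the PhyloNet, MPAllopp, or Polyphest implementation. It assumes a plain species tree (no reticulations, no polyploidy), one sampled allele per species, and trees written as nested Python tuples with string leaf labels. The function names `leaves`, `lineages_within`, and `extra_lineages` are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf labels in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def lineages_within(gene_tree, cluster):
    """Count maximal gene-tree clades whose leaves all lie inside `cluster`.

    Each such clade is one gene lineage leaving the top of the species-tree
    branch that subtends `cluster`.
    """
    if leaves(gene_tree) <= cluster:
        return 1
    if isinstance(gene_tree, str):
        return 0
    return sum(lineages_within(child, cluster) for child in gene_tree)


def extra_lineages(species_tree, gene_tree):
    """Sum, over all non-root species-tree branches, of (lineages - 1)."""
    total = 0

    def visit(node, is_root):
        nonlocal total
        if not is_root:
            count = lineages_within(gene_tree, leaves(node))
            # Guard against species missing from the gene tree.
            total += max(count - 1, 0)
        if not isinstance(node, str):
            for child in node:
                visit(child, False)

    visit(species_tree, True)
    return total


species = (("A", "B"), ("C", "D"))
print(extra_lineages(species, (("A", "B"), ("C", "D"))))  # 0: concordant
print(extra_lineages(species, (("A", "C"), ("B", "D"))))  # 2: discordant
```

A parsimony search under this criterion would prefer the candidate species phylogeny with the smallest total over all input gene trees. The network and polyploid cases described in the abstracts require additional machinery that this sketch does not attempt.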