Towards Accurate and Scalable Phylogenetic Inference Under Complex Evolutionary Models Using Neural Networks
Abstract
Classical phylogenetic tree inference methods assume that sequences evolve under a stationary, reversible, and homogeneous (SRH) model. This assumption is often violated in real data, sometimes severely, depending on the biological system under study. Furthermore, species tree inference must also account for population-level processes, such as recombination, which pose unique challenges. As whole-genome data sets become increasingly available, accurately inferring species tree topologies under complex evolutionary models remains an open problem.
In this thesis, I introduce a supervised learning (SL) method that uses multi-layer perceptrons (MLPs) to accurately infer phylogenetic tree topologies from genome-scale data under complex evolutionary processes. I train the model on sequences simulated under the multispecies coalescent model with recombination (MSC-R), varying both clock rate and base frequency content across lineages. This enables the model to account for the effects and interactions of multiple complex processes. Using a divide-and-conquer approach with supertree construction, I demonstrate that the inference scales beyond five taxa while remaining accurate. In a simulation study, I show that the model can outperform classical supermatrix methods, such as neighbor-joining, maximum parsimony, and maximum likelihood, when the SRH assumption of sequence evolution is violated. Additionally, I demonstrate that the amalgamation of quintets is more accurate than that of quartets. Further, I re-analyze a whole-genome alignment of 33 avian species using the MLP model, estimating a species tree that supports the hypothesis of a single origin of non-galliform waterbirds. The accuracy and scalability of the model demonstrate that supervised learning methods can be an important tool for phylogenetic analyses.
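To give a sense of scale for the divide-and-conquer step, the following minimal Python sketch counts the unrooted binary topologies on a small taxon set and enumerates the subsets that would be amalgamated into a supertree. The function names `num_unrooted_topologies` and `taxon_subsets` are hypothetical and not taken from the thesis. The abstract does not say whether every quintet is analyzed or only a sample, so exhaustive enumeration is an assumption made purely for illustration.

```python
from itertools import combinations
from math import comb


def num_unrooted_topologies(n_taxa: int) -> int:
    """Number of unrooted binary tree topologies on n_taxa labeled leaves.

    Uses the standard double-factorial formula (2n - 5)!! for n >= 3.
    """
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


def taxon_subsets(taxa, size=5):
    """All subsets of the given size (hypothetical helper; exhaustive
    enumeration is an assumption, not the thesis's stated procedure)."""
    return list(combinations(taxa, size))


if __name__ == "__main__":
    # A quartet has 3 possible unrooted topologies, a quintet has 15.
    print(num_unrooted_topologies(4), num_unrooted_topologies(5))

    # Number of subsets available for a 33-taxon data set.
    print(comb(33, 4), comb(33, 5))

    # Small worked example with six placeholder taxon labels.
    print(len(taxon_subsets(["A", "B", "C", "D", "E", "F"], size=5)))
```

Under these assumptions, a predictor restricted to five taxa only has to choose among 15 candidate topologies, while a 33-taxon data set yields 237,336 possible quintets versus 40,920 quartets. Each quintet carries more topological information than a quartet, at the cost of more subsets to process.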
Citation
Yakici, Berk Alp. "Towards Accurate and Scalable Phylogenetic Inference Under Complex Evolutionary Models Using Neural Networks." (2023) Master’s Thesis, Rice University. https://hdl.handle.net/1911/115163.