Towards Accurate and Scalable Phylogenetic Inference Under Complex Evolutionary Models Using Neural Networks

dc.contributor.advisor: Nakhleh, Luay K [en_US]
dc.contributor.advisor: Ogilvie, Huw A [en_US]
dc.creator: Yakici, Berk Alp [en_US]
dc.date.accessioned: 2023-08-09T18:58:58Z [en_US]
dc.date.created: 2023-05 [en_US]
dc.date.issued: 2023-04-21 [en_US]
dc.date.submitted: May 2023 [en_US]
dc.date.updated: 2023-08-09T18:58:59Z [en_US]
dc.description.abstract: Classical phylogenetic tree inference methods assume sequences evolve under a stationary, reversible, and homogeneous (SRH) model. This assumption is often violated in real data, sometimes severely, depending on the studied biological system. Furthermore, the inference of species trees also needs to account for population-level processes, such as recombination, which poses unique challenges. With a deluge of whole-genome data sets becoming increasingly available, accurately inferring species tree topologies under complex evolutionary models remains an open problem. In this thesis, I introduce a supervised learning (SL) method that uses multi-layer perceptrons (MLPs) for accurately inferring phylogenetic tree topologies from genome-scale data under complex evolutionary processes. We train our model with sequences simulated under the multispecies coalescent model with recombination (MSC-R), and we vary both clock rate and base frequency content across lineages. This enables our model to account for the effects and interactions of multiple complex processes. Utilizing a divide-and-conquer and supertree construction approach, we demonstrate that the inference scales beyond five taxa while remaining accurate. Using a simulation study, we show that our model can outperform classical supermatrix methods, such as neighbor-joining, maximum parsimony, and maximum likelihood, when the SRH assumption of sequence evolution is violated. Additionally, we demonstrate that the amalgamation of quintets is more accurate than that of quartets. Further, we re-analyze a whole-genome alignment of 33 avian species using our MLP model, estimating a species tree that supports the hypothesis of a single origin of non-galliform waterbirds. The accuracy and scalability of our model demonstrate that supervised learning methods can be an important tool for phylogenetic analyses. [en_US]
dc.embargo.lift: 2023-11-01 [en_US]
dc.embargo.terms: 2023-11-01 [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Yakici, Berk Alp. "Towards Accurate and Scalable Phylogenetic Inference Under Complex Evolutionary Models Using Neural Networks." (2023) Master’s Thesis, Rice University. https://hdl.handle.net/1911/115163. [en_US]
dc.identifier.uri: https://hdl.handle.net/1911/115163 [en_US]
dc.language.iso: eng [en_US]
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. [en_US]
dc.subject: species trees [en_US]
dc.subject: supertrees [en_US]
dc.subject: phylogenetic inference [en_US]
dc.subject: likelihood-free inference [en_US]
dc.subject: neural networks [en_US]
dc.subject: multilayer perceptron [en_US]
dc.subject: divide-and-conquer [en_US]
dc.subject: multispecies coalescent [en_US]
dc.subject: recombination [en_US]
dc.subject: incomplete lineage sorting [en_US]
dc.subject: non-stationarity [en_US]
dc.subject: compositional bias [en_US]
dc.title: Towards Accurate and Scalable Phylogenetic Inference Under Complex Evolutionary Models Using Neural Networks [en_US]
dc.type: Thesis [en_US]
dc.type.material: Text [en_US]
thesis.degree.department: Computer Science [en_US]
thesis.degree.discipline: Engineering [en_US]
thesis.degree.grantor: Rice University [en_US]
thesis.degree.level: Masters [en_US]
thesis.degree.name: Master of Science [en_US]
Files
Original bundle
Name: YAKICI-DOCUMENT-2023.pdf
Size: 1.79 MB
Format: Adobe Portable Document Format