Towards more accurate phylogenetic network inference

dc.contributor.advisor: Nakhleh, Luay (en_US)
dc.contributor.advisor: Ogilvie, Huw Alexander (en_US)
dc.creator: Cao, Zhen (en_US)
dc.date.accessioned: 2023-06-15T21:24:33Z (en_US)
dc.date.available: 2023-06-15T21:24:33Z (en_US)
dc.date.created: 2023-05 (en_US)
dc.date.issued: 2023-04-21 (en_US)
dc.date.submitted: May 2023 (en_US)
dc.date.updated: 2023-06-15T21:24:33Z (en_US)
dc.description.abstract: The multispecies network coalescent (MSNC) extends the multispecies coalescent (MSC) by modeling gene evolution within the branches of a phylogenetic network rather than a phylogenetic tree. A phylogenetic network takes the shape of a rooted, directed, acyclic graph and models both speciation events and reticulate evolutionary events. Existing methods for phylogenetic network inference were developed to account for reticulation and incomplete lineage sorting (ILS) simultaneously. While these methods infer network topologies and continuous parameters accurately in simple simulation settings, their accuracy can degrade readily in more complex scenarios, such as model violations caused by gene tree estimation error or substitution rate heterogeneity. Moreover, reconstructing subnetworks obtained by dividing a full network might be even more difficult than reconstructing the network on the full set of taxa. This thesis makes four contributions. First, I explore limiting the network search space by inferring a phylogenetic tree and then adding horizontal edges to it. I evaluate this tree-to-network augmentation phase under the minimizing deep coalescence and pseudo-likelihood criteria, and I show that a recently developed divide-and-conquer approach significantly outperforms tree-based inference in terms of accuracy, albeit at a higher computational cost. Second, I study statistical tests for assessing the fit of gene trees to the MSC under realistic gene tree error profiles, and I develop a novel approach to determining model complexity in the presence of gene tree estimation error. Third, I extend the Bayesian inference method MCMC_SEQ to address the model misspecification caused by rate heterogeneity across loci, which leads to spurious reticulations. I also study the effects of this misspecification using simulations and an empirical dataset from Heliconius butterflies, and examine them for the summary method Infernetwork_ML as well. Fourth, scalable divide-and-conquer approaches are promising but remain challenging because they demand accurate and efficient subnetwork inference, and dividing a network into subnetworks can increase the difficulty of inference. I therefore explore how to infer complex subnetworks accurately, and I improve the efficiency of MCMC_SEQ. All of these approaches are implemented in the publicly available open-source software package PhyloNet. (en_US)
dc.format.mimetype: application/pdf (en_US)
dc.identifier.citation: Cao, Zhen. "Towards more accurate phylogenetic network inference." (2023) Diss., Rice University. https://hdl.handle.net/1911/114918. (en_US)
dc.identifier.uri: https://hdl.handle.net/1911/114918 (en_US)
dc.language.iso: eng (en_US)
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. (en_US)
dc.subject: phylogenetic network (en_US)
dc.subject: multispecies network coalescent (en_US)
dc.subject: tree-based networks (en_US)
dc.subject: model misspecification (en_US)
dc.subject: multi-locus phylogeny (en_US)
dc.title: Towards more accurate phylogenetic network inference (en_US)
dc.type: Thesis (en_US)
dc.type.material: Text (en_US)
thesis.degree.department: Computer Science (en_US)
thesis.degree.discipline: Engineering (en_US)
thesis.degree.grantor: Rice University (en_US)
thesis.degree.level: Doctoral (en_US)
thesis.degree.name: Doctor of Philosophy (en_US)
Files
Original bundle
Name: CAO-DOCUMENT-2023.pdf
Size: 10.73 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text