Towards more accurate phylogenetic network inference

Date
2023-04-21
Abstract

The multispecies network coalescent (MSNC) extends the multispecies coalescent (MSC) by modeling gene evolution within the branches of a phylogenetic network rather than a phylogenetic tree. A phylogenetic network takes the shape of a rooted, directed, acyclic graph and represents both speciation events and reticulate evolutionary events. Existing methods for phylogenetic network inference were developed to account for reticulation and incomplete lineage sorting (ILS) simultaneously. While these methods demonstrate good accuracy in inferring network topologies and continuous parameters in simple simulation settings, their accuracy degrades readily in more complex scenarios, such as model violations arising from gene tree estimation error and substitution rate heterogeneity. Moreover, reconstructing the subnetworks obtained by dividing a full network can be even more difficult than reconstructing the network on the full set of taxa.

This thesis makes four contributions. First, I explore limiting the network search space by first inferring a phylogenetic tree and then adding horizontal edges to it. I evaluate this tree-to-network augmentation phase under the minimizing deep coalescence and pseudo-likelihood criteria, and I show that a recently developed divide-and-conquer approach significantly outperforms tree-based inference in accuracy, albeit at a higher computational cost. Second, I study statistical tests for assessing the fit of gene trees to the MSC under realistic gene tree error profiles, and I develop a novel approach to determining model complexity in the presence of gene tree estimation error. Third, I extend the Bayesian inference method MCMC_SEQ to address the model misspecification caused by rate heterogeneity across loci, which leads to spurious reticulations. I also study the effects of this model misspecification, including on the summary method Infernetwork_ML, using simulations and an empirical dataset from Heliconius butterflies. Fourth, a scalable divide-and-conquer approach is promising but remains challenging because it demands accurate and efficient subnetwork inference, and dividing a network into subnetworks can make inference harder. I therefore explore how to infer complex subnetworks accurately, and I improve the efficiency of MCMC_SEQ.

I implement all the approaches in the publicly available open-source software package PhyloNet.

Degree
Doctor of Philosophy
Type
Thesis
Keywords
phylogenetic network, multispecies network coalescent, tree-based networks, model misspecification, multi-locus phylogeny
Citation

Cao, Zhen. "Towards more accurate phylogenetic network inference." (2023) Diss., Rice University. https://hdl.handle.net/1911/114918.

Rights
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.