Co-estimating Reticulate Phylogenies and Gene Trees from Multi-locus Sequence Data

dc.contributor.advisor: Nakhleh, Luay
dc.creator: Wen, Dingqiao Ellie
dc.date.accessioned: 2019-05-16T20:02:48Z
dc.date.available: 2019-05-16T20:02:48Z
dc.date.created: 2017-12
dc.date.issued: 2017-10-31
dc.date.submitted: December 2017
dc.date.updated: 2019-05-16T20:02:48Z
dc.description.abstract: The multispecies network coalescent (MSNC) is a stochastic process that captures how gene trees grow within the branches of a phylogenetic network. Coupling the MSNC with a stochastic mutational process that operates along the branches of the gene trees gives rise to a generative model of how multiple loci from within and across species evolve in the presence of both incomplete lineage sorting (ILS) and reticulation (e.g., hybridization). We report on a Bayesian method for sampling the parameters of this generative model, including the species phylogeny, gene trees, divergence times, and population sizes, from DNA sequences of multiple independent loci. We illustrate the utility of our method by analyzing simulated data and reanalyzing three biological data sets. Our results demonstrate the importance not only of co-estimating species phylogenies and gene trees, but also of accounting for reticulation and ILS simultaneously. In particular, we show that when gene flow occurs, our method accurately estimates the evolutionary histories, coalescence times, and divergence times. Tree inference methods, on the other hand, underestimate divergence times and overestimate coalescence times when the evolutionary history is reticulate. Although the MSNC corresponds to an abstract model of “intermixture,” we study the performance of the model and method on simulated data generated under a gene flow model, and we show that the method accurately infers the most recent time at which gene flow occurs. For genotype data, our method adopts a phasing procedure that integrates over all possible phasings of diploid genotypes, providing accurate estimates of divergence times and other parameters. In contrast, as our simulation results demonstrate, the common practice of random phasing leads to failure to detect intermixture events and to inaccurate estimates of divergence times and population sizes, especially at short time scales.
dc.format.mimetype: application/pdf
dc.identifier.citation: Wen, Dingqiao Ellie. "Co-estimating Reticulate Phylogenies and Gene Trees from Multi-locus Sequence Data." (2017) Diss., Rice University. https://hdl.handle.net/1911/105468.
dc.identifier.uri: https://hdl.handle.net/1911/105468
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: phylogenetic network
dc.subject: Bayesian inference
dc.subject: RJMCMC
dc.subject: Multispecies network coalescent
dc.subject: reticulation
dc.subject: incomplete lineage sorting
dc.title: Co-estimating Reticulate Phylogenies and Gene Trees from Multi-locus Sequence Data
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files
Original bundle
Name: WEN-DOCUMENT-2017.pdf
Size: 13.02 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.61 KB
Format: Plain Text