Scalable Methods for Phylogenetic Network Inference

dc.contributor.advisor: Nakhleh, Luay
dc.creator: Zhu, Jiafan
dc.date.accessioned: 2019-05-17T18:50:15Z
dc.date.available: 2019-05-17T18:50:15Z
dc.date.created: 2019-05
dc.date.issued: 2019-04-10
dc.date.submitted: May 2019
dc.date.updated: 2019-05-17T18:50:15Z
dc.description.abstract: Reticulate evolutionary histories, such as those arising in the presence of hybridization, are best modeled as phylogenetic networks, which take the shape of rooted, directed, acyclic graphs. Recently developed methods allow for statistical inference of phylogenetic networks while also accounting for other evolutionary processes, such as incomplete lineage sorting (ILS). These methods use two different types of input: unlinked bi-allelic markers (e.g., single nucleotide polymorphism data), and sequence alignments of multiple, unlinked loci. While these methods have good accuracy in terms of estimating the network and its parameters, likelihood computations and convergence remain major computational bottlenecks and limit the methods’ applicability and scalability. The contributions of this thesis are threefold. First, I explore the challenge with viewing a phylogenetic network as an underlying phylogenetic tree with an additional set of “horizontal” edges. Furthermore, I demonstrate why likelihood computations of networks take orders of magnitude more time when compared to trees. Second, I develop an approach for inference of phylogenetic networks based on pseudo-likelihood using bi-allelic markers. I demonstrate the scalability and accuracy of phylogenetic network inference via pseudo-likelihood computations on simulated data, and I demonstrate aspects of robustness of the method to violations in the underlying assumptions of the employed statistical model. Third, I introduce a novel divide-and-conquer method for scalable inference of phylogenetic networks from the sequence data of multiple, unlinked loci. The method infers networks on subproblems and then merges them into a network on the full set of taxa. To reduce the number of subproblems on which to infer subnetworks, a Hitting Set version of the problem of finding a small number of subsets is formulated, and a simple heuristic is implemented to solve it.
I demonstrate the performance of the two-step algorithm, in terms of both running time and accuracy, on simulated as well as on biological data sets. The divide-and-conquer method accurately infers phylogenetic networks at a scale that is infeasible with existing methods. I implemented and made available to the community all the algorithms in the publicly available software package PhyloNet. The contributions of my thesis provide a significant and promising step towards accurate, large-scale phylogenetic network inference.
dc.format.mimetype: application/pdf
dc.identifier.citation: Zhu, Jiafan. "Scalable Methods for Phylogenetic Network Inference." (2019) Diss., Rice University. https://hdl.handle.net/1911/105957.
dc.identifier.uri: https://hdl.handle.net/1911/105957
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Phylogenetic networks
dc.subject: Scalability
dc.subject: Divide-and-conquer
dc.subject: Bayesian inference
dc.subject: multi-locus phylogenomics
dc.title: Scalable Methods for Phylogenetic Network Inference
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files
Original bundle
Name: ZHU-DOCUMENT-2019.pdf
Size: 2.27 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text