Scalable Methods for Phylogenetic Network Inference

dc.contributor.advisor | Nakhleh, Luay | en_US
dc.creator | Zhu, Jiafan | en_US
dc.date.accessioned | 2019-05-17T18:50:15Z | en_US
dc.date.available | 2019-05-17T18:50:15Z | en_US
dc.date.created | 2019-05 | en_US
dc.date.issued | 2019-04-10 | en_US
dc.date.submitted | May 2019 | en_US
dc.date.updated | 2019-05-17T18:50:15Z | en_US
dc.description.abstract | Reticulate evolutionary histories, such as those arising in the presence of hybridization, are best modeled as phylogenetic networks, which take the shape of rooted, directed, acyclic graphs. Recently developed methods allow for statistical inference of phylogenetic networks while also accounting for other evolutionary processes, such as incomplete lineage sorting (ILS). These methods use two different types of input: unlinked bi-allelic markers (e.g., single nucleotide polymorphism data) and sequence alignments of multiple, unlinked loci. While these methods estimate the network and its parameters with good accuracy, likelihood computation and convergence remain major computational bottlenecks that limit the methods’ applicability and scalability. The contributions of this thesis are threefold. First, I explore the challenges of viewing a phylogenetic network as an underlying phylogenetic tree augmented with a set of “horizontal” edges, and I show why computing the likelihood of a network takes orders of magnitude more time than computing that of a tree. Second, I develop a pseudo-likelihood approach for inferring phylogenetic networks from bi-allelic markers. On simulated data, I demonstrate the scalability and accuracy of this approach, as well as aspects of its robustness to violations of the assumptions of the underlying statistical model. Third, I introduce a novel divide-and-conquer method for scalable inference of phylogenetic networks from sequence data of multiple, unlinked loci. The method infers networks on subproblems (subsets of taxa) and then merges them into a network on the full set of taxa. To reduce the number of subproblems on which subnetworks must be inferred, I formulate the task of finding a small number of subsets as a Hitting Set problem and solve it with a simple heuristic.
I evaluate the running time and accuracy of this two-step algorithm on both simulated and biological data sets. The divide-and-conquer method accurately infers phylogenetic networks at a scale that is infeasible with existing methods. I implemented all the algorithms in the publicly available software package PhyloNet and made them available to the community. Together, these contributions are a significant and promising step towards accurate, large-scale phylogenetic network inference. | en_US
dc.format.mimetype | application/pdf | en_US
dc.identifier.citation | Zhu, Jiafan. "Scalable Methods for Phylogenetic Network Inference." (2019) Diss., Rice University. https://hdl.handle.net/1911/105957. | en_US
dc.identifier.uri | https://hdl.handle.net/1911/105957 | en_US
dc.language.iso | eng | en_US
dc.rights | Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. | en_US
dc.subject | Phylogenetic networks | en_US
dc.subject | Scalability | en_US
dc.subject | Divide-and-conquer | en_US
dc.subject | Bayesian inference | en_US
dc.subject | multi-locus phylogenomics | en_US
dc.title | Scalable Methods for Phylogenetic Network Inference | en_US
dc.type | Thesis | en_US
dc.type.material | Text | en_US
thesis.degree.department | Computer Science | en_US
thesis.degree.discipline | Engineering | en_US
thesis.degree.grantor | Rice University | en_US
thesis.degree.level | Doctoral | en_US
thesis.degree.name | Doctor of Philosophy | en_US
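The abstract states that network likelihoods cost orders of magnitude more than tree likelihoods but does not give the argument. One standard illustration, offered here as an assumption rather than as the thesis's own argument, follows the "tree plus horizontal edges" view: keeping one incoming edge at each of r reticulation nodes of in-degree two leaves a displayed tree, so a network can display up to 2^r distinct trees. The sketch below enumerates these switchings for a small hypothetical network given as (parent, child) edges; the names `displayed_trees` and `taxa_below` are illustrative and not part of PhyloNet.

```python
from itertools import product

def taxa_below(node, children, taxa, memo):
    """Set of taxa reachable from `node` in a tree given as child lists (memoized)."""
    if node not in memo:
        if node in taxa:
            memo[node] = frozenset([node])
        else:
            below = frozenset()
            for child in children.get(node, []):
                below |= taxa_below(child, children, taxa, memo)
            memo[node] = below
    return memo[node]

def displayed_trees(edges, root, taxa):
    """Distinct rooted tree topologies displayed by a network.

    `edges` lists (parent, child) pairs of a rooted DAG; any node with two or
    more parents is a reticulation. A switching keeps one incoming edge per
    reticulation, which leaves a spanning tree. Each tree is encoded by its set
    of non-empty clusters, which identifies a rooted topology once unlabeled
    leaves are pruned and degree-two nodes are suppressed.
    """
    parents = {}
    for p, c in edges:
        parents.setdefault(c, []).append(p)
    reticulations = sorted(n for n, ps in parents.items() if len(ps) > 1)
    topologies = set()
    for choice in product(*(parents[h] for h in reticulations)):
        kept = dict(zip(reticulations, choice))
        children = {}
        for p, c in edges:
            if c in kept and kept[c] != p:
                continue  # this switching drops the edge p -> c
            children.setdefault(p, []).append(c)
        memo = {}
        taxa_below(root, children, taxa, memo)
        # Empty clusters come from pruned dead ends; duplicates from degree-two nodes.
        topologies.add(frozenset(s for s in memo.values() if s))
    return topologies

# Hypothetical network: one reticulation h with parents a and b.
edges = [("r", "a"), ("r", "b"), ("a", "A"), ("a", "h"),
         ("b", "h"), ("b", "C"), ("h", "B")]
trees = displayed_trees(edges, "r", {"A", "B", "C"})
print(len(trees))  # 2: ((A,B),C) and (A,(B,C))
```

Because the number of switchings doubles with each added reticulation, any computation that iterates over displayed trees grows exponentially in r, whereas a tree displays only itself.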
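The abstract mentions a Hitting Set formulation for choosing few subproblems and "a simple heuristic" without specifying either. As a minimal sketch, assuming a hypothetical formulation in which every pair of taxa must appear together in at least one chosen subproblem and the candidate subproblems are 3-taxon subsets, the generic greedy heuristic below repeatedly picks the candidate contained in the most requirement sets that are not yet hit. The names `greedy_hitting_set`, `candidates`, and `requirements` are illustrative; this is not necessarily the heuristic implemented in PhyloNet.

```python
from collections import Counter
from itertools import combinations

def greedy_hitting_set(sets):
    """Greedy heuristic for Hitting Set: return elements that intersect every set.

    Repeatedly picks the element that occurs in the most sets not yet hit.
    """
    remaining = [frozenset(s) for s in sets]
    if any(not s for s in remaining):
        raise ValueError("an empty set cannot be hit")
    chosen = []
    while remaining:
        counts = Counter(e for s in remaining for e in s)
        best, _ = counts.most_common(1)[0]
        chosen.append(best)
        # Discard every set that the chosen element now hits.
        remaining = [s for s in remaining if best not in s]
    return chosen

# Hypothetical instance: every pair of taxa must appear together in at least one
# chosen subproblem, and the candidate subproblems are all 3-taxon subsets.
taxa = ["A", "B", "C", "D"]
candidates = [frozenset(t) for t in combinations(taxa, 3)]
# One set per pair of taxa: the indices of the candidates that contain that pair.
requirements = [
    {i for i, cand in enumerate(candidates) if set(pair) <= cand}
    for pair in combinations(taxa, 2)
]
chosen = greedy_hitting_set(requirements)
print([sorted(candidates[i]) for i in chosen])  # three triples
```

Greedy selection is the textbook heuristic for Hitting Set (equivalently, Set Cover on the dual instance) and carries a logarithmic approximation guarantee. On this toy instance any two triples share exactly one pair and so leave one pair uncovered, which is why three subproblems are needed.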
Files
Original bundle
Name: ZHU-DOCUMENT-2019.pdf
Size: 2.27 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text