Variational Inference Using Approximate Likelihood Under the Coalescent With Recombination

dc.contributor.advisor: Nakhleh, Luay K. [en_US]
dc.creator: Liu, Xinhao [en_US]
dc.date.accessioned: 2021-05-03T21:42:56Z [en_US]
dc.date.available: 2021-05-03T21:42:56Z [en_US]
dc.date.created: 2021-05 [en_US]
dc.date.issued: 2021-04-29 [en_US]
dc.date.submitted: May 2021 [en_US]
dc.date.updated: 2021-05-03T21:42:56Z [en_US]
dc.description.abstract: Coalescent methods are proven and powerful tools for population genetics, phylogenetics, epidemiology, and other fields. The multispecies coalescent (MSC) model has been widely employed by phylogenetic algorithms to construct the species tree while accounting for incomplete lineage sorting (ILS). However, the no-recombination assumption of the MSC model has been questioned. To analyze large genomic regions, we need to account for both ILS and recombination simultaneously. Coalescent hidden Markov model (coalHMM) methods are a promising avenue for the analysis of large genomic alignments, which are now commonplace, but these methods have lacked general usability and flexibility. In this thesis I introduce a novel method, VICAR (Variational Inference under the CoAlescent with Recombination), for automatically learning a coalHMM and inferring the posterior distributions of evolutionary parameters using black-box variational inference, with the transition rates between local genealogies derived empirically by simulation. This derivation enables VICAR to work directly with three or four taxa and, through a divide-and-conquer approach, with more taxa. Using a simulated data set resembling a human-chimp-gorilla scenario, I show that VICAR achieves accuracy comparable to or better than that of previous coalHMM methods. Both species divergence times and population sizes were accurately inferred. The method also infers local genealogies, and I report on their accuracy. Furthermore, I illustrate how to scale the method to larger data sets through a divide-and-conquer approach. This accuracy means that my approach is useful now, and because its transition rates are derived by simulation, it is flexible enough to enable future implementations of all kinds of population models. I have implemented VICAR in the publicly available software package PhyloNet. [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Liu, Xinhao. "Variational Inference Using Approximate Likelihood Under the Coalescent With Recombination." (2021) Master’s Thesis, Rice University. https://hdl.handle.net/1911/110438. [en_US]
dc.identifier.uri: https://hdl.handle.net/1911/110438 [en_US]
dc.language.iso: eng [en_US]
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. [en_US]
dc.subject: Coalescent with recombination [en_US]
dc.subject: recombination [en_US]
dc.subject: species tree [en_US]
dc.subject: local genealogies [en_US]
dc.subject: hidden Markov models [en_US]
dc.subject: variational inference [en_US]
dc.title: Variational Inference Using Approximate Likelihood Under the Coalescent With Recombination [en_US]
dc.type: Thesis [en_US]
dc.type.material: Text [en_US]
thesis.degree.department: Computer Science [en_US]
thesis.degree.discipline: Engineering [en_US]
thesis.degree.grantor: Rice University [en_US]
thesis.degree.level: Masters [en_US]
thesis.degree.name: Master of Science [en_US]
Files
Original bundle
Name: LIU-DOCUMENT-2021.pdf
Size: 2.67 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text