Variational Inference Using Approximate Likelihood Under the Coalescent With Recombination

dc.contributor.advisor: Nakhleh, Luay K.
dc.creator: Liu, Xinhao
dc.date.accessioned: 2021-05-03T21:42:56Z
dc.date.available: 2021-05-03T21:42:56Z
dc.date.created: 2021-05
dc.date.issued: 2021-04-29
dc.date.submitted: May 2021
dc.date.updated: 2021-05-03T21:42:56Z
dc.description.abstract: Coalescent methods are proven and powerful tools for population genetics, phylogenetics, epidemiology, and other fields. The multispecies coalescent (MSC) model has been widely employed by phylogenetic algorithms to construct the species tree while accounting for incomplete lineage sorting (ILS). However, the no-recombination assumption of the MSC model has been questioned. To analyze large genomic regions, we need to simultaneously account for both ILS and recombination. Coalescent hidden Markov model (coalHMM) methods are a promising avenue for the analysis of large genomic alignments, which are now commonplace, but these methods have lacked general usability and flexibility. I introduce in this thesis a novel method, VICAR (Variational Inference under the CoAlescent with Recombination), for automatically learning a coalHMM and inferring the posterior distributions of evolutionary parameters using black-box variational inference, with the transition rates between local genealogies derived empirically by simulation. This derivation enables VICAR to work directly with three or four taxa and, through a divide-and-conquer approach, with more taxa. Using a simulated data set resembling a human-chimp-gorilla scenario, I show that VICAR has accuracy comparable to or better than that of previous coalHMM methods. Both species divergence times and population sizes were accurately inferred. The method also infers local genealogies, and I report on their accuracy. Furthermore, I illustrate how to scale the method to larger data sets through a divide-and-conquer approach. This accuracy means that my approach is useful now, and because it derives transition rates by simulation, it is flexible enough to enable future implementations of all kinds of population models. I have implemented VICAR in the publicly available software package PhyloNet.
dc.format.mimetype: application/pdf
dc.identifier.citation: Liu, Xinhao. "Variational Inference Using Approximate Likelihood Under the Coalescent With Recombination." (2021) Master’s Thesis, Rice University. https://hdl.handle.net/1911/110438.
dc.identifier.uri: https://hdl.handle.net/1911/110438
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Coalescent with recombination
dc.subject: recombination
dc.subject: species tree
dc.subject: local genealogies
dc.subject: hidden Markov models
dc.subject: variational inference
dc.title: Variational Inference Using Approximate Likelihood Under the Coalescent With Recombination
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Masters
thesis.degree.name: Master of Science
Files
Original bundle
Name: LIU-DOCUMENT-2021.pdf
Size: 2.67 MB
Format: Adobe Portable Document Format