VariPhyer: A Modular Computational Platform for Verifying Microbial Variant Calling and Phylogenomic Analyses

dc.contributor.advisor: Treangen, Todd
dc.creator: Liao, Chunxiao
dc.date.accessioned: 2022-09-23T21:46:24Z
dc.date.available: 2022-11-01T05:01:10Z
dc.date.created: 2022-05
dc.date.issued: 2022-04-22
dc.date.submitted: May 2022
dc.date.updated: 2022-09-23T21:46:24Z
dc.description.abstract: The COVID-19 pandemic has highlighted that the inference of whole-genome phylogeny, or phylogenomics, is critical for studying the evolution and transmission of infectious diseases. In phylogenomic analyses, deciding which workflow to use and which results to trust remains an open research question. There is an urgent need for reproducible, explainable, and accurate microbial genomics analysis pipelines that are comprehensively benchmarked against a known ground truth. Here, we propose VariPhyer, an end-to-end benchmarking framework for microbial phylogenetic inference and variant calling from short reads, long reads, and assembled genomes, following best practices. VariPhyer is implemented in Nextflow and uses simulated genomic variants and evolutionary relationships as ground truth. The main idea behind VariPhyer is to provide a proving ground and evaluative framework for phylogenomics based on genome alignment and variant calling: if a pipeline introduces no error, its output should be close to the simulated ground truth. VariPhyer simulates variants in a given genome according to a given phylogenetic tree, runs the selected pipeline, and then applies the selected evaluation metrics to measure how far the inferred tree and the called variants deviate from the ground truth. To test the correctness of our implementation and our hypothesis, we designed experiments with simulated phylogenies and variants in a bacterial genome backbone. We evaluated the output phylogenies using the difference between each output tree and the designed tree as the loss function. The designed and output phylogenies were consistent across most pipelines in VariPhyer, which supports our hypothesis. VariPhyer provides loss functions to evaluate every tool statistically, regardless of species or sequencing platform. Trees with different topologies, branch lengths, and numbers of taxa have been tested with the pipeline. The results indicate that our pipeline is accurate and efficient for phylogenomic analysis and evaluation. In summary, VariPhyer is an open-source "push-button" phylogenomic processing and evaluation pipeline, representing a first step towards verified infectious disease analysis.
dc.embargo.terms: 2022-11-01
dc.format.mimetype: application/pdf
dc.identifier.citation: Liao, Chunxiao. "VariPhyer: A Modular Computational Platform for Verifying Microbial Variant Calling and Phylogenomic Analyses." (2022) Master’s Thesis, Rice University. https://hdl.handle.net/1911/113344.
dc.identifier.uri: https://hdl.handle.net/1911/113344
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: pipeline
dc.subject: phylogenetic tree
dc.title: VariPhyer: A Modular Computational Platform for Verifying Microbial Variant Calling and Phylogenomic Analyses
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Masters
thesis.degree.name: Master of Science
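
The abstract describes scoring each pipeline by the difference between its output tree and the designed (ground-truth) tree. The record does not say which tree distance the thesis uses, so the sketch below assumes the Robinson-Foulds distance on unrooted topologies, a common choice for this kind of comparison. The function names (`clades`, `splits`, `rf_distance`) and the example trees are hypothetical and are not taken from VariPhyer. The parser handles only plain Newick (no quoted labels or comments) and ignores branch lengths.

```python
import re


def clades(newick):
    """Parse a plain Newick string into (leaf names, leaf sets of internal nodes)."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    stack, groups, leaves = [], set(), set()
    current = []  # leaf sets of the children of the currently open node
    prev = None
    for tok in tokens:
        if tok == "(":
            stack.append(current)
            current = []
        elif tok == ")":
            node = frozenset().union(*current)
            groups.add(node)
            current = stack.pop()
            current.append(node)
        elif tok == ";":
            break
        elif tok != ",":
            label = tok.split(":")[0].strip()  # drop any branch length
            # A token right after ")" labels an internal node; skip it.
            if prev != ")" and label:
                leaves.add(label)
                current.append(frozenset([label]))
        prev = tok
    return leaves, groups


def splits(newick):
    """Return (leaf set, non-trivial bipartitions) of the unrooted tree."""
    leaves, groups = clades(newick)
    ref = min(leaves)  # fixed reference leaf makes each split canonical
    found = set()
    for group in groups:
        # Always store the side of the split that excludes the reference leaf.
        side = frozenset(leaves - group) if ref in group else group
        if 2 <= len(side) <= len(leaves) - 2:
            found.add(side)
    return frozenset(leaves), found


def rf_distance(newick_a, newick_b):
    """Count the bipartitions present in exactly one of the two trees."""
    leaves_a, splits_a = splits(newick_a)
    leaves_b, splits_b = splits(newick_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must have the same set of taxa")
    return len(splits_a ^ splits_b)


# Hypothetical designed tree and two hypothetical pipeline outputs.
designed = "((A:0.1,B:0.1):0.05,(C:0.1,D:0.1):0.05,E:0.2);"
same_topology = "(((A,B),E),(C,D));"
different_topology = "((A,C),(B,D),E);"

print(rf_distance(designed, same_topology))
print(rf_distance(designed, different_topology))
```

A distance of zero means the inferred topology matches the designed one, and larger values mean more disagreeing splits. Because this metric ignores branch lengths, a branch-length-aware distance would be needed to capture the branch-length variation the abstract mentions.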
Files
Original bundle
Name: LIAO-DOCUMENT-2022.pdf
Size: 2.34 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.83 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text