Finding Needles in the Haystack: Computational tools for Contaminant Detection and Error Correction in Genomic and Metagenomic Datasets
Abstract
The scale and complexity of genomic studies have expanded alongside the volume of sequencing data, thanks to recent developments in next-generation and third-generation sequencing technology. However, errors introduced during sample collection, sample preparation, sequencing, and computational data analysis may distort results and contribute to erroneous interpretations. In this work we present a set of studies that explore anomaly detection and error correction in genomic data from different points of view. Broadly, the topics of the thesis can be grouped into two categories: those related to metagenomics, where the projects focus on accurate profiling of a microbiome community through contamination identification and false positive detection in taxonomic classification; and those related to viral genomics, where the projects focus on variant calling with high-error-rate long-read sequencing and cryptic mutation detection in wastewater for SARS-CoV-2. These methodologies underscore the significance of rare occurrences in high-throughput sequencing procedures, paving the way for advancements in metagenomics and viral genomics.
Citation
Liu, Yunxi. Finding Needles in the Haystack: Computational tools for Contaminant Detection and Error Correction in Genomic and Metagenomic Datasets. (2024). PhD diss., Rice University. https://hdl.handle.net/1911/116199