Treangen, Todd J
Dates: 2024-05-22; 2024-05; 2024-04-17; May 2024
Liu, Yunxi. Finding Needles in the Haystack: Computational tools for Contaminant Detection and Error Correction in Genomic and Metagenomic Datasets. (2024). PhD diss., Rice University. https://hdl.handle.net/1911/116199
EMBARGO NOTE: This item is embargoed until 2026-05-01

The scale and complexity of genomic studies have grown alongside the volume of sequencing data, driven by recent developments in next-generation and third-generation sequencing technologies. However, errors introduced during sample collection, sample preparation, sequencing, and computational data analysis can distort results and lead to erroneous interpretations. In this work, we present a set of studies that approach anomaly detection and error correction in genomic data from several perspectives. Broadly, the topics of the thesis fall into two categories. The first is metagenomics, where the projects focus on accurately profiling microbiome communities through contaminant identification and false-positive detection in taxonomic classification. The second is viral genomics, where the projects focus on variant calling from long-read sequencing data with high error rates and on detecting cryptic SARS-CoV-2 mutations in wastewater. These methods underscore the significance of rare events in high-throughput sequencing workflows and pave the way for advances in metagenomics and viral genomics.

Format: application/pdf
Language: eng
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
Subject: Computational biology
Type: Thesis