Finding Needles in the Haystack: Computational tools for Contaminant Detection and Error Correction in Genomic and Metagenomic Datasets

Field | Value | Language
dc.contributor.advisor | Treangen, Todd J | en_US
dc.creator | Liu, Yunxi | en_US
dc.date.accessioned | 2024-05-22T16:30:02Z | en_US
dc.date.created | 2024-05 | en_US
dc.date.issued | 2024-04-17 | en_US
dc.date.submitted | May 2024 | en_US
dc.date.updated | 2024-05-22T16:30:02Z | en_US
dc.description | EMBARGO NOTE: This item is embargoed until 2026-05-01 | en_US
dc.description.abstract | The scale and complexity of genomic studies have grown alongside the volume of sequencing data, driven by recent developments in next-generation and third-generation sequencing technologies. However, errors introduced during sample collection, sample preparation, sequencing, and computational data analysis can distort results and lead to erroneous interpretations. In this work, we present a set of studies that explore anomaly detection and error correction in genomic data from several perspectives. Broadly, the topics of the thesis fall into two categories. The first is metagenomics, where the projects focus on accurately profiling microbial communities through contaminant identification and false-positive detection in taxonomic classification. The second is viral genomics, where the projects focus on variant calling from high-error-rate long-read sequencing and on detecting cryptic SARS-CoV-2 mutations in wastewater. These methods underscore the significance of rare events in high-throughput sequencing and pave the way for advances in metagenomics and viral genomics. | en_US
dc.embargo.lift | 2026-05-01 | en_US
dc.embargo.terms | 2026-05-01 | en_US
dc.format.mimetype | application/pdf | en_US
dc.identifier.citation | Liu, Yunxi. Finding Needles in the Haystack: Computational tools for Contaminant Detection and Error Correction in Genomic and Metagenomic Datasets. (2024). PhD diss., Rice University. https://hdl.handle.net/1911/116199 | en_US
dc.identifier.uri | https://hdl.handle.net/1911/116199 | en_US
dc.language.iso | eng | en_US
dc.rights | Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. | en_US
dc.subject | Computational biology | en_US
dc.title | Finding Needles in the Haystack: Computational tools for Contaminant Detection and Error Correction in Genomic and Metagenomic Datasets | en_US
dc.type | Thesis | en_US
dc.type.material | Text | en_US
thesis.degree.department | Computer Science | en_US
thesis.degree.discipline | Engineering | en_US
thesis.degree.grantor | Rice University | en_US
thesis.degree.level | Doctoral | en_US
thesis.degree.name | Doctor of Philosophy | en_US