Finding Needles in the Haystack: Computational tools for Contaminant Detection and Error Correction in Genomic and Metagenomic Datasets

dc.contributor.advisor: Treangen, Todd J
dc.creator: Liu, Yunxi
dc.date.accessioned: 2024-05-22T16:30:02Z
dc.date.created: 2024-05
dc.date.issued: 2024-04-17
dc.date.submitted: May 2024
dc.date.updated: 2024-05-22T16:30:02Z
dc.description: EMBARGO NOTE: This item is embargoed until 2026-05-01
dc.description.abstract: The scale and complexity of genomic studies have expanded alongside the volume of sequencing data, thanks to recent developments in next-generation and third-generation sequencing technology. However, errors introduced during sample collection, sample preparation, sequencing, and computational data analysis may distort results and contribute to erroneous interpretations. In this work we present a set of studies that explore anomaly detection and error correction in genomic data from different points of view. Broadly, the topics of the thesis can be grouped into two categories: those related to metagenomics, where the projects focus on accurate profiling of a microbiome community through contamination identification and false-positive detection in taxonomic classification; and those related to viral genomics, where the projects focus on variant calling with high-error-rate long-read sequencing and on cryptic mutation detection for SARS-CoV-2 in wastewater. These methodologies underscore the significance of rare occurrences in high-throughput sequencing procedures, paving the way for advancements in metagenomics and viral genomics.
dc.embargo.lift: 2026-05-01
dc.embargo.terms: 2026-05-01
dc.format.mimetype: application/pdf
dc.identifier.citation: Liu, Yunxi. Finding Needles in the Haystack: Computational tools for Contaminant Detection and Error Correction in Genomic and Metagenomic Datasets. (2024). PhD diss., Rice University. https://hdl.handle.net/1911/116199
dc.identifier.uri: https://hdl.handle.net/1911/116199
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Computational biology
dc.title: Finding Needles in the Haystack: Computational tools for Contaminant Detection and Error Correction in Genomic and Metagenomic Datasets
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files
License bundle
Name:
PROQUEST_LICENSE.txt
Size:
5.84 KB
Format:
Plain Text
Name:
LICENSE.txt
Size:
2.97 KB
Format:
Plain Text