GenomeDepot: Computational Methods for Decoding Biological Information Encoded in Engineered DNA and Microbial Genomes

dc.contributor.advisor: Treangen, Todd (en_US)
dc.creator: Wang, Qi X (en_US)
dc.date.accessioned: 2021-12-06T19:38:57Z (en_US)
dc.date.available: 2021-12-06T19:38:57Z (en_US)
dc.date.created: 2021-12 (en_US)
dc.date.issued: 2021-12-03 (en_US)
dc.date.submitted: December 2021 (en_US)
dc.date.updated: 2021-12-06T19:38:58Z (en_US)
dc.description.abstract: Despite major advances in DNA sequencing and genome engineering, our ability to fully elucidate the biological information encoded in genomic data, and to fully control biological systems, remains limited. My research has focused on deciphering signatures hidden in genomic data, specifically in engineered synthetic sequences and metagenomes. Recent advances in genome engineering and editing have enabled researchers to create novel genetic parts and redesign biological systems. As genome engineering develops, there is heightened awareness of potential misuse and the associated biosafety concerns. In parallel, metagenomics now lets us study microbial communities at unprecedented resolution. Previous efforts in this area allow us to identify the species composition of a given microbial community and estimate its metabolic functions. Despite this progress, fine-grained knowledge of the bacteria driving microbial interactions within microbiomes is still lacking, limiting our ability to fully understand and control microbial communities. In the first part of my thesis, I developed PlasmidHawk, a linear-time, pan-genome alignment-based pipeline to predict the lab-of-origin of unknown sequences. Compared to the previous deep learning method, PlasmidHawk has higher prediction accuracy. PlasmidHawk correctly predicts the depositing lab of an unknown sequence 76% of the time, and the correct lab is among the top 10 candidates 85% of the time. In addition, PlasmidHawk can precisely single out the signature sub-sequences that are responsible for the lab-of-origin detection. PlasmidHawk represents an explainable and accurate tool for lab-of-origin prediction of synthetic plasmid sequences. In the second part of my thesis, I developed Bakdrive, a novel method for identifying driver species within microbiomes.
Bakdrive has three key innovations in this space: (i) it leverages inherent information from metagenomic sequencing samples to identify driver species, (ii) it explicitly takes host-specific variation into consideration, and (iii) it does not require a known ecological network. Using simulated and real datasets, we demonstrate that by detecting driver species in healthy donor samples and introducing them into disease samples, we can restore the gut microbiome of recurrent Clostridioides difficile infection patients to a healthy state. In summary, Bakdrive provides a novel approach for teasing apart microbial interactions and facilitates future personalized probiotic design. In conclusion, GenomeDepot represents a collection of novel, computationally efficient software tools and algorithms suited for deciphering biological information encoded in engineered and microbial genomes. Real-world applications of GenomeDepot have included lab-of-origin prediction and detection of driver species in healthy and disease-associated microbiomes, feeding back into biosecurity decisions and human health. (en_US)
dc.format.mimetype: application/pdf (en_US)
dc.identifier.citation: Wang, Qi X. "GenomeDepot: Computational Methods for Decoding Biological Information Encoded in Engineered DNA and Microbial Genomes." (2021) Diss., Rice University. https://hdl.handle.net/1911/111741. (en_US)
dc.identifier.uri: https://hdl.handle.net/1911/111741 (en_US)
dc.language.iso: eng (en_US)
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. (en_US)
dc.subject: metagenome (en_US)
dc.subject: synthetic biology (en_US)
dc.subject: lab-of-origin (en_US)
dc.title: GenomeDepot: Computational Methods for Decoding Biological Information Encoded in Engineered DNA and Microbial Genomes (en_US)
dc.type: Thesis (en_US)
dc.type.material: Text (en_US)
thesis.degree.department: Systems, Synthetic and Physical Biology (en_US)
thesis.degree.discipline: Natural Sciences (en_US)
thesis.degree.grantor: Rice University (en_US)
thesis.degree.level: Doctoral (en_US)
thesis.degree.name: Doctor of Philosophy (en_US)
Files
Original bundle
Name: WANG-DOCUMENT-2021.pdf
Size: 11.15 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text