Browsing by Author "Treangen, Todd"
Analysis of bronchoalveolar lavage fluid metatranscriptomes among patients with COVID-19 disease (Springer Nature, 2022)
Jochum, Michael; Lee, Michael D.; Curry, Kristen; Zaksas, Victoria; Vitalis, Elizabeth; Treangen, Todd; Aagaard, Kjersti; Ternus, Krista L.
To better understand the potential relationship between COVID-19 disease and hologenome microbial community dynamics and functional profiles, we conducted a multivariate taxonomic and functional microbiome comparison of publicly available human bronchoalveolar lavage fluid (BALF) metatranscriptome samples among COVID-19 (n = 32), community-acquired pneumonia (CAP) (n = 25), and uninfected samples (n = 29). We then performed a stratified analysis based on mortality within the COVID-19 cohort, comparing cases with known outcomes of deceased (n = 10) versus survived (n = 15). Our overarching hypothesis was that there are detectable and functionally significant relationships between BALF microbial metatranscriptomes and the severity of COVID-19 disease onset and progression. We observed 34 gene ontology (GO) terms that were functionally discriminant for COVID-19 disease compared with the CAP and uninfected cohorts, and 21 GO terms functionally discriminant for COVID-19 mortality (q < 0.05). GO terms enriched in the COVID-19 disease cohort included hydrolase activity, as well as significant GO terms under the parent terms biological regulation, viral process, and interspecies interaction between organisms. Notable GO terms associated with COVID-19 mortality included nucleobase-containing compound biosynthetic process, organonitrogen compound catabolic process, pyrimidine-containing compound biosynthetic process, DNA recombination, RNA binding, magnesium and zinc ion binding, oxidoreductase activity, and endopeptidase activity. A Dirichlet multinomial mixtures clustering analysis gave the best model fit with three distinct clusters, which were significantly associated with COVID-19 disease and mortality.
We additionally observed discriminant taxonomic differences associated with COVID-19 disease and mortality in the genus Sphingomonas, belonging to the family Sphingomonadaceae, in the genus Variovorax, belonging to the family Comamonadaceae, and in the class Bacteroidia, which contains the order Bacteroidales. To our knowledge, this is the first study to evaluate significant differences in taxonomic and functional signatures between BALF metatranscriptomes from COVID-19, CAP, and uninfected cohorts, and to associate these taxa and microbial gene functions with COVID-19 mortality. Collectively, while these data speak to neither the causality nor the directionality of the association, they do demonstrate a significant relationship between the human microbiome and COVID-19. The results of this study yield testable hypotheses that warrant further investigation to better understand the causality and directionality of host–microbiome–pathogen interactions.

Comprehensive analysis and accurate quantification of unintended large gene modifications induced by CRISPR-Cas9 gene editing (AAAS, 2022)
Park, So Hyun; Cao, Mingming; Pan, Yidan; Davis, Timothy H.; Saxena, Lavanya; Deshmukh, Harshavardhan; Fu, Yilei; Treangen, Todd; Sheehan, Vivien A.; Bao, Gang
Most genome editing analyses to date have been based on quantifying small insertions and deletions. Here, we show that CRISPR-Cas9 genome editing can induce large gene modifications, such as deletions, insertions, and complex local rearrangements, in different primary cells and cell lines. We analyzed large deletion events in hematopoietic stem and progenitor cells (HSPCs) using different methods, including clonal genotyping, droplet digital polymerase chain reaction, single-molecule real-time sequencing with unique molecular identifiers, and a long-amplicon sequencing assay.
Our results show that large deletions of up to several thousand bases occur at high frequencies at the Cas9 on-target cut sites in the HBB (11.7 to 35.4%), HBG (14.3%), and BCL11A (13.2%) genes in HSPCs and in the PD-1 (15.2%) gene in T cells. Our findings have important implications for advancing genome editing technologies to treat human diseases, because unintended large gene modifications may persist, altering biological functions and reducing the number of available therapeutic alleles.

GenomeDepot: Computational Methods for Decoding Biological Information Encoded in Engineered DNA and Microbial Genomes (2021-12-03)
Wang, Qi X; Treangen, Todd
Although DNA sequencing and genome engineering have achieved great successes, our ability to fully elucidate the biological information encoded in genomic data, and to fully control biological systems, remains limited. My research has focused on deciphering signatures hidden in genomic data, specifically in engineered synthetic sequences and metagenomes. Recent advances in genome engineering and editing have enabled researchers to create novel genetic parts and redesign biological systems. As genome engineering develops, there is heightened awareness of potential misuse and related biosafety concerns. In parallel, metagenomics now lets us study microbial communities at unprecedented resolution. Previous efforts in this area allow us to identify the species composition of a given microbial community and estimate its metabolic functions. Despite this progress, detailed knowledge of the bacteria that drive microbial interactions within microbiomes is still lacking, limiting our ability to fully understand and control microbial communities. In the first part of my thesis, I developed PlasmidHawk, a linear-time, pan-genome alignment-based pipeline to predict the lab of origin of unknown sequences. Compared with the previous deep learning method, PlasmidHawk has higher prediction accuracy.
PlasmidHawk correctly predicts the depositing lab of unknown sequences 76% of the time, and the correct lab is among the top 10 candidates 85% of the time. In addition, PlasmidHawk can precisely single out the signature subsequences responsible for lab-of-origin detection. PlasmidHawk represents an explainable and accurate tool for lab-of-origin prediction of synthetic plasmid sequences. In the second part of my thesis, I developed Bakdrive, a novel method for identifying driver species within microbiomes. Bakdrive has three key innovations in this space: (i) it leverages information inherent in metagenomic sequencing samples to identify driver species, (ii) it explicitly takes host-specific variation into consideration, and (iii) it does not require a known ecological network. Using simulated and real datasets, we demonstrate that by detecting driver species in healthy donor samples and introducing them into disease samples, we can restore the gut microbiome of recurrent Clostridioides difficile infection patients to a healthy state. In summary, Bakdrive provides a novel approach for teasing apart microbial interactions and facilitates future personalized probiotic design. In conclusion, GenomeDepot represents a collection of novel, computationally efficient software tools and algorithms suited for deciphering biological information encoded in engineered and microbial genomes. Real-world applications of GenomeDepot include lab-of-origin prediction and detection of driver species in healthy and disease-associated microbiomes, feeding back into biosecurity decisions and human health.

Infectious Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) in Exhaled Aerosols and Efficacy of Masks During Early Mild Infection (Oxford University Press, 2022)
Adenaiye, Oluwasanmi O.; Lai, Jianyu; Bueno de Mesquita, P. Jacob; Hong, Filbert; Youssefi, Somayeh; German, Jennifer; Tai, S.H. Sheldon; Albert, Barbara; Schanz, Maria; Weston, Stuart; Hang, Jun; Fung, Christian; Chung, Hye Kyung; Coleman, Kristen K.; Sapoval, Nicolae; Treangen, Todd; Berry, Irina Maljkovic; Mullins, Kristin; Frieman, Matthew; Ma, Tianzhou; Milton, Donald K.; University of Maryland StopCOVID Research Group
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) epidemiology implicates airborne transmission; however, aerosol infectiousness and the impacts of masks and variants on aerosol shedding are not well understood. We recruited coronavirus disease 2019 (COVID-19) cases to give blood, saliva, mid-turbinate and fomite (phone) swabs, and 30-minute breath samples while vocalizing into a Gesundheit-II, with and without masks, at up to 2 visits 2 days apart. We quantified and sequenced viral RNA, cultured virus, and assayed serum samples for anti-spike and anti-receptor binding domain antibodies. We enrolled 49 seronegative cases (mean days post onset 3.8 ± 2.1) from May 2020 through April 2021. We detected SARS-CoV-2 RNA in 36% of fine (≤5 µm) and 26% of coarse (>5 µm) aerosol samples and in 52% of fomite samples overall, and in all samples from 4 alpha variant cases. Masks reduced viral RNA by 48% (95% confidence interval [CI], 3 to 72%) in fine aerosols and by 77% (95% CI, 51 to 89%) in coarse aerosols; cloth and surgical masks were not significantly different. The alpha variant was associated with a 43-fold (95% CI, 6.6- to 280-fold) increase in fine aerosol viral RNA compared with earlier viruses, which remained a significant 18-fold (95% CI, 3.4- to 92-fold) increase after adjusting for viral RNA in saliva, swabs, and other potential confounders. Two fine aerosol samples, collected while participants wore masks, were culture-positive. SARS-CoV-2 is evolving toward more efficient aerosol generation, and loose-fitting masks provide significant but only modest source control.
Therefore, until vaccination rates are very high, continued layered controls and tight-fitting masks and respirators will be necessary.

MetaCarvel: linking assembly graph motifs to biological variants (BioMed Central, 2019)
Ghurye, Jay; Treangen, Todd; Fedarko, Marcus; Hervey, W. Judson; Pop, Mihai
Reconstructing genomic segments from metagenomics data is a highly complex task. In addition to general challenges, such as repeats and sequencing errors, metagenomic assembly needs to tolerate uneven depth of coverage among the organisms in a community and differences between nearly identical strains. Previous methods have addressed these issues by smoothing over genomic variants. We present MetaCarvel, a variant-aware metagenomic scaffolder that combines new strategies for repeat detection with graph analytics for the discovery of variants. We show that MetaCarvel can accurately reconstruct genomic segments from complex microbial mixtures and correctly identify and characterize several classes of common genomic variants.

Natural tannin extracts supplementation for COVID-19 patients (TanCOVID): a structured summary of a study protocol for a randomized controlled trial (Springer Nature, 2021)
Molino, Silvia; Pisarevsky, Andrea; Mingorance, Fabiana Lopez; Vega, Patricia; Stefanolo, Juan Pablo; Repetti, Julieta; Ludueña, Guillermina; Pepa, Pablo; Olmos, Juan Ignacio; Fermepin, Marcelo Rodriguez; Uehara, Tatiana; Villapol, Sonia; Savidge, Tor; Treangen, Todd; Viciani, Elisa; Castagnetti, Andrea; Piskorz, Maria Marta
This research aims to study the efficacy of tannin co-supplementation on disease duration, severity, and clinical symptoms, as well as on microbiota composition and inflammatory mediators, in SARS-CoV-2 patients.

Simultaneous SNV calling and Phylogenetic Inference for Single-cell Sequencing Data (2020-11-06)
Edrisi, Mohammadamin; Nakhleh, Luay; Shrivastava, Anshumali; Treangen, Todd; Zafar, Hamim
Single-cell sequencing provides a powerful approach for elucidating intratumor heterogeneity by resolving cell-to-cell variability. However, it also poses additional challenges, including elevated error rates, allelic dropout, and non-uniform coverage. Variant calling in this context is the task of identifying mutations in the genomes of individual cells while accounting for these multiple types of errors. One powerful computational approach to this task is to rely on a phylogenetic context, since the genomes under analysis evolved from a common ancestor along the branches of a tree. The phylogenetic tree captures the temporal dependencies across the genomes and provides an important constraint that makes it possible to distinguish true mutations from errors that masquerade as mutations. However, simultaneously identifying mutations while accounting for phylogenetic constraints is computationally challenging. In this thesis, I report on a new method that I developed, called scVILP, that jointly detects mutations in individual cells and reconstructs a “perfect phylogeny” of the cells (a phylogeny on which every site in the genomes mutates at most once). The method employs a novel integer linear programming (ILP) formulation and uses publicly available ILP solvers. Furthermore, to address scalability, I developed a divide-and-conquer technique in which the ILP formulation is applied to and solved on subsets of the data, and the results are combined while resolving conflicts via constraints that are also formulated as an ILP. I demonstrate through analysis of simulated data sets that my method achieves accuracy similar to or better than that of existing methods, with significantly better runtime. My method provides a promising approach for analyzing large single-cell genomic data sets.

Synthetic DNA and biosecurity: Nuances of predicting pathogenicity and the impetus for novel computational approaches for screening oligonucleotides (Public Library of Science, 2020)
Elworth, R.A. Leo; Diaz, Christian; Yang, Jing; de Figueiredo, Paul; Ternus, Krista; Treangen, Todd

VariPhyer: A Modular Computational Platform for Verifying Microbial Variant Calling and Phylogenomic Analyses (2022-04-22)
Liao, Chunxiao; Treangen, Todd
The COVID-19 pandemic has highlighted that the inference of whole-genome phylogeny, or phylogenomics, is critical for studying the evolution and transmission of infectious diseases. In phylogenomic analyses, however, deciding which workflow to use, and which results to trust, remains a critical open research question. Reproducible, explainable, and accurate microbial genomics analysis pipelines with comprehensive benchmarking against a known ground truth are urgently needed. Here, we propose VariPhyer, an end-to-end, comprehensive benchmarking framework for microbial phylogenetic inference and variant calling from short reads, long reads, and assembled genomes, all following best practices. VariPhyer was implemented in Nextflow and uses simulated genomic variants and evolutionary relationships as ground truth. The main idea behind VariPhyer is to provide a proving ground and evaluation framework for phylogenomics based on genome alignment and variant calling: if a pipeline introduces no errors, its output should be close to the simulated ground truth. VariPhyer simulates variants in a given genome according to a given phylogenetic tree, then runs the selected pipeline and uses evaluation metrics to compare tree differences and variant calling accuracy. To test the correctness of our implementation and our hypothesis, we designed and ran experiments with simulated phylogenies and variants on a bacterial genome backbone. We evaluated the output phylogenies using their differences from the designed tree as the loss function. Our hypothesis was supported by the consistency between the designed and output phylogenies across most pipelines in VariPhyer. VariPhyer provides loss functions to evaluate every tool statistically, regardless of species or sequencing platform. Trees with different topologies, branch lengths, and numbers of taxa have been tested with the pipeline. The results indicate that our pipeline is accurate and efficient for phylogenomic analysis and evaluation. In summary, VariPhyer is an open-source “push-button” phylogenomic processing and evaluation pipeline, representing a first step toward verified infectious disease analysis.
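The VariPhyer entry scores each pipeline by treating the difference between its output tree and the designed tree as a loss function, but the abstract does not say which tree distance is used. The Python sketch below assumes a rooted Robinson-Foulds distance, i.e. the number of clusters (leaf sets below an internal node) that appear in only one of the two trees. The nested-tuple tree encoding and the `clusters`/`rf_distance` names are hypothetical illustrations, not VariPhyer's code.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a rooted tree is a leaf name (str) or a tuple of subtrees,
# e.g. (("A", "B"), ("C", "D")). Branch lengths are not represented.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> Tuple[FrozenSet[str], Set[FrozenSet[str]]]:
    """Return (leaf set of the whole tree, leaf sets below each internal node)."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])  # a leaf contributes no cluster of its own
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)  # record the cluster defined by this internal node
        return leaves

    return walk(tree), found


def rf_distance(designed: Tree, inferred: Tree) -> int:
    """Assumed loss: rooted Robinson-Foulds distance between two trees.

    Counts clusters present in exactly one tree; 0 means identical topologies.
    """
    leaves_a, a = clusters(designed)
    leaves_b, b = clusters(inferred)
    if leaves_a != leaves_b:
        raise ValueError("trees must be built on the same set of taxa")
    return len(a ^ b)


if __name__ == "__main__":
    designed = ((("A", "B"), "C"), ("D", "E"))
    inferred = ((("A", "C"), "B"), ("D", "E"))
    print(rf_distance(designed, inferred))  # 2: {A,B} vs {A,C}
    print(rf_distance(designed, designed))  # 0
```

A topology-only distance like this ignores branch lengths, which the entry says were also varied in testing; scoring those would need a branch-length-aware distance instead.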