Browsing by Author "Treangen, Todd J."
Now showing 1 - 14 of 14
Crykey: Rapid identification of SARS-CoV-2 cryptic mutations in wastewater (Springer Nature, 2024)
Liu, Yunxi; Sapoval, Nicolae; Gallego-García, Pilar; Tomás, Laura; Posada, David; Treangen, Todd J.; Stadler, Lauren B.
Wastewater surveillance for SARS-CoV-2 provides early warnings of emerging variants of concern and can be used to screen for novel cryptic linked-read mutations, which are co-occurring single nucleotide mutations that are rare, or entirely missing, in existing SARS-CoV-2 databases. While previous approaches have focused on specific regions of the SARS-CoV-2 genome, there is a need for computational tools capable of efficiently tracking cryptic mutations across the entire genome and investigating their potential origin. We present Crykey, a tool for rapidly identifying rare linked-read mutations across the genome of SARS-CoV-2. We evaluated the utility of Crykey on over 3,000 wastewater and over 22,000 clinical samples; our findings are three-fold: (i) we identify hundreds of cryptic mutations that cover the entire SARS-CoV-2 genome, (ii) we track the presence of these cryptic mutations across multiple wastewater treatment plants and over three years of sampling in Houston, and (iii) we find that a handful of cryptic mutations in wastewater mirror cryptic mutations in clinical samples, and we investigate their potential to represent real cryptic lineages.
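To make the notion of a linked-read mutation concrete, the following minimal Python sketch does two things. It counts pairs of mutations observed together on the same read, then flags pairs that recur in the sample but are absent from, or below a frequency cutoff in, a reference database. It is not Crykey's implementation. The input format (one set of mutation labels per read), the `db_pair_freq` lookup, the mutation labels and the `min_reads`/`max_db_freq` thresholds are all hypothetical. A real pipeline would also need read alignment, variant calling and quality filtering.

```python
from collections import Counter
from itertools import combinations


def flag_cryptic_pairs(reads, db_pair_freq, min_reads=2, max_db_freq=0.0):
    """Flag mutation pairs linked on reads but rare/absent in a database.

    reads: list of sets, each holding the mutation labels seen on one read
           (hypothetical input format, assumed already variant-called).
    db_pair_freq: dict mapping frozenset({mut_a, mut_b}) -> frequency of
           that combination in a reference database (hypothetical).
    """
    # Count every unordered pair of mutations that co-occur on one read.
    pair_counts = Counter()
    for muts in reads:
        for pair in combinations(sorted(muts), 2):
            pair_counts[pair] += 1
    # Keep pairs supported by enough reads whose database frequency is at
    # or below the cutoff (0.0 means "never seen in the database").
    return {
        pair: n
        for pair, n in pair_counts.items()
        if n >= min_reads and db_pair_freq.get(frozenset(pair), 0.0) <= max_db_freq
    }


# Hypothetical toy data: labels are made up for illustration.
reads = [{"C100T", "A200G"}, {"C100T", "A200G"}, {"C100T"}, {"A200G", "G300A"}]
db = {frozenset({"A200G", "G300A"}): 0.2}
print(flag_cryptic_pairs(reads, db))  # {('A200G', 'C100T'): 2}
```

Only the pair seen on two reads and missing from the toy database is reported. The `A200G`/`G300A` pair is filtered out both by read support and by its database frequency.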
In summary, Crykey enables large-scale detection of cryptic mutations in wastewater that represent potential circulating cryptic lineages, serving as a new computational tool for wastewater surveillance of SARS-CoV-2.

Current progress and future opportunities in applications of bioinformatics for biodefense and pathogen detection: report from the Winter Mid-Atlantic Microbiome Meet-up, College Park, MD, January 10, 2018 (BioMed Central, 2018-11-05)
Meisel, Jacquelyn S.; Nasko, Daniel J.; Brubach, Brian; Cepeda-Espinoza, Victoria; Chopyk, Jessica; Corrada-Bravo, Héctor; Fedarko, Marcus; Ghurye, Jay; Javkar, Kiran; Olson, Nathan D.; Shah, Nidhi; Allard, Sarah M.; Bazinet, Adam L.; Bergman, Nicholas H.; Brown, Alexis; Caporaso, J.G.; Conlan, Sean; DiRuggiero, Jocelyne; Forry, Samuel P.; Hasan, Nur A.; Kralj, Jason; Luethy, Paul M.; Milton, Donald K.; Ondov, Brian D.; Preheim, Sarah; Ratnayake, Shashikala; Rogers, Stephanie M.; Rosovitz, M.J.; Sakowski, Eric G.; Schliebs, Nils O.; Sommer, Daniel D.; Ternus, Krista L.; Uritskiy, Gherman; Zhang, Sean X.; Pop, Mihai; Treangen, Todd J.
The Mid-Atlantic Microbiome Meet-up (M3) organization brings together academic, government, and industry groups to share ideas and develop best practices for microbiome research. In January of 2018, M3 held its fourth meeting, which focused on recent advances in biodefense, specifically those relating to infectious disease, and the use of metagenomic methods for pathogen detection. Presentations highlighted the utility of next-generation sequencing technologies for identifying and tracking microbial community members across space and time. However, they also stressed the current limitations of genomic approaches for biodefense, including insufficient sensitivity to detect low-abundance pathogens and the inability to quantify viable organisms.
Participants discussed ways in which the community can improve software usability and shared new computational tools for metagenomic processing, assembly, annotation, and visualization. Looking to the future, they identified the need for better bioinformatics toolkits for longitudinal analyses, improved sample processing approaches for characterizing viruses and fungi, and more consistent maintenance of database resources. Finally, they addressed the necessity of improving data standards to incentivize data sharing. Here, we summarize the presentations and discussions from the meeting, identifying the areas where microbiome analyses have improved our ability to detect and manage biological threats and infectious disease, as well as gaps in knowledge in the field that require future funding and focus.

Current progress and open challenges for applying deep learning across the biosciences (Springer Nature, 2022)
Sapoval, Nicolae; Aghazadeh, Amirali; Nute, Michael G.; Antunes, Dinler A.; Balaji, Advait; Baraniuk, Richard; Barberan, C.J.; Dannenfelser, Ruth; Dun, Chen; Edrisi, Mohammadamin; Elworth, R.A. Leo; Kille, Bryce; Kyrillidis, Anastasios; Nakhleh, Luay; Wolfe, Cameron R.; Yan, Zhi; Yao, Vicky; Treangen, Todd J.; Bioengineering; Computer Science
Deep Learning (DL) has recently enabled unprecedented advances in one of the grand challenges in computational biology: the half-century-old problem of protein structure prediction. In this paper we discuss recent advances, limitations, and future perspectives of DL in five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts.
To conclude, we provide a summary of the subject-specific and general challenges for DL across the biosciences.

De novo identification of microbial contaminants in low microbial biomass microbiomes with Squeegee (Springer Nature, 2022)
Liu, Yunxi; Elworth, R. A. Leo; Jochum, Michael D.; Aagaard, Kjersti M.; Treangen, Todd J.
Computational analysis of host-associated microbiomes has opened the door to numerous discoveries relevant to human health and disease. However, contaminant sequences in metagenomic samples can potentially impact the interpretation of findings reported in microbiome studies, especially in low-biomass environments. Contamination from DNA extraction kits or sampling lab environments leaves taxonomic "bread crumbs" across multiple distinct sample types. Here we describe Squeegee, a de novo contamination detection tool that is based upon this principle, allowing the detection of microbial contaminants when negative controls are unavailable. On low-biomass samples, we compare Squeegee predictions to experimental negative control data and show that Squeegee accurately recovers putative contaminants. We analyze samples of varying biomass from the Human Microbiome Project and identify likely, previously unreported kit contamination. Collectively, our results highlight that Squeegee can identify microbial contaminants with high precision and thus represents a computational approach for contaminant detection when negative controls are unavailable.

Enabling accurate and early detection of recently emerged SARS-CoV-2 variants of concern in wastewater (Springer Nature, 2023)
Sapoval, Nicolae; Liu, Yunxi; Lou, Esther G.; Hopkins, Loren; Ensor, Katherine B.; Schneider, Rebecca; Stadler, Lauren B.; Treangen, Todd J.
As clinical testing declines, wastewater monitoring can provide crucial surveillance of the emergence of SARS-CoV-2 variants of concern (VoCs) in communities.
In this paper we present QuaID, a novel bioinformatics tool for VoC detection based on quasi-unique mutations. The benefits of QuaID are three-fold: it (i) provides up to 3-week earlier VoC detection, (ii) detects VoCs accurately (>95% precision on simulated benchmarks), and (iii) leverages all mutational signatures (including insertions and deletions).

Fecal Microbiota Transplantation Derived from Alzheimer’s Disease Mice Worsens Brain Trauma Outcomes in Wild-Type Controls (MDPI, 2022)
Soriano, Sirena; Curry, Kristen; Wang, Qi; Chow, Elsbeth; Treangen, Todd J.; Villapol, Sonia
Traumatic brain injury (TBI) causes neuroinflammation and neurodegeneration, both of which increase the risk and accelerate the progression of Alzheimer’s disease (AD). The gut microbiome is an essential modulator of the immune system, impacting the brain. AD has been associated with reduced diversity and alterations in the community composition of the gut microbiota. This study aimed to determine whether the gut microbiota from AD mice exacerbates neurological deficits after TBI in control mice. We prepared fecal microbiota transplants from 18- to 24-month-old 3×Tg-AD (FMT-AD) and from healthy control (FMT-young) mice. FMTs were administered orally to young control C57BL/6 (wild-type, WT) mice after they underwent controlled cortical impact (CCI) injury, as a model of TBI. Then, we characterized the microbiota composition of the fecal samples by full-length 16S rRNA gene sequencing analysis. We collected the blood, brain, and gut tissues for protein and immunohistochemical analysis. Our results showed that FMT-AD administration stimulates a higher relative abundance of the genus Muribaculum and a decrease in Lactobacillus johnsonii compared to FMT-young in WT mice. Furthermore, WT mice exhibited larger lesions, increased activated microglia/macrophages, and reduced motor recovery after FMT-AD compared to FMT-young one day after TBI.
In summary, we observed that gut microbiota from AD mice have a detrimental effect, aggravating the neuroinflammatory response and worsening neurological outcomes after TBI in young WT mice.

Improved understanding of biorisk for research involving microbial modification using annotated sequences of concern (Frontiers Media S.A., 2023)
Godbold, Gene D.; Hewitt, F. Curtis; Kappell, Anthony D.; Scholz, Matthew B.; Agar, Stacy L.; Treangen, Todd J.; Ternus, Krista L.; Sandbrink, Jonas B.; Koblentz, Gregory D.
Regulation of research on microbes that cause disease in humans has historically been focused on taxonomic lists of ‘bad bugs’. However, given our increased knowledge of these pathogens through inexpensive genome sequencing, five decades of research in microbial pathogenesis, and the burgeoning capacity of synthetic biologists, the limitations of this approach are apparent. With heightened scientific and public attention focused on biosafety and biosecurity, and an ongoing review by US authorities of dual-use research oversight, this article proposes the incorporation of sequences of concern (SoCs) into the biorisk management regime governing genetic engineering of pathogens. SoCs enable pathogenesis in all microbes infecting hosts that are ‘of concern’ to human civilization. Here we review the functions of SoCs (FunSoCs) and discuss how they might bring clarity to potentially problematic research outcomes involving infectious agents. We believe that annotation of SoCs with FunSoCs has the potential to improve the likelihood that dual-use research of concern is recognized by both scientists and regulators before it occurs.

Journey into the unknown: graph and machine learning based approaches for improved characterization of novel pathogens (2023-02-27)
Balaji, Advait; Treangen, Todd J.
The advent of efficient high-throughput sequencing technologies has led to petabyte-scale genomic datasets.
A significant contributor to this genomic data deluge is the field of metagenomics, which comprises the analysis of microbial communities from biological samples. Metagenomics has contributed to novel insights with respect to infectious disease spread and public health; however, scalable and accurate tools to identify and characterize sequences of interest (e.g., pathogens) from metagenomic samples remain limited. In this thesis, we present three computational tools that encompass contributions towards novel pathogen detection via taxonomy-oblivious functional characterization of DNA sequences harbored within metagenomes. In the first part, we introduce SeqScreen, a tool that utilizes ensemble learning for sensitive functional screening of pathogenic sequences. We show that our ensemble classifier, consisting of neural networks and support vector classifiers, can assign pathogenic labels known as Functions of Sequences of Concern (FunSoCs) to short-read sequences. Our classifier achieves 90% precision and 82% recall on an imbalanced multi-class, multi-label classification task across 32 FunSoC labels. We highlight the advantages of FunSoCs over state-of-the-art taxonomic classifiers in distinguishing near-neighbor pathogens. We also simulate a novel-pathogen use case and show that, in contrast to other tools, SeqScreen can sensitively detect trace amounts of SARS-CoV-2 in a metagenomic sample obtained from COVID-19 patients. Second, we discuss KOMB, software for reference-free characterization of function-rich Copy Number Variations (CNVs) in metagenomes. KOMB presents one of the first applications of K-core graph decomposition to metagenomes, thereby offering an exact, linear-time O(|V| + |E|) solution to identifying repeats in metagenome graphs, in contrast to state-of-the-art betweenness-centrality-based tools.
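The linear-time bound quoted for KOMB rests on the fact that a k-core decomposition can be computed in O(|V| + |E|) time. As an illustration only, here is a Python sketch of the bucket-based core-decomposition algorithm of Batagelj and Zaversnik, run on a plain adjacency dictionary. The abstract does not describe how KOMB builds its graph from reads or how it turns core numbers into repeat calls. The graph representation below is therefore an assumption, not KOMB's data model.

```python
def core_numbers(adj):
    """Batagelj-Zaversnik O(|V| + |E|) core decomposition.

    adj: dict mapping each node to the set of its neighbours
         (undirected graph, no self-loops; hypothetical representation).
    Returns a dict mapping each node to its core number.
    """
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    max_deg = max(deg.values(), default=0)

    # Bucket-sort vertices by degree: bins[d] becomes the start index of
    # degree-d vertices in `vert`, and pos[v] is v's index in `vert`.
    bins = [0] * (max_deg + 1)
    for d in deg.values():
        bins[d] += 1
    start = 0
    for d in range(max_deg + 1):
        bins[d], start = start, start + bins[d]
    pos, vert = {}, [None] * len(deg)
    for v, d in deg.items():
        pos[v] = bins[d]
        vert[pos[v]] = v
        bins[d] += 1
    # Placement advanced each bin start; shift them back.
    for d in range(max_deg, 0, -1):
        bins[d] = bins[d - 1]
    bins[0] = 0

    # Peel vertices in non-decreasing order of current degree.
    for i in range(len(vert)):
        v = vert[i]
        for u in adj[v]:
            if deg[u] > deg[v]:
                # Move u to the front of its bin, then shrink the bin so
                # that u drops into the bin for degree deg[u] - 1.
                du, pu = deg[u], pos[u]
                pw = bins[du]
                w = vert[pw]
                if u != w:
                    vert[pu], vert[pw] = w, u
                    pos[u], pos[w] = pw, pu
                bins[du] += 1
                deg[u] -= 1
    return deg


# Toy graph: a triangle a-b-c with a pendant vertex d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(core_numbers(adj))  # {'a': 2, 'b': 2, 'c': 2, 'd': 1}
```

Each vertex and each edge endpoint is touched a constant number of times, which is where the linear bound comes from. For production use, NetworkX's `core_number` provides the same decomposition.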
On a mock metagenome, KOMB detects repeats more accurately across different copy numbers, providing a sample-wide characterization of CNVs. Using longitudinal metagenome data, we show that KOMB can be used to analyze and visualize shifts caused by disruptions. We also show that KOMB can identify sequences with potentially unique functional profiles using an anomaly detection method previously applied to social networks. Finally, we present SeqScreen-Nano, a tool for pathogen detection and identification in metagenomes using long-read data. Using simulated nanopore reads from isolate genomes, we first show that the mapping stage of SeqScreen-Nano is optimized to accurately predict Open Reading Frames (ORFs) along the length of raw nanopore reads and to assign functional labels more accurately than other mappers and functional characterization tools. We also propose a majority voting approach and a greedy weighted minimum-set cover algorithm to predict a single taxonomic label per read. Further, we develop a reference inference pipeline that assigns a probabilistic coverage score based on ORF assignments. This pipeline accurately predicts species in two mock metagenomic communities and achieves higher precision and recall than state-of-the-art taxonomic classifiers. In summary, this thesis presents efficient and accurate software for pathogen detection and de novo characterization of copy number variation.
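The greedy weighted minimum-set-cover step named above is, in its textbook form, simple. It repeatedly picks the candidate with the lowest cost per newly covered element until everything is covered, and this heuristic is known to stay within a logarithmic factor of the optimal cost. The sketch below shows only that generic greedy heuristic. Treating a read's predicted ORFs as the elements to cover and candidate taxa as weighted sets is an assumption made for illustration. The `sets` and `weights` inputs are hypothetical and do not reflect SeqScreen-Nano's actual scoring.

```python
def greedy_weighted_set_cover(universe, sets, weights):
    """Classic greedy heuristic for weighted set cover.

    universe: set of elements to cover (e.g. ORF ids on one read; hypothetical).
    sets: dict mapping a candidate label -> set of elements it covers.
    weights: dict mapping a candidate label -> positive cost.
    Returns the chosen labels in selection order; stops early if some
    elements cannot be covered by any candidate.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best, best_ratio = None, float("inf")
        for label, elems in sets.items():
            gain = len(elems & uncovered)
            if gain == 0:
                continue  # this candidate adds nothing new
            ratio = weights[label] / gain  # cost per newly covered element
            if ratio < best_ratio:
                best, best_ratio = label, ratio
        if best is None:  # remaining elements are not coverable
            break
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


# Hypothetical candidates and costs for a read with four ORFs.
sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4}}
weights = {"A": 1.0, "B": 1.0, "C": 0.2}
print(greedy_weighted_set_cover({1, 2, 3, 4}, sets, weights))  # ['C', 'A']
```

Here the cheap single-element candidate `C` wins the first round (0.2 per element versus 0.33 for `A`), and `A` then covers the remaining ORFs.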
Our work presents novel computational frameworks and algorithmic applications that have the potential for broad impact across the scientific community, ranging from clinical metagenomics to microbial forensics.

MethPhaser: methylation-based long-read haplotype phasing of human genomes (Springer Nature, 2024)
Fu, Yilei; Aganezov, Sergey; Mahmoud, Medhat; Beaulaurier, John; Juul, Sissel; Treangen, Todd J.; Sedlazeck, Fritz J.; Bioengineering; Computer Science
The assignment of variants to haplotypes (phasing) is crucial for predicting the consequences, interaction, and inheritance of mutations and is a key step in improving our understanding of phenotype and disease. However, phasing is limited by read length and stretches of homozygosity along the genome. To overcome this limitation, we designed MethPhaser, a method that utilizes methylation signals from Oxford Nanopore Technologies (ONT) to extend Single Nucleotide Variation (SNV)-based phasing. We demonstrate that haplotype-specific methylation is widespread in human genomes; the advent of long-read technologies has enabled direct reporting of methylation signals. For ONT R9 and R10 cell line data, we increase the phase length N50 by 78%-151% at a phasing accuracy of 83.4-98.7%. To assess the impact of tissue purity and random methylation signals due to inactivation, we also applied MethPhaser to blood samples from 4 patients, still showing improvements over SNV-only phasing. MethPhaser further improves phasing across HLA and multiple other medically relevant genes, improving our understanding of how mutations interact across multiple phenotypes. The concept of MethPhaser can also be extended to non-human diploid genomes.
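MethPhaser's gains are reported as phase length N50. This is the length L such that phase blocks of length at least L together span at least half of the total phased length. A minimal Python sketch follows. It assumes phase block lengths are already available as a list of integers; the example lengths are hypothetical. It uses the common "at least half" convention, and some tools break ties slightly differently.

```python
def n50(lengths):
    """Return the N50 of a list of block lengths.

    N50 is the largest length L such that blocks of length >= L
    together cover at least half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one block length")
    total = sum(lengths)
    running = 0
    # Walk blocks from longest to shortest until half the total is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical phase block lengths in kb.
print(n50([2, 3, 4, 5, 6]))  # 5
```

The blocks above total 20 kb, and the two longest (6 + 5 = 11 kb) already cover half, so the N50 is 5 kb. Lengthening phase blocks, for example by bridging homozygous stretches, raises this value.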
MethPhaser is available at https://github.com/treangenlab/methphaser.

Multiple genome alignment in the telomere-to-telomere assembly era (Springer Nature, 2022)
Kille, Bryce; Balaji, Advait; Sedlazeck, Fritz J.; Nute, Michael; Treangen, Todd J.
With the arrival of telomere-to-telomere (T2T) assemblies of the human genome comes the computational challenge of efficiently and accurately constructing multiple genome alignments at an unprecedented scale. By identifying nucleotides across genomes which share a common ancestor, multiple genome alignments commonly serve as the bedrock for comparative genomics studies. In this review, we provide an overview of the algorithmic template that most multiple genome alignment methods follow. We also discuss prospective areas of improvement of multiple genome alignment for keeping up with continuously arriving high-quality T2T assembled genomes and for unlocking clinically relevant insights.

Olivar: towards automated variant aware primer design for multiplex tiled amplicon sequencing of pathogens (Springer Nature, 2024)
Wang, Michael X.; Lou, Esther G.; Sapoval, Nicolae; Kim, Eddie; Kalvapalle, Prashant; Kille, Bryce; Elworth, R. A. Leo; Liu, Yunxi; Fu, Yilei; Stadler, Lauren B.; Treangen, Todd J.; Bioengineering; Civil and Environmental Engineering; Computer Science
Tiled amplicon sequencing has served as an essential tool for tracking the spread and evolution of pathogens. Over 15 million complete SARS-CoV-2 genomes are now publicly available, most sequenced and assembled via tiled amplicon sequencing. While computational tools for tiled amplicon design exist, they require downstream manual optimization both computationally and experimentally, which is slow and costly. Here we present Olivar, a first step towards a fully automated, variant-aware design of tiled amplicons for pathogen genomes.
Olivar converts each nucleotide of the target genome into a numeric risk score, capturing undesired sequence features that should be avoided. In a direct comparison with PrimalScheme, we show that Olivar has fewer mismatches overlapping with primers and fewer predicted PCR byproducts. We also compare Olivar head-to-head with ARTIC v4.1, the most widely used primer set for SARS-CoV-2 sequencing, and show that Olivar yields similar read mapping rates (~90%) and better coverage than the manually designed ARTIC v4.1 amplicons. We also evaluate Olivar on real wastewater samples and find that Olivar has up to 3-fold higher mapping rates while retaining similar coverage. In summary, Olivar automates and accelerates the generation of tiled amplicons, even in situations of high mutation frequency and/or density. Olivar is available online as a web application at https://olivar.rice.edu and can be installed locally as a command line tool with Bioconda. Source code, installation guide, and usage are available at https://github.com/treangenlab/Olivar.

Rescuing low frequency variants within intra-host viral populations directly from Oxford Nanopore sequencing data (Springer Nature, 2022)
Liu, Yunxi; Kearney, Joshua; Mahmoud, Medhat; Kille, Bryce; Sedlazeck, Fritz J.; Treangen, Todd J.
Infectious disease monitoring on Oxford Nanopore Technologies (ONT) platforms offers rapid turnaround times and low cost. Tracking low-frequency intra-host variants provides important insights with respect to elucidating within-host viral population dynamics and transmission. However, given the higher error rate of ONT, accurate identification of intra-host variants with low allele frequencies remains an open challenge with no viable computational solutions available. In response to this need, we present Variabel, a novel approach and the first method designed for rescuing low-frequency intra-host variants from ONT data alone.
We evaluate Variabel on both synthetic data (SARS-CoV-2) and patient-derived datasets (Ebola virus, norovirus, SARS-CoV-2); our results show that Variabel can accurately identify low-frequency variants below 0.5 allele frequency, outperforming existing state-of-the-art ONT variant callers for this task. Variabel is open-source and available for download at: www.gitlab.com/treangenlab/variabel.

Role of miR-2392 in driving SARS-CoV-2 infection (Elsevier, 2021)
McDonald, J. Tyson; Enguita, Francisco J.; Taylor, Deanne; Griffin, Robert J.; Priebe, Waldemar; Emmett, Mark R.; Sajadi, Mohammad M.; Harris, Anthony D.; Clement, Jean; Dybas, Joseph M.; Aykin-Burns, Nukhet; Guarnieri, Joseph W.; Singh, Larry N.; Grabham, Peter; Baylin, Stephen B.; Yousey, Aliza; Pearson, Andrea N.; Corry, Peter M.; Saravia-Butler, Amanda; Aunins, Thomas R.; Sharma, Sadhana; Nagpal, Prashant; Meydan, Cem; Foox, Jonathan; Mozsary, Christopher; Cerqueira, Bianca; Zaksas, Viktorija; Singh, Urminder; Wurtele, Eve Syrkin; Costes, Sylvain V.; Davanzo, Gustavo Gastão; Galeano, Diego; Paccanaro, Alberto; Meinig, Suzanne L.; Hagan, Robert S.; Bowman, Natalie M.; Wallet, Shannon M.; Maile, Robert; Wolfgang, Matthew C.; Hagan, Robert S.; Mock, Jason R.; Bowman, Natalie M.; Torres-Castillo, Jose L.; Love, Miriya K.; Meinig, Suzanne L.; Lovell, Will; Rice, Colleen; Mitchem, Olivia; Burgess, Dominique; Suggs, Jessica; Jacobs, Jordan; Wolfgang, Matthew C.; Altinok, Selin; Sapoval, Nicolae; Treangen, Todd J.; Moraes-Vieira, Pedro M.; Vanderburg, Charles; Wallace, Douglas C.; Schisler, Jonathan C.; Mason, Christopher E.; Chatterjee, Anushree; Meller, Robert; Beheshti, Afshin
MicroRNAs (miRNAs) are small non-coding RNAs involved in post-transcriptional gene regulation that have a major impact on many diseases and provide an exciting avenue toward antiviral therapeutics.
From patient transcriptomic data, we determined that a circulating miRNA, miR-2392, is directly involved with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) machinery during host infection. Specifically, we show that miR-2392 is key in driving downstream suppression of mitochondrial gene expression, increasing inflammation, glycolysis, and hypoxia, as well as promoting many symptoms associated with coronavirus disease 2019 (COVID-19). We demonstrate that miR-2392 is present in the blood and urine of patients positive for COVID-19 but not in patients negative for COVID-19. These findings indicate the potential for developing a minimally invasive COVID-19 detection method. Lastly, using in vitro human and in vivo hamster models, we design a miRNA-based antiviral therapeutic that targets miR-2392, significantly reduces SARS-CoV-2 viability in hamsters, and may potentially inhibit a COVID-19 disease state in humans.

SARS-CoV-2 genomic diversity and the implications for qRT-PCR diagnostics and transmission (Cold Spring Harbor Laboratory Press, 2021)
Sapoval, Nicolae; Mahmoud, Medhat; Jochum, Michael D.; Liu, Yunxi; Elworth, R. A. Leo; Wang, Qi; Albin, Dreycey; Ogilvie, Huw A.; Lee, Michael D.; Villapol, Sonia; Hernandez, Kyle M.; Berry, Irina Maljkovic; Foox, Jonathan; Beheshti, Afshin; Ternus, Krista; Aagaard, Kjersti M.; Posada, David; Mason, Christopher E.; Sedlazeck, Fritz J.; Treangen, Todd J.
The COVID-19 pandemic has sparked an urgent need to uncover the underlying biology of this devastating disease. Though RNA viruses mutate more rapidly than DNA viruses, there is a relatively small number of single nucleotide polymorphisms (SNPs) that differentiate the main SARS-CoV-2 lineages that have spread throughout the world. In this study, we investigated 129 RNA-seq data sets and 6928 consensus genomes to contrast the intra-host and inter-host diversity of SARS-CoV-2. Our analyses yielded three major observations.
First, the mutational profile of SARS-CoV-2 highlights intra-host single nucleotide variant (iSNV) and SNP similarity, albeit with differences in C > U changes. Second, iSNV and SNP patterns in SARS-CoV-2 are more similar to MERS-CoV than SARS-CoV-1. Third, a significant fraction of insertions and deletions contribute to the genetic diversity of SARS-CoV-2. Altogether, our findings provide insight into SARS-CoV-2 genomic diversity, inform the design of detection tests, and highlight the potential of iSNVs for tracking the transmission of SARS-CoV-2.