Browsing by Author "Olson, Nathan D."
Now showing 1 - 3 of 3
Results Per Page
Sort Options
Item Current progress and future opportunities in applications of bioinformatics for biodefense and pathogen detection: report from the Winter Mid-Atlantic Microbiome Meet-up, College Park, MD, January 10, 2018(BioMed Central, 2018-11-05) Meisel, Jacquelyn S.; Nasko, Daniel J.; Brubach, Brian; Cepeda-Espinoza, Victoria; Chopyk, Jessica; Corrada-Bravo, Héctor; Fedarko, Marcus; Ghurye, Jay; Javkar, Kiran; Olson, Nathan D.; Shah, Nidhi; Allard, Sarah M.; Bazinet, Adam L.; Bergman, Nicholas H.; Brown, Alexis; Caporaso, J.G.; Conlan, Sean; DiRuggiero, Jocelyne; Forry, Samuel P.; Hasan, Nur A.; Kralj, Jason; Luethy, Paul M.; Milton, Donald K.; Ondov, Brian D.; Preheim, Sarah; Ratnayake, Shashikala; Rogers, Stephanie M.; Rosovitz, M.J.; Sakowski, Eric G.; Schliebs, Nils O.; Sommer, Daniel D.; Ternus, Krista L.; Uritskiy, Gherman; Zhang, Sean X.; Pop, Mihai; Treangen, Todd J.Abstract The Mid-Atlantic Microbiome Meet-up (M3) organization brings together academic, government, and industry groups to share ideas and develop best practices for microbiome research. In January of 2018, M3 held its fourth meeting, which focused on recent advances in biodefense, specifically those relating to infectious disease, and the use of metagenomic methods for pathogen detection. Presentations highlighted the utility of next-generation sequencing technologies for identifying and tracking microbial community members across space and time. However, they also stressed the current limitations of genomic approaches for biodefense, including insufficient sensitivity to detect low-abundance pathogens and the inability to quantify viable organisms. Participants discussed ways in which the community can improve software usability and shared new computational tools for metagenomic processing, assembly, annotation, and visualization. Looking to the future, they identified the need for better bioinformatics toolkits for longitudinal analyses, improved sample processing approaches for characterizing viruses and fungi, and more consistent maintenance of database resources. Finally, they addressed the necessity of improving data standards to incentivize data sharing. Here, we summarize the presentations and discussions from the meeting, identifying the areas where microbiome analyses have improved our ability to detect and manage biological threats and infectious disease, as well as gaps of knowledge in the field that require future funding and focus.Item The GIAB genomic stratifications resource for human reference genomes(Springer Nature, 2024) Dwarshuis, Nathan; Kalra, Divya; McDaniel, Jennifer; Sanio, Philippe; Alvarez Jerez, Pilar; Jadhav, Bharati; Huang, Wenyu (Eddy); Mondal, Rajarshi; Busby, Ben; Olson, Nathan D.; Sedlazeck, Fritz J.; Wagner, Justin; Majidian, Sina; Zook, Justin M.Despite the growing variety of sequencing and variant-calling tools, no workflow performs equally well across the entire human genome. Understanding context-dependent performance is critical for enabling researchers, clinicians, and developers to make informed tradeoffs when selecting sequencing hardware and software. Here we describe a set of “stratifications,” which are BED files that define distinct contexts throughout the genome. We define these for GRCh37/38 as well as the new T2T-CHM13 reference, adding many new hard-to-sequence regions which are critical for understanding performance as the field progresses. Specifically, we highlight the increase in hard-to-map and GC-rich stratifications in CHM13 relative to the previous references. We then compare the benchmarking performance with each reference and show the performance penalty brought about by these additional difficult regions in CHM13. Additionally, we demonstrate how the stratifications can track context-specific improvements over different platform iterations, using Oxford Nanopore Technologies as an example. The means to generate these stratifications are available as a snakemake pipeline at https://github.com/usnistgov/giab-stratifications. We anticipate this being useful in enabling precise risk-reward calculations when building sequencing pipelines for any of the commonly-used reference genomes.Item StratoMod: predicting sequencing and variant calling errors with interpretable machine learning(Springer Nature, 2024) Dwarshuis, Nathan; Tonner, Peter; Olson, Nathan D.; Sedlazeck, Fritz J.; Wagner, Justin; Zook, Justin M.Despite the variety in sequencing platforms, mappers, and variant callers, no single pipeline is optimal across the entire human genome. Therefore, developers, clinicians, and researchers need to make tradeoffs when designing pipelines for their application. Currently, assessing such tradeoffs relies on intuition about how a certain pipeline will perform in a given genomic context. We present StratoMod, which addresses this problem using an interpretable machine-learning classifier to predict germline variant calling errors in a data-driven manner. We show StratoMod can precisely predict recall using Hifi or Illumina and leverage StratoMod’s interpretability to measure contributions from difficult-to-map and homopolymer regions for each respective outcome. Furthermore, we use Statomod to assess the effect of mismapping on predicted recall using linear vs. graph-based references, and identify the hard-to-map regions where graph-based methods excelled and by how much. For these we utilize our draft benchmark based on the Q100 HG002 assembly, which contains previously-inaccessible difficult regions. Furthermore, StratoMod presents a new method of predicting clinically relevant variants likely to be missed, which is an improvement over current pipelines which only filter variants likely to be false. We anticipate this being useful for performing precise risk-reward analyses when designing variant calling pipelines.