The GIAB genomic stratifications resource for human reference genomes

dc.citation.articleNumber: 9029 (en_US)
dc.citation.issueNumber: 1 (en_US)
dc.citation.journalTitle: Nature Communications (en_US)
dc.citation.volumeNumber: 15 (en_US)
dc.contributor.author: Dwarshuis, Nathan (en_US)
dc.contributor.author: Kalra, Divya (en_US)
dc.contributor.author: McDaniel, Jennifer (en_US)
dc.contributor.author: Sanio, Philippe (en_US)
dc.contributor.author: Alvarez Jerez, Pilar (en_US)
dc.contributor.author: Jadhav, Bharati (en_US)
dc.contributor.author: Huang, Wenyu (Eddy) (en_US)
dc.contributor.author: Mondal, Rajarshi (en_US)
dc.contributor.author: Busby, Ben (en_US)
dc.contributor.author: Olson, Nathan D. (en_US)
dc.contributor.author: Sedlazeck, Fritz J. (en_US)
dc.contributor.author: Wagner, Justin (en_US)
dc.contributor.author: Majidian, Sina (en_US)
dc.contributor.author: Zook, Justin M. (en_US)
dc.date.accessioned: 2024-11-20T15:52:06Z (en_US)
dc.date.available: 2024-11-20T15:52:06Z (en_US)
dc.date.issued: 2024 (en_US)
dc.description.abstract: Despite the growing variety of sequencing and variant-calling tools, no workflow performs equally well across the entire human genome. Understanding context-dependent performance is critical for enabling researchers, clinicians, and developers to make informed tradeoffs when selecting sequencing hardware and software. Here we describe a set of “stratifications,” which are BED files that define distinct contexts throughout the genome. We define these for GRCh37/38 as well as the new T2T-CHM13 reference, adding many new hard-to-sequence regions which are critical for understanding performance as the field progresses. Specifically, we highlight the increase in hard-to-map and GC-rich stratifications in CHM13 relative to the previous references. We then compare the benchmarking performance with each reference and show the performance penalty brought about by these additional difficult regions in CHM13. Additionally, we demonstrate how the stratifications can track context-specific improvements over different platform iterations, using Oxford Nanopore Technologies as an example. The means to generate these stratifications are available as a snakemake pipeline at https://github.com/usnistgov/giab-stratifications. We anticipate this being useful in enabling precise risk-reward calculations when building sequencing pipelines for any of the commonly-used reference genomes. (en_US)
dc.identifier.citation: Dwarshuis, N., Kalra, D., McDaniel, J., Sanio, P., Alvarez Jerez, P., Jadhav, B., Huang, W. (Eddy), Mondal, R., Busby, B., Olson, N. D., Sedlazeck, F. J., Wagner, J., Majidian, S., & Zook, J. M. (2024). The GIAB genomic stratifications resource for human reference genomes. Nature Communications, 15(1), 9029. https://doi.org/10.1038/s41467-024-53260-y (en_US)
dc.identifier.digital: s41467-024-53260-y (en_US)
dc.identifier.doi: https://doi.org/10.1038/s41467-024-53260-y (en_US)
dc.identifier.uri: https://hdl.handle.net/1911/118063 (en_US)
dc.language.iso: eng (en_US)
dc.publisher: Springer Nature (en_US)
dc.rights: Except where otherwise noted, this work is licensed under a Creative Commons Attribution (CC BY) license. Permission to reuse, publish, or reproduce the work beyond the terms of the license or beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. (en_US)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ (en_US)
dc.title: The GIAB genomic stratifications resource for human reference genomes (en_US)
dc.type: Journal article (en_US)
dc.type.dcmi: Text (en_US)
dc.type.publication: publisher version (en_US)
Files
Original bundle
Name: s41467-024-53260-y.pdf
Size: 1.84 MB
Format: Adobe Portable Document Format