Parsnp 2.0: scalable core-genome alignment for massive microbial datasets

Kille, Bryce; Nute, Michael G; Huang, Victor; Kim, Eddie; Phillippy, Adam M; Treangen, Todd J

dc.citation.articleNumber	btae311	en_US
dc.citation.issueNumber	5	en_US
dc.citation.journalTitle	Bioinformatics	en_US
dc.citation.volumeNumber	40	en_US
dc.contributor.author	Kille, Bryce	en_US
dc.contributor.author	Nute, Michael G	en_US
dc.contributor.author	Huang, Victor	en_US
dc.contributor.author	Kim, Eddie	en_US
dc.contributor.author	Phillippy, Adam M	en_US
dc.contributor.author	Treangen, Todd J	en_US
dc.date.accessioned	2024-08-29T21:11:47Z	en_US
dc.date.available	2024-08-29T21:11:47Z	en_US
dc.date.issued	2024	en_US
dc.description.abstract	Since 2016, the number of microbial species with available reference genomes in NCBI has more than tripled. Multiple genome alignment, the process of identifying nucleotides across multiple genomes which share a common ancestor, is used as the input to numerous downstream comparative analysis methods. Parsnp is one of the few multiple genome alignment methods able to scale to the current era of genomic data; however, there has been no major release since its initial release in 2014. To address this gap, we developed Parsnp v2, which significantly improves on its original release. Parsnp v2 provides users with more control over executions of the program, allowing Parsnp to be better tailored for different use-cases. We introduce a partitioning option to Parsnp, which allows the input to be broken up into multiple parallel alignment processes which are then combined into a final alignment. The partitioning option can reduce memory usage by over 4× and reduce runtime by over 2×, all while maintaining a precise core-genome alignment. The partitioning workflow is also less susceptible to complications caused by assembly artifacts and minor variation, as alignment anchors only need to be conserved within their partition and not across the entire input set. We highlight the performance on datasets involving thousands of bacterial and viral genomes. Parsnp v2 is available at https://github.com/marbl/parsnp.	en_US
dc.identifier.citation	Kille, B., Nute, M. G., Huang, V., Kim, E., Phillippy, A. M., & Treangen, T. J. (2024). Parsnp 2.0: Scalable core-genome alignment for massive microbial datasets. Bioinformatics, 40(5), btae311. https://doi.org/10.1093/bioinformatics/btae311	en_US
dc.identifier.digital	btae311	en_US
dc.identifier.doi	https://doi.org/10.1093/bioinformatics/btae311	en_US
dc.identifier.uri	https://hdl.handle.net/1911/117730	en_US
dc.language.iso	eng	en_US
dc.publisher	Oxford University Press	en_US
dc.rights	Except where otherwise noted, this work is licensed under a Creative Commons Attribution (CC BY) license. Permission to reuse, publish, or reproduce the work beyond the terms of the license or beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en_US
dc.title	Parsnp 2.0: scalable core-genome alignment for massive microbial datasets	en_US
dc.type	Journal article	en_US
dc.type.dcmi	Text	en_US
dc.type.publication	publisher version	en_US

Files

Original bundle

Name: btae311.pdf
Size: 1.05 MB
Format: Adobe Portable Document Format

Collections

Faculty Publications