Minmers are a generalization of minimizers that enable unbiased local Jaccard estimation

dc.citation.articleNumber: btad512 (en_US)
dc.citation.issueNumber: 9 (en_US)
dc.citation.journalTitle: Bioinformatics (en_US)
dc.citation.volumeNumber: 39 (en_US)
dc.contributor.author: Kille, Bryce (en_US)
dc.contributor.author: Garrison, Erik (en_US)
dc.contributor.author: Treangen, Todd J (en_US)
dc.contributor.author: Phillippy, Adam M (en_US)
dc.date.accessioned: 2024-05-08T18:56:12Z (en_US)
dc.date.available: 2024-05-08T18:56:12Z (en_US)
dc.date.issued: 2023 (en_US)
dc.description.abstract: The Jaccard similarity on k-mer sets has been shown to be a convenient proxy for sequence identity. By avoiding expensive base-level alignments and comparing reduced sequence representations, tools such as MashMap can scale to massive numbers of pairwise comparisons while still providing useful similarity estimates. However, due to their reliance on minimizer winnowing, previous versions of MashMap were shown to be biased and inconsistent estimators of Jaccard similarity. This directly impacts downstream tools that rely on the accuracy of these estimates. To address this, we propose the minmer winnowing scheme, which generalizes the minimizer scheme by using a rolling minhash with multiple sampled k-mers per window. We show both theoretically and empirically that minmers yield an unbiased estimator of local Jaccard similarity, and we implement this scheme in an updated version of MashMap. The minmer-based implementation is over 10 times faster than the minimizer-based version under the default ANI threshold, making it well-suited for large-scale comparative genomics applications. MashMap3 is available at https://github.com/marbl/MashMap. (en_US)
dc.identifier.citation: Kille, B., Garrison, E., Treangen, T. J., & Phillippy, A. M. (2023). Minmers are a generalization of minimizers that enable unbiased local Jaccard estimation. Bioinformatics, 39(9), btad512. https://doi.org/10.1093/bioinformatics/btad512 (en_US)
dc.identifier.digital: btad512 (en_US)
dc.identifier.doi: https://doi.org/10.1093/bioinformatics/btad512 (en_US)
dc.identifier.uri: https://hdl.handle.net/1911/115693 (en_US)
dc.language.iso: eng (en_US)
dc.publisher: Oxford University Press (en_US)
dc.rights: Except where otherwise noted, this work is licensed under a Creative Commons Attribution (CC BY) license. Permission to reuse, publish, or reproduce the work beyond the terms of the license or beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. (en_US)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ (en_US)
dc.title: Minmers are a generalization of minimizers that enable unbiased local Jaccard estimation (en_US)
dc.type: Journal article (en_US)
dc.type.dcmi: Text (en_US)
dc.type.publication: publisher version (en_US)
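
Illustrative note: the abstract refers to Jaccard similarity on k-mer sets, minimizer winnowing, and a minmer scheme that keeps multiple sampled k-mers per window. The Python sketch below is only a rough illustration of those ideas under stated assumptions; it is not the authors' algorithm and not MashMap code. All function names (`kmers`, `hash_kmer`, `jaccard`, `minmers`, `minhash_jaccard`), the choice of BLAKE2b as the hash, and the parameters `k`, `w`, and `s` are hypothetical. In particular, the window loop is a naive re-sort per window rather than the rolling minhash the abstract describes.

```python
import hashlib


def kmers(seq, k):
    """Return all overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def jaccard(a, b):
    """Exact Jaccard similarity |A & B| / |A | B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def minmers(seq, k, w, s):
    """Hashes that are among the s smallest distinct hashes in at least
    one window of w consecutive k-mers. With s == 1 this reduces to
    picking the minimum hash of every window, i.e. minimizers."""
    hashes = [hash_kmer(x) for x in kmers(seq, k)]
    selected = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        selected.update(sorted(set(window))[:s])  # naive, not rolling
    return selected


def minhash_jaccard(a_hashes, b_hashes, s):
    """Bottom-s MinHash estimate of Jaccard similarity: take the s
    smallest hashes of the union and count those present in both sets."""
    a_set, b_set = set(a_hashes), set(b_hashes)
    bottom = sorted(a_set | b_set)[:s]
    if not bottom:
        return 1.0
    shared = sum(1 for x in bottom if x in a_set and x in b_set)
    return shared / len(bottom)
```

For example, `jaccard(kmers("ACGT", 2), kmers("ACGA", 2))` compares the 2-mer sets {AC, CG, GT} and {AC, CG, GA}, which share two of four distinct 2-mers. When `s` is at least the size of the union, `minhash_jaccard` coincides with the exact value; smaller `s` trades accuracy for a smaller sketch.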
Files
Original bundle
Name: btad512.pdf
Size: 1.35 MB
Format: Adobe Portable Document Format