RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification

dc.contributor.author: Nasko, Daniel J (en_US)
dc.contributor.author: Koren, Sergey (en_US)
dc.contributor.author: Phillippy, Adam M (en_US)
dc.contributor.author: Treangen, Todd J (en_US)
dc.date.accessioned: 2018-11-28T16:43:59Z (en_US)
dc.date.available: 2018-11-28T16:43:59Z (en_US)
dc.date.issued: 10/30/2018 (en_US)
dc.date.updated: 2018-11-28T16:43:59Z (en_US)
dc.description.abstract: In order to determine the role of the database in taxonomic sequence classification, we examine how changes to the database over time influence k-mer-based lowest common ancestor taxonomic classification. We present three major findings: the number of new species added to the NCBI RefSeq database greatly outpaces the number of new genera; as a result, more reads are classified with newer database versions, but fewer are classified at the species level; and Bayesian-based re-estimation mitigates this effect but struggles with novel genomes. These results suggest a need for new classification approaches specifically adapted for large databases. (en_US)
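The mechanism behind the second finding can be illustrated with a toy example. The sketch below is hypothetical and is not the authors' code: the taxonomy, genomes, read, and k = 5 are invented for illustration. Each database k-mer is labeled with the lowest common ancestor (LCA) of every genome containing it, and a read is assigned using a simplified Kraken-style rule (the hit taxon whose root-to-node path collects the most k-mer hits, with ties resolved by their LCA). Adding a second species that shares the read's k-mers pushes those k-mers up to the genus, so the same read loses its species-level label.

```python
from collections import Counter

K = 5  # hypothetical k-mer length; real classifiers use much larger k

# Hypothetical toy taxonomy as a child -> parent map.
PARENT = {"species_A": "genus_X", "species_B": "genus_X", "genus_X": "root"}


def lineage(taxon):
    """Return the path from taxon up to the root, inclusive."""
    path = [taxon]
    while taxon in PARENT:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa in the toy taxonomy."""
    ancestors = set(lineage(a))
    for node in lineage(b):
        if node in ancestors:
            return node
    return "root"


def kmers(seq, k=K):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_db(genomes):
    """Label each k-mer with the LCA of every genome that contains it."""
    db = {}
    for taxon, seq in genomes.items():
        for km in kmers(seq):
            db[km] = lca(db[km], taxon) if km in db else taxon
    return db


def classify(read, db):
    """Simplified Kraken-style assignment; returns None if no k-mer hits."""
    counts = Counter(db[km] for km in kmers(read) if km in db)
    if not counts:
        return None
    # Score each hit taxon by the total hits on its root-to-node path.
    scores = {t: sum(counts.get(n, 0) for n in lineage(t)) for t in counts}
    best = max(scores.values())
    winners = [t for t, s in scores.items() if s == best]
    label = winners[0]
    for t in winners[1:]:
        label = lca(label, t)  # break ties by moving up to their LCA
    return label


# Hypothetical genomes: both species share the region ACGTTGCATG.
GENOME_A = "TTTTTACGTTGCATGAAAAA"
GENOME_B = "CCCCCACGTTGCATGGGGGG"
READ = "ACGTTGCATG"  # drawn from species_A, inside the shared region

old_db = build_db({"species_A": GENOME_A})
new_db = build_db({"species_A": GENOME_A, "species_B": GENOME_B})
print(classify(READ, old_db))  # species_A
print(classify(READ, new_db))  # genus_X
```

In this toy case a read that also spans species-specific k-mers (for example `TTTTTACGTT`) still resolves to `species_A` in the larger database, because most of its hits remain on the species branch; only reads confined to shared sequence are pushed up to the genus.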
dc.identifier.citation: Nasko, Daniel J, Koren, Sergey, Phillippy, Adam M, et al. "RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification." (2018) BioMed Central: https://doi.org/10.1186/s13059-018-1554-6. (en_US)
dc.identifier.doi: https://doi.org/10.1186/s13059-018-1554-6 (en_US)
dc.identifier.uri: https://hdl.handle.net/1911/103430 (en_US)
dc.language.iso: eng (en_US)
dc.publisher: BioMed Central (en_US)
dc.rights: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. (en_US)
dc.rights.holder: The Author(s). (en_US)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ (en_US)
dc.title: RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification (en_US)
dc.type: Journal article (en_US)
dc.type.dcmi: Text (en_US)
dc.type.publication: publisher version (en_US)
Files
Original bundle
Name: 13059_2018_Article_1554.pdf
Size: 1 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 0 B
Format: Item-specific license agreed upon to submission