RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification
dc.contributor.author | Nasko, Daniel J | en_US |
dc.contributor.author | Koren, Sergey | en_US |
dc.contributor.author | Phillippy, Adam M | en_US |
dc.contributor.author | Treangen, Todd J | en_US |
dc.date.accessioned | 2018-11-28T16:43:59Z | en_US |
dc.date.available | 2018-11-28T16:43:59Z | en_US |
dc.date.issued | 2018-10-30 | en_US |
dc.date.updated | 2018-11-28T16:43:59Z | en_US |
dc.description.abstract | To determine the role of the database in taxonomic sequence classification, we examine the influence of the database over time on k-mer-based lowest common ancestor taxonomic classification. We present three major findings: the number of new species added to the NCBI RefSeq database greatly outpaces the number of new genera; as a result, more reads are classified with newer database versions, but fewer are classified at the species level; and Bayesian-based re-estimation mitigates this effect but struggles with novel genomes. These results suggest a need for new classification approaches specially adapted for large databases. | en_US |
dc.identifier.citation | Nasko, Daniel J, Koren, Sergey, Phillippy, Adam M, et al. "RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification." (2018) BioMed Central: https://doi.org/10.1186/s13059-018-1554-6. | en_US |
dc.identifier.doi | https://doi.org/10.1186/s13059-018-1554-6 | en_US |
dc.identifier.uri | https://hdl.handle.net/1911/103430 | en_US |
dc.language.iso | eng | en_US |
dc.publisher | BioMed Central | en_US |
dc.rights | This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. | en_US |
dc.rights.holder | The Author(s). | en_US |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | en_US |
dc.title | RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification | en_US |
dc.type | Journal article | en_US |
dc.type.dcmi | Text | en_US |
dc.type.publication | publisher version | en_US |