Annotation-free delineation of prokaryotic homology groups

dc.citation.articleNumber: e1010216 [en_US]
dc.citation.issueNumber: 6 [en_US]
dc.citation.journalTitle: PLOS Computational Biology [en_US]
dc.citation.volumeNumber: 18 [en_US]
dc.contributor.author: Yin, Yongze [en_US]
dc.contributor.author: Ogilvie, Huw A. [en_US]
dc.contributor.author: Nakhleh, Luay [en_US]
dc.date.accessioned: 2022-07-06T18:09:15Z [en_US]
dc.date.available: 2022-07-06T18:09:15Z [en_US]
dc.date.issued: 2022 [en_US]
dc.description.abstract: Phylogenomic studies of prokaryotic taxa often assume conserved marker genes are homologous across their length. However, processes such as horizontal gene transfer or gene duplication and loss may disrupt this homology by recombining only parts of genes, causing gene fission or fusion. We show using simulation that it is necessary to delineate homology groups in a set of bacterial genomes without relying on gene annotations to define the boundaries of homologous regions. To solve this problem, we have developed a graph-based algorithm to partition a set of bacterial genomes into Maximal Homologous Groups of sequences (MHGs) where each MHG is a maximal set of maximum-length sequences which are homologous across the entire sequence alignment. We applied our algorithm to a dataset of 19 Enterobacteriaceae species and found that MHGs cover much greater proportions of genomes than markers and, relatedly, are less biased in terms of the functions of the genes they cover. We zoomed in on the correlation between each individual marker and their overlapping MHGs, and show that few phylogenetic splits supported by the markers are supported by the MHGs while many marker-supported splits are contradicted by the MHGs. A comparison of the species tree inferred from marker genes with the species tree inferred from MHGs suggests that the increased bias and lack of genome coverage by markers causes incorrect inferences as to the overall relationship between bacterial taxa. [en_US]
dc.identifier.citation: Yin, Yongze, Ogilvie, Huw A. and Nakhleh, Luay. "Annotation-free delineation of prokaryotic homology groups." PLOS Computational Biology, 18, no. 6 (2022) Public Library of Science: https://doi.org/10.1371/journal.pcbi.1010216. [en_US]
dc.identifier.digital: journal-pcbi-1010216 [en_US]
dc.identifier.doi: https://doi.org/10.1371/journal.pcbi.1010216 [en_US]
dc.identifier.uri: https://hdl.handle.net/1911/112679 [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Public Library of Science [en_US]
dc.rights: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.title: Annotation-free delineation of prokaryotic homology groups [en_US]
dc.type: Journal article [en_US]
dc.type.dcmi: Text [en_US]
dc.type.publication: publisher version [en_US]
Files
Original bundle
Name: journal-pcbi-1010216.pdf
Size: 2.65 MB
Format: Adobe Portable Document Format