Computer Science Publications


Recent Submissions

Now showing 1 - 20 of 154
  • Item
    A deep learning solution for crystallographic structure determination
    (International Union of Crystallography, 2023) Pan, T.; Jin, S.; Miller, M. D.; Kyrillidis, A.; Phillips, G. N.
    The general de novo solution of the crystallographic phase problem is difficult and only possible under certain conditions. This paper develops an initial pathway to a deep learning neural network approach for the phase problem in protein crystallography, based on a synthetic dataset of small fragments derived from a large well curated subset of solved structures in the Protein Data Bank (PDB). In particular, electron-density estimates of simple artificial systems are produced directly from corresponding Patterson maps using a convolutional neural network architecture as a proof of concept.
  • Item
    PME: pruning-based multi-size embedding for recommender systems
    (Frontiers Media S.A., 2023) Liu, Zirui; Song, Qingquan; Li, Li; Choi, Soo-Hyun; Chen, Rui; Hu, Xia
    Embedding is widely used in recommendation models to learn feature representations. However, the traditional embedding technique that assigns a fixed size to all categorical features may be suboptimal for the following reasons. In the recommendation domain, the majority of categorical features' embeddings can be trained with less capacity without impacting model performance, so storing embeddings of equal length may incur unnecessary memory usage. Existing work that tries to allocate customized sizes for each feature usually either simply scales the embedding size with the feature's popularity or formulates the size allocation problem as an architecture selection problem. Unfortunately, most of these methods either suffer a large performance drop or incur significant extra time cost when searching for proper embedding sizes. In this article, instead of formulating the size allocation problem as an architecture selection problem, we approach the problem from a pruning perspective and propose the Pruning-based Multi-size Embedding (PME) framework. During the search phase, we prune the dimensions that have the least impact on model performance in the embedding to reduce its capacity. Then, we show that the customized size of each token can be obtained by transferring the capacity of its pruned embedding with significantly less search cost. Experimental results validate that PME can efficiently find proper sizes and hence achieve strong performance while significantly reducing the number of parameters in the embedding layer.
  • Item
    EnGens: a computational framework for generation and analysis of representative protein conformational ensembles
    (Oxford University Press, 2023) Conev, Anja; Rigo, Mauricio Menegatti; Devaurs, Didier; Fonseca, André Faustino; Kalavadwala, Hussain; de Freitas, Martiela Vaz; Clementi, Cecilia; Zanatta, Geancarlo; Antunes, Dinler Amaral; Kavraki, Lydia E
    Proteins are dynamic macromolecules that perform vital functions in cells. A protein structure determines its function, but this structure is not static, as proteins change their conformation to achieve various functions. Understanding the conformational landscapes of proteins is essential to understand their mechanism of action. Sets of carefully chosen conformations can summarize such complex landscapes and provide better insights into protein function than single conformations. We refer to these sets as representative conformational ensembles. Recent advances in computational methods have led to an increase in the number of available structural datasets spanning conformational landscapes. However, extracting representative conformational ensembles from such datasets is not an easy task and many methods have been developed to tackle it. Our new approach, EnGens (short for ensemble generation), collects these methods into a unified framework for generating and analyzing representative protein conformational ensembles. In this work, we: (1) provide an overview of existing methods and tools for representative protein structural ensemble generation and analysis; (2) unify existing approaches in an open-source Python package, and a portable Docker image, providing interactive visualizations within a Jupyter Notebook pipeline; (3) test our pipeline on a few canonical examples from the literature. Representative ensembles produced by EnGens can be used for many downstream tasks such as protein–ligand ensemble docking, Markov state modeling of protein dynamics and analysis of the effect of single-point mutations.
  • Item
    Enabling accurate and early detection of recently emerged SARS-CoV-2 variants of concern in wastewater
    (Springer Nature, 2023) Sapoval, Nicolae; Liu, Yunxi; Lou, Esther G.; Hopkins, Loren; Ensor, Katherine B.; Schneider, Rebecca; Stadler, Lauren B.; Treangen, Todd J.
    As clinical testing declines, wastewater monitoring can provide crucial surveillance on the emergence of SARS-CoV-2 variants of concern (VoCs) in communities. In this paper we present QuaID, a novel bioinformatics tool for VoC detection based on quasi-unique mutations. The benefits of QuaID are three-fold: it (i) provides up to 3-week earlier VoC detection, (ii) detects VoCs accurately (>95% precision on simulated benchmarks), and (iii) leverages all mutational signatures (including insertions & deletions).
  • Item
    PepSim: T-cell cross-reactivity prediction via comparison of peptide sequence and peptide-HLA structure
    (Frontiers Media S.A., 2023) Hall-Swan, Sarah; Slone, Jared; Rigo, Mauricio M.; Antunes, Dinler A.; Lizée, Gregory; Kavraki, Lydia E.
    Introduction: Peptide-HLA class I (pHLA) complexes on the surface of tumor cells can be targeted by cytotoxic T-cells to eliminate tumors, and this is one of the bases for T-cell-based immunotherapies. However, there exist cases where therapeutic T-cells directed towards tumor pHLA complexes may also recognize pHLAs from healthy normal cells. The process where the same T-cell clone recognizes more than one pHLA is referred to as T-cell cross-reactivity and this process is driven mainly by features that make pHLAs similar to each other. T-cell cross-reactivity prediction is critical for designing T-cell-based cancer immunotherapies that are both effective and safe. Methods: Here we present PepSim, a novel score to predict T-cell cross-reactivity based on the structural and biochemical similarity of pHLAs. Results and discussion: We show our method can accurately separate cross-reactive from non-crossreactive pHLAs in a diverse set of datasets including cancer, viral, and self-peptides. PepSim can be generalized to work on any dataset of class I peptide-HLAs and is freely available as a web server at
  • Item
    Improved understanding of biorisk for research involving microbial modification using annotated sequences of concern
    (Frontiers Media S.A., 2023) Godbold, Gene D.; Hewitt, F. Curtis; Kappell, Anthony D.; Scholz, Matthew B.; Agar, Stacy L.; Treangen, Todd J.; Ternus, Krista L.; Sandbrink, Jonas B.; Koblentz, Gregory D.
    Regulation of research on microbes that cause disease in humans has historically been focused on taxonomic lists of ‘bad bugs’. However, given our increased knowledge of these pathogens through inexpensive genome sequencing, 5 decades of research in microbial pathogenesis, and the burgeoning capacity of synthetic biologists, the limitations of this approach are apparent. With heightened scientific and public attention focused on biosafety and biosecurity, and an ongoing review by US authorities of dual-use research oversight, this article proposes the incorporation of sequences of concern (SoCs) into the biorisk management regime governing genetic engineering of pathogens. SoCs enable pathogenesis in all microbes infecting hosts that are ‘of concern’ to human civilization. Here we review the functions of SoCs (FunSoCs) and discuss how they might bring clarity to potentially problematic research outcomes involving infectious agents. We believe that annotation of SoCs with FunSoCs has the potential to improve the likelihood that dual use research of concern is recognized by both scientists and regulators before it occurs.
  • Item
    Genome-Wide Analysis of Structural Variants in Parkinson Disease
    (Wiley, 2023) Billingsley, Kimberley J.; Ding, Jinhui; Jerez, Pilar Alvarez; Illarionova, Anastasia; Levine, Kristin; Grenn, Francis P.; Makarious, Mary B.; Moore, Anni; Vitale, Daniel; Reed, Xylena; Hernandez, Dena; Torkamani, Ali; Ryten, Mina; Hardy, John; Consortium (UKBEC), UK Brain Expression; Chia, Ruth; Scholz, Sonja W.; Traynor, Bryan J.; Dalgard, Clifton L.; Ehrlich, Debra J.; Tanaka, Toshiko; Ferrucci, Luigi; Beach, Thomas G.; Serrano, Geidy E.; Quinn, John P.; Bubb, Vivien J.; Collins, Ryan L; Zhao, Xuefang; Walker, Mark; Pierce-Hoffman, Emma; Brand, Harrison; Talkowski, Michael E.; Casey, Bradford; Cookson, Mark R; Markham, Androo; Nalls, Mike A.; Mahmoud, Medhat; Sedlazeck, Fritz J; Blauwendraat, Cornelis; Gibbs, J. Raphael; Singleton, Andrew B.
    Objective: Identification of genetic risk factors for Parkinson disease (PD) has to date been primarily limited to the study of single nucleotide variants, which only represent a small fraction of the genetic variation in the human genome. Consequently, causal variants for most PD risk are not known. Here we focused on structural variants (SVs), which represent a major source of genetic variation in the human genome. We aimed to discover SVs associated with PD risk by performing the first large-scale characterization of SVs in PD. Methods: We leveraged a recently developed computational pipeline to detect and genotype SVs from 7,772 Illumina short-read whole genome sequencing samples. Using this set of SV variants, we performed a genome-wide association study using 2,585 cases and 2,779 controls and identified SVs associated with PD risk. Furthermore, to validate the presence of these variants, we generated a subset of matched whole-genome long-read sequencing data. Results: We genotyped and tested 3,154 common SVs, representing over 412 million nucleotides of previously uncatalogued genetic variation. Using long-read sequencing data, we validated the presence of three novel deletion SVs that are associated with risk of PD from our initial association analysis, including a 2 kb intronic deletion within the gene LRRN4. Interpretation: We identified three SVs associated with genetic risk of PD. This study represents the most comprehensive assessment of the contribution of SVs to the genetic risk of PD to date. ANN NEUROL 2023;93:1012–1022
  • Item
    Intratumoral Heterogeneity and Clonal Evolution Induced by HPV Integration
    (AACR, 2023) Akagi, Keiko; Symer, David E.; Mahmoud, Medhat; Jiang, Bo; Goodwin, Sara; Wangsa, Darawalee; Li, Zhengke; Xiao, Weihong; Dan Dunn, Joe; Ried, Thomas; Coombes, Kevin R.; Sedlazeck, Fritz J.; Gillison, Maura L.
    The human papillomavirus (HPV) genome is integrated into host DNA in most HPV-positive cancers, but the consequences for chromosomal integrity are unknown. Continuous long-read sequencing of oropharyngeal cancers and cancer cell lines identified a previously undescribed form of structural variation, “heterocateny,” characterized by diverse, interrelated, and repetitive patterns of concatemerized virus and host DNA segments within a cancer. Unique breakpoints shared across structural variants facilitated stepwise reconstruction of their evolution from a common molecular ancestor. This analysis revealed that virus and virus–host concatemers are unstable and, upon insertion into and excision from chromosomes, facilitate capture, amplification, and recombination of host DNA and chromosomal rearrangements. Evidence of heterocateny was detected in extrachromosomal and intrachromosomal DNA. These findings indicate that heterocateny is driven by the dynamic, aberrant replication and recombination of an oncogenic DNA virus, thereby extending known consequences of HPV integration to include promotion of intratumoral heterogeneity and clonal evolution. Long-read sequencing of HPV-positive cancers revealed “heterocateny,” a previously unreported form of genomic structural variation characterized by heterogeneous, interrelated, and repetitive genomic rearrangements within a tumor. Heterocateny is driven by unstable concatemerized HPV genomes, which facilitate capture, rearrangement, and amplification of host DNA, and promotes intratumoral heterogeneity and clonal evolution. See related commentary by McBride and White, p. 814. This article is highlighted in the In This Issue feature, p. 799.
  • Item
    A Chromosome-length Assembly of the Black Petaltail (Tanypteryx hageni) Dragonfly
    (Oxford University Press, 2023) Tolman, Ethan R; Beatty, Christopher D; Bush, Jonas; Kohli, Manpreet; Moreno, Carlos M; Ware, Jessica L; Weber, K Scott; Khan, Ruqayya; Maheshwari, Chirag; Weisz, David; Dudchenko, Olga; Aiden, Erez Lieberman; Frandsen, Paul B; Center for Theoretical Biological Physics
    We present a chromosome-length genome assembly and annotation of the Black Petaltail dragonfly (Tanypteryx hageni). This habitat specialist diverged from its sister species over 70 million years ago, and separated from the most closely related Odonata with a reference genome 150 million years ago. Using PacBio HiFi reads and Hi-C data for scaffolding, we produce one of the highest-quality Odonata genomes to date. A scaffold N50 of 206.6 Mb and a single copy BUSCO score of 96.2% indicate high contiguity and completeness.
  • Item
    FixItFelix: improving genomic analysis by fixing reference errors
    (Springer Nature, 2023) Behera, Sairam; LeFaive, Jonathon; Orchard, Peter; Mahmoud, Medhat; Paulin, Luis F.; Farek, Jesse; Soto, Daniela C.; Parker, Stephen C. J.; Smith, Albert V.; Dennis, Megan Y.; Zook, Justin M.; Sedlazeck, Fritz J.
    The current version of the human reference genome, GRCh38, contains a number of errors including 1.2 Mbp of falsely duplicated and 8.04 Mbp of collapsed regions. These errors impact the variant calling of 33 protein-coding genes, including 12 with medical relevance. Here, we present FixItFelix, an efficient remapping approach, together with a modified version of the GRCh38 reference genome that improves the subsequent analysis across these genes within minutes for an existing alignment file while maintaining the same coordinates. We showcase these improvements over multi-ethnic control samples, demonstrating improvements for population variant calling as well as eQTL studies.
  • Item
    Fast Quantum State Reconstruction via Accelerated Non-Convex Programming
    (MDPI, 2023) Kim, Junhyung Lyle; Kollias, George; Kalev, Amir; Wei, Ken X.; Kyrillidis, Anastasios
    We propose a new quantum state reconstruction method that combines ideas from compressed sensing, non-convex optimization, and acceleration methods. The algorithm, called Momentum-Inspired Factored Gradient Descent (MiFGD), extends the applicability of quantum tomography to larger systems. Despite being a non-convex method, MiFGD converges provably close to the true density matrix at an accelerated linear rate asymptotically in the absence of experimental and statistical noise, under common assumptions. With this manuscript, we present the method, prove its convergence property and provide the Frobenius norm bound guarantees with respect to the true density matrix. From a practical point of view, we benchmark the algorithm's performance against other existing methods, in both synthetic and real (noisy) experiments performed on IBM’s quantum processing unit. We find that the proposed algorithm performs orders of magnitude faster than the state-of-the-art approaches, with similar or better accuracy. In both synthetic and real experiments, we observed accurate and robust reconstruction, despite the presence of experimental and statistical noise in the tomographic data. Finally, we provide ready-to-use code for state tomography of multi-qubit systems.
  • Item
    The swan genome and transcriptome, it is not all black and white
    (Springer Nature, 2023) Karawita, Anjana C.; Cheng, Yuanyuan; Chew, Keng Yih; Challagulla, Arjun; Kraus, Robert; Mueller, Ralf C.; Tong, Marcus Z. W.; Hulme, Katina D.; Bielefeldt-Ohmann, Helle; Steele, Lauren E.; Wu, Melanie; Sng, Julian; Noye, Ellesandra; Bruxner, Timothy J.; Au, Gough G.; Lowther, Suzanne; Blommaert, Julie; Suh, Alexander; McCauley, Alexander J.; Kaur, Parwinder; Dudchenko, Olga; Aiden, Erez; Fedrigo, Olivier; Formenti, Giulio; Mountcastle, Jacquelyn; Chow, William; Martin, Fergal J.; Ogeh, Denye N.; Thiaud-Nissen, Françoise; Howe, Kerstin; Tracey, Alan; Smith, Jacqueline; Kuo, Richard I.; Renfree, Marilyn B.; Kimura, Takashi; Sakoda, Yoshihiro; McDougall, Mathew; Spencer, Hamish G.; Pyne, Michael; Tolf, Conny; Waldenström, Jonas; Jarvis, Erich D.; Baker, Michelle L.; Burt, David W.; Short, Kirsty R.; Centre for Theoretical Biological Physics
    Background: The Australian black swan (Cygnus atratus) is an iconic species with contrasting plumage to that of the closely related northern hemisphere white swans. The relative geographic isolation of the black swan may have resulted in a limited immune repertoire and increased susceptibility to infectious diseases, notably infectious diseases from which Australia has been largely shielded. Unlike mallard ducks and the mute swan (Cygnus olor), the black swan is extremely sensitive to highly pathogenic avian influenza. Understanding this susceptibility has been impaired by the absence of any available swan genome and transcriptome information. Results: Here, we generate the first chromosome-length black and mute swan genomes annotated with transcriptome data, all using long-read based pipelines generated for vertebrate species. We use these genomes and transcriptomes to show that unlike other wild waterfowl, black swans lack an expanded immune gene repertoire, lack a key viral pattern-recognition receptor in endothelial cells and mount a poorly controlled inflammatory response to highly pathogenic avian influenza. We also implicate genetic differences in the SLC45A2 gene in the iconic plumage of the black swan. Conclusion: Together, these data suggest that the immune system of the black swan is such that should any avian viral infection become established in its native habitat, the black swan would be in significant peril.
  • Item
    Streaming Quantiles Algorithms with Small Space and Update Time
    (MDPI, 2022) Ivkin, Nikita; Liberty, Edo; Lang, Kevin; Karnin, Zohar; Braverman, Vladimir
    Approximating quantiles and distributions over streaming data has been studied for roughly two decades now. Recently, Karnin, Lang, and Liberty proposed the first asymptotically optimal algorithm for doing so. This manuscript complements their theoretical result by providing practical variants of their algorithm with improved constants. For a given sketch size, our techniques provably reduce the upper bound on the sketch error by a factor of two. These improvements are verified experimentally. Our modified quantile sketch also improves latency by reducing the worst-case update time from O(1/ε) down to O(log(1/ε)).
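
    To make the idea of a streaming quantile sketch concrete, here is a minimal toy sketch in Python built from "compactors": full buffers are sorted, every other item is kept, and the survivors move up one level with doubled weight. It is an illustration only and is not the algorithm or the variants from the paper above. The class name `CompactorSketch`, the uniform per-level capacity, and the seeded random generator are all assumptions made for this example.

```python
import random


class CompactorSketch:
    """Toy compactor-based quantile sketch (hypothetical, for illustration).

    Items stored at level h each stand for 2**h original stream items.
    """

    def __init__(self, capacity=64, seed=0):
        if capacity < 2 or capacity % 2:
            raise ValueError("capacity must be an even integer >= 2")
        self.capacity = capacity
        self.levels = [[]]
        self.rng = random.Random(seed)

    def update(self, x):
        """Insert one stream item, compacting any level that fills up."""
        self.levels[0].append(x)
        h = 0
        while len(self.levels[h]) >= self.capacity:
            self._compact(h)
            h += 1

    def _compact(self, h):
        """Sort level h, keep every other item, promote them to level h + 1."""
        if h + 1 == len(self.levels):
            self.levels.append([])
        buf = sorted(self.levels[h])
        offset = self.rng.randrange(2)  # randomly keep even or odd positions
        self.levels[h + 1].extend(buf[offset::2])
        self.levels[h] = []

    def rank(self, x):
        """Estimate how many inserted items are <= x."""
        return sum(
            (1 << h) * sum(1 for v in buf if v <= x)
            for h, buf in enumerate(self.levels)
        )

    def total_weight(self):
        """Total weight represented by the sketch (equals the stream length)."""
        return sum((1 << h) * len(buf) for h, buf in enumerate(self.levels))

    def stored_items(self):
        """Number of items physically kept in memory."""
        return sum(len(buf) for buf in self.levels)
```

    In this toy version every compaction halves the number of stored items while preserving total weight, so memory grows only with the number of levels. Each compaction at level h can shift a rank estimate by at most 2**h, which is the source of the approximation error that sketches of this family are designed to control.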
  • Item
    Analysis of bronchoalveolar lavage fluid metatranscriptomes among patients with COVID-19 disease
    (Springer Nature, 2022) Jochum, Michael; Lee, Michael D.; Curry, Kristen; Zaksas, Victoria; Vitalis, Elizabeth; Treangen, Todd; Aagaard, Kjersti; Ternus, Krista L.
    To better understand the potential relationship between COVID-19 disease and hologenome microbial community dynamics and functional profiles, we conducted a multivariate taxonomic and functional microbiome comparison of publicly available human bronchoalveolar lavage fluid (BALF) metatranscriptome samples amongst COVID-19 (n = 32), community acquired pneumonia (CAP) (n = 25), and uninfected samples (n = 29). We then performed a stratified analysis based on mortality amongst the COVID-19 cohort with known outcomes of deceased (n = 10) versus survived (n = 15). Our overarching hypothesis was that there are detectable and functionally significant relationships between BALF microbial metatranscriptomes and the severity of COVID-19 disease onset and progression. We observed 34 functionally discriminant gene ontology (GO) terms in COVID-19 disease compared to the CAP and uninfected cohorts, and 21 GO terms functionally discriminant to COVID-19 mortality (q < 0.05). GO terms enriched in the COVID-19 disease cohort included hydrolase activity, and significant GO terms under the parental terms of biological regulation, viral process, and interspecies interaction between organisms. Notable GO terms associated with COVID-19 mortality included nucleobase-containing compound biosynthetic process, organonitrogen compound catabolic process, pyrimidine-containing compound biosynthetic process, and DNA recombination, RNA binding, magnesium and zinc ion binding, oxidoreductase activity, and endopeptidase activity. A Dirichlet multinomial mixtures clustering analysis resulted in a best model fit using three distinct clusters that were significantly associated with COVID-19 disease and mortality. 
    We additionally observed discriminant taxonomic differences associated with COVID-19 disease and mortality in the genus Sphingomonas, belonging to the Sphingomonadaceae family, Variovorax, belonging to the Comamonadaceae family, and in the class Bacteroidia, belonging to the order Bacteroidales. To our knowledge, this is the first study to evaluate significant differences in taxonomic and functional signatures between BALF metatranscriptomes from COVID-19, CAP, and uninfected cohorts, as well as associating these taxa and microbial gene functions with COVID-19 mortality. Collectively, while this data does not speak to causality nor directionality of the association, it does demonstrate a significant relationship between the human microbiome and COVID-19. The results from this study have rendered testable hypotheses that warrant further investigation to better understand the causality and directionality of host–microbiome–pathogen interactions.
  • Item
    Auto-GNN: Neural architecture search of graph neural networks
    (Frontiers Media S.A., 2022) Zhou, Kaixiong; Huang, Xiao; Song, Qingquan; Chen, Rui; Hu, Xia; DATA Lab
    Graph neural networks (GNNs) have been widely used in various graph analysis tasks. As the graph characteristics vary significantly in real-world systems, given a specific scenario, the architecture parameters need to be tuned carefully to identify a suitable GNN. Neural architecture search (NAS) has shown its potential in discovering effective architectures for learning tasks in image and language modeling. However, the existing NAS algorithms cannot be applied efficiently to the GNN search problem for two reasons. First, the large-step exploration in the traditional controller fails to learn the sensitive performance variations with slight architecture modifications in GNNs. Second, the search space is composed of heterogeneous GNNs, which prevents the direct adoption of parameter sharing among them to accelerate the search progress. To tackle these challenges, we propose an automated graph neural networks (AGNN) framework, which aims to find the optimal GNN architecture efficiently. Specifically, a reinforced conservative controller is designed to explore the architecture space with small steps. To accelerate the validation, a novel constrained parameter sharing strategy is presented to regularize the weight transferring among GNNs. It avoids training from scratch and saves computation time. Experimental results on the benchmark datasets demonstrate that the architecture identified by AGNN achieves the best performance and search efficiency, compared with existing human-invented models and the traditional search methods.
  • Item
    De novo identification of microbial contaminants in low microbial biomass microbiomes with Squeegee
    (Springer Nature, 2022) Liu, Yunxi; Elworth, R. A. Leo; Jochum, Michael D.; Aagaard, Kjersti M.; Treangen, Todd J.
    Computational analysis of host-associated microbiomes has opened the door to numerous discoveries relevant to human health and disease. However, contaminant sequences in metagenomic samples can potentially impact the interpretation of findings reported in microbiome studies, especially in low-biomass environments. Contamination from DNA extraction kits or sampling lab environments leaves taxonomic "bread crumbs" across multiple distinct sample types. Here we describe Squeegee, a de novo contamination detection tool that is based upon this principle, allowing the detection of microbial contaminants when negative controls are unavailable. On the low-biomass samples, we compare Squeegee predictions to experimental negative control data and show that Squeegee accurately recovers putative contaminants. We analyze samples of varying biomass from the Human Microbiome Project and identify likely, previously unreported kit contamination. Collectively, our results highlight that Squeegee can identify microbial contaminants with high precision and thus represents a computational approach for contaminant detection when negative controls are unavailable.
  • Item
    Systematic Analysis of Mobile Genetic Elements Mediating β-Lactamase Gene Amplification in Noncarbapenemase-Producing Carbapenem-Resistant Enterobacterales Bloodstream Infections
    (American Society for Microbiology, 2022) Shropshire, W.C.; Konovalova, A.; McDaneld, P.; Gohel, M.; Strope, B.; Sahasrabhojane, P.; Tran, C.N.; Greenberg, D.; Kim, J.; Zhan, X.; Aitken, S.; Bhatti, M.; Savidge, T.C.; Treangen, T.J.; Hanson, B.M.; Arias, C.A.; Shelburne, S.A.
    Noncarbapenemase-producing carbapenem-resistant Enterobacterales (non-CP-CRE) are increasingly recognized as important contributors to prevalent carbapenem-resistant Enterobacterales (CRE) infections. However, there is limited understanding of mechanisms underlying non-CP-CRE causing invasive disease. Long- and short-read whole-genome sequencing was used to elucidate carbapenem nonsusceptibility determinants in Enterobacterales bloodstream isolates at MD Anderson Cancer Center in Houston, Texas. We investigated carbapenem nonsusceptible Enterobacterales (CNSE) mechanisms (i.e., isolates with carbapenem intermediate resistance phenotypes or greater) through a combination of phylogenetic analysis, antimicrobial resistance gene detection/copy number quantification, porin assessment, and mobile genetic element (MGE) characterization. Most CNSE isolates sequenced were non-CP-CRE (41/79; 51.9%), whereas 25.3% (20/79) were Enterobacterales with intermediate susceptibility to carbapenems (CIE), and 22.8% (18/79) were carbapenemase-producing Enterobacterales (CPE). Statistically significant copy number variants (CNVs) of extended-spectrum β-lactamase (ESBL) genes (Wilcoxon Test; P-value < 0.001) were present in both non-CP-CR E. coli (median CNV = 2.6×; n = 17) and K. pneumoniae (median CNV = 3.2×, n = 17). All non-CP-CR E. coli and K. pneumoniae had predicted reduced expression of at least one outer membrane porin gene (i.e., ompC/ompF or ompK36/ompK35). Completely resolved CNSE genomes revealed that IS26 and ISEcp1 structures harboring blaCTX-M variants along with other antimicrobial resistance elements were associated with gene amplification, occurring in mostly IncFIB/IncFII plasmid contexts. MGE-mediated β-lactamase gene amplifications resulted in either tandem arrays, primarily mediated by IS26 translocatable units, or segmental duplication, typically due to ISEcp1 transposition units. 
    Non-CP-CRE strains were the most common cause of CRE bacteremia with carbapenem nonsusceptibility driven by concurrent porin loss and MGE-mediated amplification of blaCTX-M genes. Importance: Carbapenem-resistant Enterobacterales (CRE) are considered urgent antimicrobial resistance (AMR) threats. The vast majority of CRE research has focused on carbapenemase-producing Enterobacterales (CPE) even though noncarbapenemase-producing CRE (non-CP-CRE) comprise 50% or more of isolates in some surveillance studies. Thus, carbapenem resistance mechanisms in non-CP-CRE remain poorly characterized. To address this problem, we applied a combination of short- and long-read sequencing technologies to a cohort of CRE bacteremia isolates and used these data to unravel complex mobile genetic element structures mediating β-lactamase gene amplification. By generating complete genomes of 65 carbapenem nonsusceptible Enterobacterales (CNSE) covering a genetically diverse array of isolates, our findings both generate novel insights into how non-CP-CRE overcome carbapenem treatments and provide researchers scaffolds for characterization of their own non-CP-CRE isolates. Improved recognition of mechanisms driving development of non-CP-CRE could assist with design and implementation of future strategies to mitigate the impact of these increasingly recognized AMR pathogens.
  • Item
    Multiple genome alignment in the telomere-to-telomere assembly era
    (Springer Nature, 2022) Kille, Bryce; Balaji, Advait; Sedlazeck, Fritz J.; Nute, Michael; Treangen, Todd J.
    With the arrival of telomere-to-telomere (T2T) assemblies of the human genome comes the computational challenge of efficiently and accurately constructing multiple genome alignments at an unprecedented scale. By identifying nucleotides across genomes which share a common ancestor, multiple genome alignments commonly serve as the bedrock for comparative genomics studies. In this review, we provide an overview of the algorithmic template that most multiple genome alignment methods follow. We also discuss prospective areas of improvement of multiple genome alignment for keeping up with continuously arriving high-quality T2T assembled genomes and for unlocking clinically-relevant insights.
  • Item
    Infectious Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) in Exhaled Aerosols and Efficacy of Masks During Early Mild Infection
    (Oxford University Press, 2022) Adenaiye, Oluwasanmi O.; Lai, Jianyu; Bueno de Mesquita, P. Jacob; Hong, Filbert; Youssefi, Somayeh; German, Jennifer; Tai, S.H. Sheldon; Albert, Barbara; Schanz, Maria; Weston, Stuart; Hang, Jun; Fung, Christian; Chung, Hye Kyung; Coleman, Kristen K.; Sapoval, Nicolae; Treangen, Todd; Berry, Irina Maljkovic; Mullins, Kristin; Frieman, Matthew; Ma, Tianzhou; Milton, Donald K.; University of Maryland StopCOVID Research Group
    Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) epidemiology implicates airborne transmission; aerosol infectiousness and impacts of masks and variants on aerosol shedding are not well understood. We recruited coronavirus disease 2019 (COVID-19) cases to give blood, saliva, mid-turbinate and fomite (phone) swabs, and 30-minute breath samples while vocalizing into a Gesundheit-II, with and without masks at up to 2 visits 2 days apart. We quantified and sequenced viral RNA, cultured virus, and assayed serum samples for anti-spike and anti-receptor binding domain antibodies. We enrolled 49 seronegative cases (mean days post onset 3.8 ± 2.1), May 2020 through April 2021. We detected SARS-CoV-2 RNA in 36% of fine (≤5 µm), 26% of coarse (>5 µm) aerosols, and 52% of fomite samples overall and in all samples from 4 alpha variant cases. Masks reduced viral RNA by 48% (95% confidence interval [CI], 3 to 72%) in fine and by 77% (95% CI, 51 to 89%) in coarse aerosols; cloth and surgical masks were not significantly different. The alpha variant was associated with a 43-fold (95% CI, 6.6- to 280-fold) increase in fine aerosol viral RNA, compared with earlier viruses, that remained a significant 18-fold (95% CI, 3.4- to 92-fold) increase adjusting for viral RNA in saliva, swabs, and other potential confounders. Two fine aerosol samples, collected while participants wore masks, were culture-positive. SARS-CoV-2 is evolving toward more efficient aerosol generation and loose-fitting masks provide significant but only modest source control. Therefore, until vaccination rates are very high, continued layered controls and tight-fitting masks and respirators will be necessary.
  • Item
    Accelerating High-Order Stencils on GPUs
    (IEEE, 2020) Sai, Ryuichi; Mellor-Crummey, John; Meng, Xiaozhu; Araya-Polo, Mauricio; Meng, Jie
    While implementation strategies for low-order stencils on GPUs have been well-studied in the literature, not all of the techniques work well for high-order stencils, such as those used for seismic imaging. In this paper, we study practical seismic imaging computations on GPUs using high-order stencils on large domains with meaningful boundary conditions. We manually crafted a collection of implementations of a 25-point seismic modeling stencil in CUDA along with code to apply the boundary conditions. We evaluated our stencil code shapes, memory hierarchy usage, data-fetching patterns, and other performance attributes. We conducted an empirical evaluation of these stencils using several mature and emerging tools and discuss our quantitative findings. Some of our implementations achieved twice the performance of a proprietary code developed in C and mapped to GPUs using OpenACC. Additionally, several of our implementations have excellent performance portability.
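
    For readers unfamiliar with high-order stencils, the following NumPy sketch shows what a 25-point stencil computes. It assumes the common layout of a 3D star with radius 4 (one center point plus four neighbors in each of six directions) and uses the standard 8th-order central-difference weights for a second derivative. The abstract does not spell out these details, so the layout, the function name `laplacian_25pt`, and the zero-filled boundary are assumptions made for illustration; this is not the paper's CUDA code.

```python
import numpy as np

RADIUS = 4
# Standard 8th-order central-difference weights for a second derivative.
COEFFS = (-205.0 / 72.0, 8.0 / 5.0, -1.0 / 5.0, 8.0 / 315.0, -1.0 / 560.0)


def laplacian_25pt(u, h=1.0):
    """Apply a radius-4 star-shaped Laplacian to the interior of a 3D grid.

    Cells within RADIUS of any face are left at zero; a real solver would
    apply its boundary conditions there instead.
    """
    u = np.asarray(u, dtype=float)
    r = RADIUS
    nx, ny, nz = u.shape
    if min(nx, ny, nz) <= 2 * r:
        raise ValueError("each grid dimension must exceed 2 * RADIUS")

    out = np.zeros_like(u)
    # The center point contributes once per axis.
    acc = 3.0 * COEFFS[0] * u[r:nx - r, r:ny - r, r:nz - r]
    for k in range(1, r + 1):
        # Six neighbors at distance k: two along each axis.
        acc = acc + COEFFS[k] * (
            u[r - k:nx - r - k, r:ny - r, r:nz - r]
            + u[r + k:nx - r + k, r:ny - r, r:nz - r]
            + u[r:nx - r, r - k:ny - r - k, r:nz - r]
            + u[r:nx - r, r + k:ny - r + k, r:nz - r]
            + u[r:nx - r, r:ny - r, r - k:nz - r - k]
            + u[r:nx - r, r:ny - r, r + k:nz - r + k]
        )
    out[r:nx - r, r:ny - r, r:nz - r] = acc / (h * h)
    return out
```

    Each output cell reads 25 input values spread over nine planes of the grid, which hints at why memory hierarchy usage and data-fetching patterns matter more for such stencils than for low-order ones.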