Browsing by Author "Conev, Anja"
Adapting learning and search algorithms to handle protein structural data with the goal of aiding drug discovery (2024-09-16) [Embargo]
Conev, Anja; Kavraki, Lydia E
Experimental methods for protein structure determination (e.g., X-ray crystallography, NMR, cryo-EM) require access to expensive equipment and are not scalable. Computational methods assist protein structure prediction and analysis on a far larger scale. Recent deep learning advances, most notably DeepMind's release of AlphaFold2.0 in 2021, have provided a wealth of structural data for further analysis and opened new opportunities for algorithmic development. In my work, I address three different tasks that make use of the available protein structure data: (1) system-specific binding-affinity prediction (in the context of the immune-related peptide-HLA system); (2) generation of representative ensembles from generic protein structure datasets; and (3) protein-ligand ensemble docking. To this end, I examine and adapt a range of algorithms, including random forest regression models, unsupervised learning methods, and stochastic global optimization techniques. I validate the resulting pipelines on available experimental data and apply them to different macromolecular contexts, such as the immune-related formation of the peptide-HLA complex, the flexibility of the signal transducer PI3K lipid kinase, the CDK2 protein kinase, and estrogen receptor α. The developed pipelines are open source and freely available, and they can help guide the search for novel therapeutics.

EnGens: a computational framework for generation and analysis of representative protein conformational ensembles (Oxford University Press, 2023)
Conev, Anja; Rigo, Mauricio Menegatti; Devaurs, Didier; Fonseca, André Faustino; Kalavadwala, Hussain; de Freitas, Martiela Vaz; Clementi, Cecilia; Zanatta, Geancarlo; Antunes, Dinler Amaral; Kavraki, Lydia E
Proteins are dynamic macromolecules that perform vital functions in cells. A protein's structure determines its function, but this structure is not static, as proteins change their conformation to achieve various functions. Understanding the conformational landscapes of proteins is essential to understanding their mechanism of action. Sets of carefully chosen conformations can summarize such complex landscapes and provide better insights into protein function than single conformations. We refer to these sets as representative conformational ensembles. Recent advances in computational methods have led to an increase in the number of available structural datasets spanning conformational landscapes. However, extracting representative conformational ensembles from such datasets is not an easy task, and many methods have been developed to tackle it. Our new approach, EnGens (short for ensemble generation), collects these methods into a unified framework for generating and analyzing representative protein conformational ensembles. In this work, we: (1) provide an overview of existing methods and tools for representative protein structural ensemble generation and analysis; (2) unify existing approaches in an open-source Python package and a portable Docker image, providing interactive visualizations within a Jupyter Notebook pipeline; and (3) test our pipeline on a few canonical examples from the literature. Representative ensembles produced by EnGens can be used for many downstream tasks, such as protein–ligand ensemble docking, Markov state modeling of protein dynamics, and analysis of the effect of single-point mutations.

HLA-Arena: A Customizable Environment for the Structural Modeling and Analysis of Peptide-HLA Complexes for Cancer Immunotherapy (ASCO, 2020)
Antunes, Dinler A.; Abella, Jayvee R.; Hall-Swan, Sarah; Devaurs, Didier; Conev, Anja; Moll, Mark; Lizée, Gregory; Kavraki, Lydia E.
PURPOSE: HLA protein receptors play a key role in cellular immunity. They bind intracellular peptides and display them for recognition by T-cell lymphocytes. Because T-cell activation is partially driven by structural features of these peptide-HLA complexes, their structural modeling and analysis are becoming central components of cancer immunotherapy projects. Unfortunately, this kind of analysis is limited by the small number of experimentally determined structures of peptide-HLA complexes. Overcoming this limitation requires developing novel computational methods to model and analyze peptide-HLA structures.
METHODS: Here we describe a new platform for the structural modeling and analysis of peptide-HLA complexes, called HLA-Arena, which we have implemented using Jupyter Notebook and Docker. It is a customizable environment that facilitates the use of computational tools, such as APE-Gen and DINC, which we have previously applied to peptide-HLA complexes. By integrating other commonly used tools, such as MODELLER and MHCflurry, this environment supports diverse tasks in structural modeling, analysis, and visualization.
RESULTS: To illustrate the capabilities of HLA-Arena, we describe three example workflows applied to peptide-HLA complexes. Leveraging the strengths of our tools, DINC and APE-Gen, the first two workflows show how to perform geometry prediction for peptide-HLA complexes and structure-based binding prediction, respectively. The third workflow presents an example of large-scale virtual screening of peptides for multiple HLA alleles.
CONCLUSION: These workflows illustrate the potential benefits of HLA-Arena for the structural modeling and analysis of peptide-HLA complexes. Because HLA-Arena can easily be integrated within larger computational pipelines, we expect its potential impact to increase substantially. For instance, it could be used to conduct structural analyses for personalized cancer immunotherapy, neoantigen discovery, or vaccine development.

Machine Learning-Guided Three-Dimensional Printing of Tissue Engineering Scaffolds (Mary Ann Liebert, Inc., 2020)
Conev, Anja; Litsa, Eleni E.; Perez, Marissa R.; Diba, Mani; Mikos, Antonios G.; Kavraki, Lydia E.; Bioengineering; Computer Science; Center for Engineering Complex Tissues
Various material compositions have been successfully used in 3D printing, with promising applications as scaffolds in tissue engineering. However, identifying suitable printing conditions for new materials requires extensive experimentation in a time- and resource-demanding process. This study investigates the use of machine learning (ML) to distinguish between printing configurations that are likely to result in low-quality prints and those that are more promising, as a first step toward the development of a recommendation system for identifying suitable printing conditions. The ML-based framework takes as input the printing conditions (the material composition and the printing parameters) and predicts the quality of the resulting print as either “low” or “high.” We investigate two ML-based approaches: a direct, classification-based approach that trains a classifier to distinguish between low- and high-quality prints, and an indirect approach that uses a regression ML model to approximate the values of a printing quality metric. Both models are built upon random forests. We trained and evaluated the models on a dataset generated in a previous study, which investigated the fabrication of porous polymer scaffolds by means of extrusion-based 3D printing with a full-factorial design. Our results show that both models were able to correctly label the majority of the tested configurations, while a simpler linear ML model was not effective. Additionally, our analysis showed that, in the context of ML, a full-factorial design for data collection can lead to redundancies in the data, and we propose a more efficient data collection strategy.

SARS-Arena: Sequence and Structure-Guided Selection of Conserved Peptides from SARS-related Coronaviruses for Novel Vaccine Development (Frontiers Media S.A., 2022)
Rigo, Mauricio Menegatti; Fasoulis, Romanos; Conev, Anja; Hall-Swan, Sarah; Antunes, Dinler Amaral; Kavraki, Lydia E.; Kavraki Lab
The pandemic caused by the SARS-CoV-2 virus, the agent responsible for COVID-19, has affected millions of people worldwide. There is a constant search for new therapies to either prevent or mitigate the disease. Fortunately, we have observed the successful development of multiple vaccines. Most of them are focused on one viral envelope protein, the spike protein. However, such focused approaches may contribute to the rise of new variants, fueled by the constant selection pressure on envelope proteins and the widespread dispersion of coronaviruses in nature. Therefore, it is important to examine other proteins, preferably those that are less susceptible to selection pressure, such as the nucleocapsid (N) protein. Even though the N protein is less accessible to the humoral response, peptides from its conserved regions can be presented by class I Human Leukocyte Antigen (HLA) molecules, eliciting an immune response mediated by T-cells. Given the increasing number of protein sequences deposited in biological databases daily and the conservation of the N protein among viral strains, computational methods can be leveraged to discover potential new targets for SARS-CoV-2 and SARS-CoV-related viruses. Here we developed SARS-Arena, a user-friendly computational pipeline that can be used by practitioners with different levels of expertise for novel vaccine development. SARS-Arena combines sequence-based methods and structure-based analyses to (i) perform multiple sequence alignment (MSA) of SARS-CoV-related N protein sequences, (ii) recover candidate peptides of different lengths from conserved protein regions, and (iii) model the 3D structure of the conserved peptides in the context of different HLAs. We present two main Jupyter Notebook workflows that can help identify new T-cell targets against SARS-CoV viruses. In fact, in a cross-reactive case study, our workflows identified a conserved N protein peptide (SPRWYFYYL) recognized by CD8+ T-cells in the context of HLA-B7+. SARS-Arena is available at https://github.com/KavrakiLab/SARS-Arena.
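To make step (ii) of the SARS-Arena abstract more concrete, the sketch below shows one simple way to recover candidate peptides of several lengths from conserved regions of an already computed MSA. It is not SARS-Arena's implementation: the conservation score (fraction of sequences sharing the most common residue in a column), the 8 to 11 residue window lengths, the threshold, and the function names `column_conservation` and `conserved_peptides` are all hypothetical choices made for illustration. The toy alignment is invented and is not real N protein data; it simply embeds the peptide SPRWYFYYL mentioned above.

```python
from collections import Counter

# Hypothetical helper: per-column conservation of an aligned set of sequences.
def column_conservation(msa):
    """Fraction of sequences sharing the most common non-gap residue in each column."""
    n_seqs = len(msa)
    scores = []
    for col in range(len(msa[0])):
        counts = Counter(seq[col] for seq in msa if seq[col] != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / n_seqs)
    return scores

# Hypothetical helper: sliding-window extraction of conserved k-mers.
def conserved_peptides(msa, lengths=(8, 9, 10, 11), threshold=1.0):
    """Slide windows of each length over the first (reference) sequence and keep
    gap-free windows whose columns are all conserved at or above `threshold`."""
    scores = column_conservation(msa)
    reference = msa[0]
    peptides = []
    for k in lengths:
        for start in range(len(reference) - k + 1):
            window = reference[start:start + k]
            if "-" in window:
                continue  # skip windows that span alignment gaps
            if all(s >= threshold for s in scores[start:start + k]):
                peptides.append((start, window))
    return peptides

# Invented toy alignment (three equal-length sequences, for illustration only).
msa = [
    "MASPRWYFYYLGT",
    "MASPRWYFYYLGS",
    "MTSPRWYFYYLGT",
]

if __name__ == "__main__":
    for start, peptide in conserved_peptides(msa):
        print(start, peptide)
```

With this toy input, columns 1 and 12 are not fully conserved, so only windows lying inside columns 2 to 11 survive; among the 9-mers these are SPRWYFYYL and PRWYFYYLG. A real pipeline would typically start from a proper alignment tool, use a more nuanced conservation measure, and then pass the candidates to HLA binding prediction and structural modeling.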