Adapting learning and search algorithms to handle protein structural data with the goal of aiding drug discovery

Date
2024-09-16
Abstract

Experimental methods for protein structure determination (e.g., X-ray crystallography, NMR, cryo-EM) require access to expensive equipment and do not scale. Computational methods enable protein structure prediction and analysis on a far larger scale. Recent deep learning advances, most notably DeepMind’s release of AlphaFold2 in 2021, have provided a wealth of structural data for further analysis and opened new opportunities for algorithmic development. In my work, I address three tasks that make use of the available protein structure data: (1) system-specific binding-affinity prediction (in the context of the immune-related peptide-HLA system); (2) generation of representative ensembles from generic protein structure datasets; and (3) protein-ligand ensemble docking. To this end, I examine and adapt a range of algorithms, including random forest regression models, unsupervised learning methods, and stochastic global optimization techniques. I validate the resulting pipelines on available experimental data and apply them to several macromolecular contexts: the immune-related formation of the peptide-HLA complex, the flexibility of the signal-transducing PI3K lipid kinase, the CDK2 protein kinase, and estrogen receptor α. The developed pipelines are open source and freely available, and can help guide the search for novel therapeutics.

Degree
Doctor of Philosophy
Type
Thesis
Keywords
Protein structure, machine learning, unsupervised learning, peptide-HLA, molecular docking