Elucidating Metabolism through Machine Learning
Abstract
Metabolism consists of all chemical reactions that take place in an organism to sustain life. Metabolic studies have the potential to advance chemical synthesis and drug development, the discovery of biomarkers and therapeutic targets, and environmental management. Computational tools can greatly benefit metabolic studies because standard experimental practices are often laborious and resource-intensive. Existing computational approaches often rely on expert knowledge, which limits their scalability and generalizability. As the volume of available metabolic data grows, Machine Learning (ML) is emerging as a promising tool to assist metabolic studies. The latest advancements in ML and Deep Learning (DL), especially for structured data such as chemical molecules, point in the same direction. Metabolic data, however, are scarce compared with general chemical data, which makes the application of ML especially challenging. In this work, we have explored statistical ML methodologies as well as DL architectures. In the latter case, we have explored the use of Transfer Learning to circumvent the limited-data problem and to take advantage of the massive datasets of general chemical data. More specifically, we have developed ML-based approaches for three different problems to assist metabolic studies. The first problem is to automatically identify the reaction mechanism of a metabolic reaction in the form of an atom mapping between the atoms on the two sides of the reaction. We approached this as a graph-matching problem, representing chemical molecules as graphs, and solved it using optimization algorithms. Our approach improved upon existing methodologies by incorporating chemical knowledge into the graph problem using statistical ML. The second problem is to predict human metabolites of chemical molecules such as drugs.
We approached this problem as a sequence translation problem, representing chemical molecules as sequences in a standard notation called SMILES. We used a neural machine translation algorithm to translate the sequence of the molecule into the metabolites that may be formed in the human body. Our end-to-end learning approach exhibits better scalability and generalizability than previous rule-based methodologies. Finally, the third problem is to recommend chemical structures given mass spectrometry data in order to assist structure elucidation in metabolomics studies. We approached this problem as a signal translation problem, in which the signal recorded by the mass spectrometer is translated into the SMILES sequence of the chemical molecule using a DL architecture. Our approach is the first with the potential to aid the elucidation of even novel molecules whose structures are not yet known. Overall, our work has demonstrated the potential of ML and DL to assist metabolic studies, as well as the importance of Transfer Learning in domains with limited available data.
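To make the sequence-translation framing concrete, the sketch below shows one way a SMILES string could be split into tokens and encoded as integers before being fed to a translation model. It is only an illustration: the token pattern is a simplified assumption, and the names `tokenize_smiles`, `build_vocab`, and `encode` are hypothetical and are not taken from the dissertation.

```python
import re
from typing import Dict, Iterable, List

# Simplified SMILES token pattern (an assumption for illustration only):
# bracket atoms, the two-letter halogens Br and Cl, two-digit ring closures,
# and otherwise any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens that concatenate back to the input."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(token_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Assign an integer id to every token, after three special symbols."""
    vocab = {"<pad>": 0, "<bos>": 1, "<eos>": 2}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Wrap a token list in begin/end markers and map it to integer ids."""
    return [vocab["<bos>"]] + [vocab[t] for t in tokens] + [vocab["<eos>"]]


if __name__ == "__main__":
    aspirin = "CC(=O)Oc1ccccc1C(=O)O"
    tokens = tokenize_smiles(aspirin)
    vocab = build_vocab([tokens])
    print(tokens)
    print(encode(tokens, vocab))
```

In a translation setting, the encoded parent molecule would serve as the source sequence and the encoded metabolite as the target sequence.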
Citation
Litsa, Eleni E. "Elucidating Metabolism through Machine Learning." (2021) Diss., Rice University. https://hdl.handle.net/1911/110428.