Elucidating Metabolism through Machine Learning

dc.contributor.advisor: Kavraki, Lydia E. (en_US)
dc.creator: Litsa, Eleni E. (en_US)
dc.date.accessioned: 2021-05-03T21:33:53Z (en_US)
dc.date.available: 2022-05-01T05:01:13Z (en_US)
dc.date.created: 2021-05 (en_US)
dc.date.issued: 2021-04-23 (en_US)
dc.date.submitted: May 2021 (en_US)
dc.date.updated: 2021-05-03T21:33:53Z (en_US)
dc.description.abstract: Metabolism consists of all chemical reactions that take place in an organism to sustain life. Metabolic studies have the potential to advance chemical synthesis and drug development, the discovery of biomarkers and therapeutic targets, and environmental management. Computational tools can greatly benefit metabolic studies because standard experimental practices are often laborious and resource-demanding. Existing computational approaches often rely on expert knowledge, which limits their scalability and generalizability. As the volume of available metabolic data grows, Machine Learning (ML) is emerging as a promising tool to assist metabolic studies. Recent advances in ML and Deep Learning (DL), especially for structured data such as chemical molecules, point in the same direction. Metabolic data, however, are scarce compared with general chemical data, which makes the application of ML especially challenging. In this work, we have explored statistical ML methodologies as well as DL architectures. In the latter case, we have explored the use of Transfer Learning to circumvent the limited-data problem and to take advantage of the massive datasets of general chemical data. More specifically, we have developed ML-based approaches for three different problems to assist metabolic studies. The first problem is to automatically identify the reaction mechanism of a metabolic reaction in the form of an atom mapping between the atoms on the two sides of the reaction. We approached this as a graph matching problem, representing chemical molecules as graphs, and solved it using optimization algorithms. Our approach improved upon existing methodologies by incorporating chemical knowledge into the graph problem using statistical ML. The second problem is to predict human metabolites of chemical molecules such as drugs. We approached this problem as a sequence translation problem, representing chemical molecules as sequences based on a standard sequence notation called SMILES. We used a neural Machine Translation algorithm to translate the sequence of the molecule into the metabolites that may be formed in the human body. Our end-to-end learning approach exhibits better scalability and generalizability than previous rule-based methodologies. Finally, the third problem is to recommend chemical structures given mass spectrometry data in order to assist structure elucidation in metabolomics studies. We approached this problem as a signal translation problem, where the signal recorded by the mass spectrometer is translated into the SMILES sequence of the chemical molecule using a DL architecture. Our approach is the first with the potential to aid the elucidation of even novel molecules whose structures are not yet known. Overall, our work has demonstrated the potential of ML and DL to assist metabolic studies, as well as the importance of Transfer Learning in domains with limited available data. (en_US)
dc.embargo.terms: 2022-05-01 (en_US)
dc.format.mimetype: application/pdf (en_US)
dc.identifier.citation: Litsa, Eleni E. "Elucidating Metabolism through Machine Learning." (2021) Diss., Rice University. https://hdl.handle.net/1911/110428. (en_US)
dc.identifier.uri: https://hdl.handle.net/1911/110428 (en_US)
dc.language.iso: eng (en_US)
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. (en_US)
dc.subject: machine learning (en_US)
dc.subject: deep learning (en_US)
dc.subject: transfer learning (en_US)
dc.subject: metabolism (en_US)
dc.subject: metabolic reactions (en_US)
dc.subject: atom mapping (en_US)
dc.subject: metabolite prediction (en_US)
dc.subject: mass spectrometry (en_US)
dc.subject: structure elucidation (en_US)
dc.title: Elucidating Metabolism through Machine Learning (en_US)
dc.type: Thesis (en_US)
dc.type.material: Text (en_US)
thesis.degree.department: Computer Science (en_US)
thesis.degree.discipline: Engineering (en_US)
thesis.degree.grantor: Rice University (en_US)
thesis.degree.level: Doctoral (en_US)
thesis.degree.name: Doctor of Philosophy (en_US)
Files
Original bundle
Name: LITSA-DOCUMENT-2021.pdf
Size: 3.75 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text

Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text