Interpretable and Efficient Machine Learning in Cancer Biology

dc.contributor.advisor: Nakhleh, Luay [en_US]
dc.contributor.advisor: Chen, Ken [en_US]
dc.creator: Liang, Shaoheng [en_US]
dc.date.accessioned: 2022-12-21T20:10:36Z [en_US]
dc.date.available: 2023-06-01T05:01:10Z [en_US]
dc.date.created: 2022-12 [en_US]
dc.date.issued: 2022-12-01 [en_US]
dc.date.submitted: December 2022 [en_US]
dc.date.updated: 2022-12-21T20:10:36Z [en_US]
dc.description.abstract: The past decade witnessed advances in both machine learning and cancer biology. In therapeutics, chimeric antigen receptor (CAR) treatments and cancer vaccines give new hope for ending cancer. Single-cell sequencing and mass spectrometry enable personalized high-resolution observations of cancer cell behavior and immune response. Computational cancer biology is no different; the continuous evolution of machine learning models, especially neural networks, provides unprecedented potential in making predictions. However, efforts are still needed to tailor the models to interpret specific biological processes. My research explores how knowledge-informed adaptation of machine learning techniques, such as neural networks, metric learning, and probabilistic classifiers, helps answer questions in cancer biology. For example, periodicity in the cell cycle and other biological processes inspired our use of a sinusoidal activation function in an autoencoder to discover the periodicity in single-cell transcriptomic data. To efficiently predict biomarkers driving tumorigenesis and immune cell differentiation, we adapted UMAP with L1 regularization and our implementation of the OWLQN (Orthant-Wise Limited-memory Quasi-Newton) optimizer. Inspired by structural motifs in antigen presentation, our white-box positive-example-only classifier, based on a Naïve Bayes formulation and mutual-information-based combinatorial feature selection, achieves state-of-the-art accuracy in antigen presentation prediction, helping design cancer vaccines and understand the antigen presentation process. The differences among patient samples, referred to as the batch effect, informed the development of a power analysis web application and a differential expression analysis tool to better identify changes in cell type abundances and omics features. Increasingly large omics data also call for more efficient computational methods. My research utilized multiple modeling and computing techniques, such as conjugate priors, quasi-Newton methods, parallelism, and GPU acceleration, to address this need. For wider usage by different user groups, including method developers, bench scientists, and clinicians, we developed the tools as Python or R packages, or web applications. Overall, my research shows that knowledge-informed interpretable modeling of complex biological processes helps make accurate, clinically relevant predictions and generate new knowledge, both important for cancer biology and broader biomedical applications. [en_US]
dc.embargo.terms: 2023-06-01 [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Liang, Shaoheng. "Interpretable and Efficient Machine Learning in Cancer Biology." (2022) Diss., Rice University. https://hdl.handle.net/1911/114172. [en_US]
dc.identifier.uri: https://hdl.handle.net/1911/114172 [en_US]
dc.language.iso: eng [en_US]
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: Single-cell omics [en_US]
dc.subject: Statistical inference [en_US]
dc.title: Interpretable and Efficient Machine Learning in Cancer Biology [en_US]
dc.type: Thesis [en_US]
dc.type.material: Text [en_US]
thesis.degree.department: Computer Science [en_US]
thesis.degree.discipline: Engineering [en_US]
thesis.degree.grantor: Rice University [en_US]
thesis.degree.level: Doctoral [en_US]
thesis.degree.name: Doctor of Philosophy [en_US]
Files
Original bundle
Name: LIANG-DOCUMENT-2022.pdf
Size: 7.62 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.61 KB
Format: Plain Text