Nakhleh, Luay; Chen, Ken
2022-12-21; 2023-06-01; 2022-12; 2022-12-01; December 2
Liang, Shaoheng. "Interpretable and Efficient Machine Learning in Cancer Biology." (2022) Diss., Rice University. https://hdl.handle.net/1911/114172.

The past decade has witnessed advances in both machine learning and cancer biology. In therapeutics, chimeric antigen receptor (CAR) therapies and cancer vaccines offer new hope for ending cancer. Single-cell sequencing and mass spectrometry enable personalized, high-resolution observation of cancer cell behavior and immune responses. Computational cancer biology has advanced as well: the continuous evolution of machine learning models, especially neural networks, offers unprecedented potential for prediction. However, these models still need to be tailored to interpret specific biological processes. My research explores how knowledge-informed adaptation of machine learning techniques, such as neural networks, metric learning, and probabilistic classifiers, helps answer questions in cancer biology. For example, the periodicity of the cell cycle and other biological processes inspired our use of a sinusoidal activation function in an autoencoder to discover periodic patterns in single-cell transcriptomic data. To efficiently predict biomarkers that drive tumorigenesis and immune cell differentiation, we combined UMAP with L1 regularization and our own implementation of the OWLQN (Orthant-Wise Limited-memory Quasi-Newton) optimizer. Inspired by structural motifs in antigen presentation, our white-box, positive-example-only classifier, built on a Naïve Bayes formulation and mutual-information-based combinatorial feature selection, achieves state-of-the-art accuracy in antigen presentation prediction. This helps in designing cancer vaccines and in understanding the antigen presentation process.
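The OWLQN optimizer mentioned above can be illustrated with a small, generic sketch. It assumes an objective of the form f(x) + c·||x||_1 with a smooth f. It follows the standard orthant-wise recipe in four steps. First, compute a pseudo-gradient that accounts for the L1 term. Second, take an L-BFGS direction from that pseudo-gradient. Third, drop any direction components that disagree with steepest descent. Fourth, backtrack along a path projected onto the current orthant. This is not the thesis's implementation and contains no UMAP objective. The names `pseudo_gradient` and `owlqn`, the memory size `m`, the Armijo constant `1e-4`, and the toy quadratic are hypothetical choices made for illustration.

```python
import numpy as np

# Illustrative OWL-QN sketch; function names and constants are hypothetical,
# and this is not the thesis's implementation (no UMAP objective here).


def pseudo_gradient(x, g, c):
    """Pseudo-gradient of F(x) = f(x) + c * ||x||_1, given g = grad f(x)."""
    # Away from zero, the L1 term is differentiable: add c * sign(x_i).
    pg = np.where(x > 0, g + c, np.where(x < 0, g - c, 0.0))
    at_zero = x == 0
    # At x_i == 0, use a one-sided derivative only if it points downhill.
    pg = np.where(at_zero & (g + c < 0), g + c, pg)
    pg = np.where(at_zero & (g - c > 0), g - c, pg)
    return pg


def owlqn(f, grad, x0, c, m=10, max_iter=200, tol=1e-8):
    """Minimize f(x) + c * ||x||_1 with a basic orthant-wise L-BFGS loop."""
    x = np.asarray(x0, dtype=float).copy()
    g = grad(x)
    s_hist, y_hist = [], []

    def objective(z):
        return f(z) + c * np.abs(z).sum()

    for _ in range(max_iter):
        pg = pseudo_gradient(x, g, c)
        if np.linalg.norm(pg) < tol:
            break
        # L-BFGS two-loop recursion applied to the pseudo-gradient.
        q = pg.copy()
        saved = []
        for s, y in reversed(list(zip(s_hist, y_hist))):
            rho = 1.0 / y.dot(s)
            a = rho * s.dot(q)
            q -= a * y
            saved.append((rho, a, s, y))
        if s_hist:
            q *= s_hist[-1].dot(y_hist[-1]) / y_hist[-1].dot(y_hist[-1])
        for rho, a, s, y in reversed(saved):
            q += (a - rho * y.dot(q)) * s
        d = -q
        # Zero out components whose sign disagrees with steepest descent (-pg).
        d = np.where(d * pg < 0, d, 0.0)
        # Orthant to search in: sign of x_i, or the descent sign where x_i == 0.
        xi = np.where(x != 0, np.sign(x), np.sign(-pg))
        # Backtracking line search, projecting each trial point onto the orthant.
        f_old, step = objective(x), 1.0
        while True:
            x_new = x + step * d
            x_new = np.where(np.sign(x_new) == xi, x_new, 0.0)
            if objective(x_new) <= f_old + 1e-4 * pg.dot(x_new - x) or step < 1e-12:
                break
            step *= 0.5
        # Curvature pairs use the gradient of the smooth part f only.
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        if s.dot(y) > 1e-12:
            s_hist.append(s)
            y_hist.append(y)
            if len(s_hist) > m:
                s_hist.pop(0)
                y_hist.pop(0)
        x, g = x_new, g_new
    return x


if __name__ == "__main__":
    b = np.array([3.0, -0.5, 1.5, -2.0])
    # Toy smooth part f(x) = 0.5 * ||x - b||^2 with gradient x - b.
    x_hat = owlqn(lambda x: 0.5 * np.sum((x - b) ** 2), lambda x: x - b, np.zeros(4), c=1.0)
    print(x_hat)
```

Take the toy objective 0.5·||x - b||² with c = 1. Its L1-regularized minimizer is b soft-thresholded by 1, which is [2, 0, 0.5, -1]. Because the Hessian is the identity, the sketch reaches this point in a single step. The second coordinate lands exactly at zero. That exact sparsity is what makes L1 regularization attractive for selecting a small set of biomarkers.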
Differences among patient samples, referred to as the batch effect, informed the development of a power analysis web tool and a differential expression analysis tool. Both tools better identify changes in cell-type abundances and omics features. Increasingly large omics datasets also call for more efficient computational methods. To meet this need, my research used multiple modeling and computing techniques, such as conjugate priors, quasi-Newton methods, parallelism, and GPU acceleration. To serve different user groups, including method developers, bench scientists, and clinicians, we released the tools as Python or R packages or as web applications. Overall, my research shows that knowledge-informed, interpretable modeling of complex biological processes helps make accurate, clinically relevant predictions and generate new knowledge, both of which are important for cancer biology and broader biomedical applications.

Format: application/pdf
Language: eng
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
Subjects: Machine Learning; Single-cell omics; Statistical inference
Title: Interpretable and Efficient Machine Learning in Cancer Biology
Type: Thesis
2022-12-21