Statistical Approaches for Large-Scale and Complex Omics Data

Date
2019-12-05
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract

In this thesis, we propose several novel statistical approaches to analyzing large-scale and complex omics data. This thesis consists of three projects. In the first project, with the goal of characterizing gene-level relationships between DNA methylation and gene expression, we introduce a sequential penalized regression approach to identify methylation-expression quantitative trait loci (methyl-eQTLs), a term that we have coined to represent, for each gene and tissue type, a sparse set of CpG loci best explaining gene expression and accompanying weights indicating direction and strength of association, which can be used to construct gene-level methylation summaries that are maximally correlated with gene expression for use in integrative models. Using TCGA and MD Anderson colorectal cohorts to build and validate our models, we demonstrate our strategy explains expression variability much better than commonly used integrative methods. In the second project, we propose a unified Bayesian framework to perform quantile regression on functional responses (FQR). Our approach represents functional coefficients with basis functions to borrow strength from nearby locations, and places a global-local shrinkage prior on the basis coefficients to achieve adaptive regularization. We develop a scalable Gibbs sampler to implement the approach. Simulation studies show that our method has superior performance against competing methods. We apply our method to a mass spectrometry dataset and identify proteomic biomarkers of pancreatic cancer that were entirely missed by mean-regression based approaches. The third project is a theoretical investigation of the FQR problem, extending the previous project. We propose an interpolation-based estimator that can be strongly approximated by a sequence of Gaussian processes, based upon which we can derive the convergence rate of the estimator and construct simultaneous confidence bands for the functional coefficient. The strong approximation results also build a theoretical foundation for the development of alternative approaches that are shown to have better finite-sample performance in simulation studies.

Description
Degree
Doctor of Philosophy
Type
Thesis
Keywords
Integrative Genomics, Penalized Regression, Functional Data Analysis, Quantile Regression, Bayesian Hierarchical Modeling, Proteomics
Citation

Liu, Yusha. "Statistical Approaches for Large-Scale and Complex Omics Data." (2019) Diss., Rice University. https://hdl.handle.net/1911/107813.

Has part(s)
Forms part of
Published Version
Rights
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
Link to license
Citable link to this page