Statistical Approaches for Large-Scale and Complex Omics Data

dc.contributor.advisor: Li, Meng
dc.contributor.advisor: Morris, Jeffrey S.
dc.creator: Liu, Yusha
dc.date.accessioned: 2019-12-06T19:50:36Z
dc.date.available: 2020-12-01T06:01:11Z
dc.date.created: 2019-12
dc.date.issued: 2019-12-05
dc.date.submitted: December 2019
dc.date.updated: 2019-12-06T19:50:36Z
dc.description.abstract: In this thesis, we propose several novel statistical approaches for analyzing large-scale and complex omics data. The thesis consists of three projects. In the first project, with the goal of characterizing gene-level relationships between DNA methylation and gene expression, we introduce a sequential penalized regression approach to identify methylation-expression quantitative trait loci (methyl-eQTLs). We coin this term to denote, for each gene and tissue type, a sparse set of CpG loci that best explains gene expression, together with weights indicating the direction and strength of association. These loci and weights can be used to construct gene-level methylation summaries that are maximally correlated with gene expression, for use in integrative models. Using TCGA and MD Anderson colorectal cohorts to build and validate our models, we demonstrate that our strategy explains expression variability much better than commonly used integrative methods. In the second project, we propose a unified Bayesian framework to perform quantile regression on functional responses (FQR). Our approach represents functional coefficients with basis functions to borrow strength from nearby locations, and places a global-local shrinkage prior on the basis coefficients to achieve adaptive regularization. We develop a scalable Gibbs sampler to implement the approach. Simulation studies show that our method outperforms competing methods. We apply our method to a mass spectrometry dataset and identify proteomic biomarkers of pancreatic cancer that were entirely missed by mean-regression based approaches. The third project is a theoretical investigation of the FQR problem, extending the second project. We propose an interpolation-based estimator that can be strongly approximated by a sequence of Gaussian processes; from this approximation we derive the convergence rate of the estimator and construct simultaneous confidence bands for the functional coefficient. The strong approximation results also provide a theoretical foundation for developing alternative approaches that are shown to have better finite-sample performance in simulation studies.
dc.embargo.terms: 2020-12-01
dc.format.mimetype: application/pdf
dc.identifier.citation: Liu, Yusha. "Statistical Approaches for Large-Scale and Complex Omics Data." (2019) Diss., Rice University. https://hdl.handle.net/1911/107813.
dc.identifier.uri: https://hdl.handle.net/1911/107813
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Integrative Genomics
dc.subject: Penalized Regression
dc.subject: Functional Data Analysis
dc.subject: Quantile Regression
dc.subject: Bayesian Hierarchical Modeling
dc.subject: Proteomics
dc.title: Statistical Approaches for Large-Scale and Complex Omics Data
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Statistics
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files
Original bundle
Name: LIU-DOCUMENT-2019.pdf
Size: 7.32 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text