Statistical and Algorithmic Methods for High-Dimensional and Highly-Correlated Data

Date
2016-05-20
Authors
Hu, Yue
Abstract

Technological advances have led to a proliferation of high-dimensional and highly correlated data. Such data pose enormous challenges for statistical analysis, pushing the limits of distributed optimization, predictive modeling, and statistical inference. We propose new methods, motivated by biomedical applications, for predictive modeling and variable selection in this challenging setting. First, we build predictive models for multi-subject neuroimaging data. This is an ultra-high-dimensional problem in which each subject contributes a matrix of covariates (brain locations by time points) that is highly correlated in both space and time; few methods currently exist to fit supervised models directly to such tensor data. We propose a novel modeling and algorithmic strategy, Local Aggregate Modeling, that applies generalized linear models (GLMs) to this massive tensor data; it not only has better prediction accuracy and interpretability, but can also be fit in a distributed manner. Second, we propose a novel method, Algorithmic Regularization Paths, for variable selection with high-dimensional and highly correlated data. Existing penalized regression methods such as the Lasso solve a relaxation of the best subsets problem, and this relaxation can be solved in polynomial time; however, the Lasso can correctly recover the true sparsity pattern only if the design matrix satisfies the so-called Irrepresentability Condition or related conditions, which are easily violated when the data are highly correlated. Our method achieves better variable selection performance and faster computation in ultra-high-dimensional and high-correlation settings where the Lasso and many other standard methods fail.
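To make the Irrepresentability Condition mentioned in the abstract concrete, here is a minimal sketch that is not taken from the thesis. In its commonly stated form, the condition requires that, for the true support S with coefficient signs s, the largest absolute entry of X_N^T X_S (X_S^T X_S)^{-1} s stays below 1, where X_N holds the columns outside S. The function names `irrepresentability_value` and `make_design`, and the three-column toy design, are hypothetical and chosen only for illustration.

```python
import numpy as np


def irrepresentability_value(X, support, signs):
    """Return max |X_N^T X_S (X_S^T X_S)^{-1} s| over the non-support columns.

    A value below 1 means the condition holds for this design and support;
    a value of 1 or more means it is violated.
    """
    X = np.asarray(X, dtype=float)
    support = np.asarray(support, dtype=int)
    signs = np.asarray(signs, dtype=float)
    rest = np.setdiff1d(np.arange(X.shape[1]), support)
    XS, XN = X[:, support], X[:, rest]
    # Solve (X_S^T X_S) w = s instead of forming an explicit inverse.
    w = np.linalg.solve(XS.T @ XS, signs)
    return float(np.max(np.abs(XN.T @ XS @ w)))


def make_design(rho):
    """Toy design (an assumption): two orthonormal true predictors and one
    unit-norm noise predictor with inner product rho against each of them."""
    X = np.zeros((3, 3))
    X[0, 0] = 1.0
    X[1, 1] = 1.0
    X[:, 2] = [rho, rho, np.sqrt(1.0 - 2.0 * rho**2)]
    return X


if __name__ == "__main__":
    for rho in (0.3, 0.7):
        value = irrepresentability_value(make_design(rho), [0, 1], [1.0, 1.0])
        status = "holds" if value < 1 else "violated"
        print(f"rho={rho}: value={value:.2f}, condition {status}")
```

In this toy design the value equals 2 * rho, so a noise predictor with only moderate correlation to each true predictor (rho = 0.7) already pushes the value past 1, which is the kind of failure under correlation that the abstract describes.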

Degree
Doctor of Philosophy
Type
Thesis
Keywords
Multi-subject neuroimaging, two-way smoothing, tensor covariates, variable selection, high-dimensional and highly-correlated data.
Citation

Hu, Yue. "Statistical and Algorithmic Methods for High-Dimensional and Highly-Correlated Data." (2016) Diss., Rice University. https://hdl.handle.net/1911/95556.

Rights
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.