Statistical and Algorithmic Methods for High-Dimensional and Highly-Correlated Data

dc.contributor.advisor: Allen, Genevera I
dc.creator: Hu, Yue
dc.date.accessioned: 2017-07-31T15:40:10Z
dc.date.available: 2017-07-31T15:40:10Z
dc.date.created: 2016-12
dc.date.issued: 2016-05-20
dc.date.submitted: December 2016
dc.date.updated: 2017-07-31T15:40:10Z
dc.description.abstract: Technological advances have led to a proliferation of high-dimensional and highly correlated data. Such data pose enormous challenges for statistical analysis, pushing the limits of distributed optimization, predictive modeling, and statistical inference. We propose new methods, motivated by biomedical applications, for predictive modeling and variable selection in this challenging setting. First, we build predictive models for multi-subject neuroimaging data. This is an ultra-high-dimensional problem in which each subject has a highly spatially and temporally correlated matrix of covariates (brain locations by time points); few methods currently exist to fit supervised models directly to this tensor data. We propose a novel modeling and algorithmic strategy, Local Aggregate Modeling, for applying generalized linear models (GLMs) to this massive tensor data; the strategy not only yields better prediction accuracy and interpretability, but can also be fit in a distributed manner. Second, we propose a novel method, Algorithmic Regularization Paths, for variable selection with high-dimensional and highly correlated data. Existing penalized regression methods such as the Lasso solve a relaxation of the best subsets problem that can be computed in polynomial time; however, the Lasso can correctly recover the true sparsity pattern only if the design matrix satisfies the so-called Irrepresentability Condition or related conditions, which are easily violated when the data are highly correlated. Our method achieves better variable selection performance and faster computation in ultra-high-dimensional and high-correlation settings where the Lasso and many other standard methods fail.
dc.format.mimetype: application/pdf
dc.identifier.citation: Hu, Yue. "Statistical and Algorithmic Methods for High-Dimensional and Highly-Correlated Data." (2016) Diss., Rice University. https://hdl.handle.net/1911/95556.
dc.identifier.uri: https://hdl.handle.net/1911/95556
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Multi-subject neuroimaging
dc.subject: two-way smoothing
dc.subject: tensor covariates
dc.subject: variable selection
dc.subject: high-dimensional and highly-correlated data.
dc.title: Statistical and Algorithmic Methods for High-Dimensional and Highly-Correlated Data
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Statistics
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.major: Statistical Learning
thesis.degree.name: Doctor of Philosophy
Files
Original bundle
Name: HU-DOCUMENT-2016.pdf
Size: 5.68 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.83 KB
Format: Plain Text

Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text