Statistical and Algorithmic Methods for High-Dimensional and Highly-Correlated Data

dc.contributor.advisor: Allen, Genevera I (en_US)
dc.creator: Hu, Yue (en_US)
dc.date.accessioned: 2017-07-31T15:40:10Z (en_US)
dc.date.available: 2017-07-31T15:40:10Z (en_US)
dc.date.created: 2016-12 (en_US)
dc.date.issued: 2016-05-20 (en_US)
dc.date.submitted: December 2016 (en_US)
dc.date.updated: 2017-07-31T15:40:10Z (en_US)
dc.description.abstract: Technological advances have led to a proliferation of high-dimensional and highly correlated data. Such data pose enormous challenges for statistical analysis, pushing the limits of distributed optimization, predictive modeling, and statistical inference. We propose new methods, motivated by biomedical applications, for predictive modeling and variable selection in this challenging setting. First, we build predictive models for multi-subject neuroimaging data. This is an ultra-high-dimensional problem in which each subject contributes a matrix of covariates (brain locations by time points) that is highly correlated in both space and time; few methods currently exist to fit supervised models directly to such tensor data. We propose a novel modeling and algorithmic strategy, Local Aggregate Modeling, to apply generalized linear models (GLMs) to this massive tensor data; the resulting models not only have better prediction accuracy and interpretability but can also be fit in a distributed manner. Second, we propose a novel method, Algorithmic Regularization Paths, for variable selection with high-dimensional and highly correlated data. Existing penalized regression methods such as the Lasso solve a convex relaxation of the best-subset selection problem that can be computed in polynomial time; however, the Lasso can correctly recover the true sparsity pattern only if the design matrix satisfies the so-called Irrepresentability Condition or related conditions, which are easily violated when the data are highly correlated. Our method achieves better variable selection performance and faster computation in ultra-high-dimensional and high-correlation settings where the Lasso and many other standard methods fail. (en_US)
dc.format.mimetype: application/pdf (en_US)
dc.identifier.citation: Hu, Yue. "Statistical and Algorithmic Methods for High-Dimensional and Highly-Correlated Data." (2016) Diss., Rice University. https://hdl.handle.net/1911/95556. (en_US)
dc.identifier.uri: https://hdl.handle.net/1911/95556 (en_US)
dc.language.iso: eng (en_US)
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. (en_US)
dc.subject: Multi-subject neuroimaging (en_US)
dc.subject: two-way smoothing (en_US)
dc.subject: tensor covariates (en_US)
dc.subject: variable selection (en_US)
dc.subject: high-dimensional and highly-correlated data (en_US)
dc.title: Statistical and Algorithmic Methods for High-Dimensional and Highly-Correlated Data (en_US)
dc.type: Thesis (en_US)
dc.type.material: Text (en_US)
thesis.degree.department: Statistics (en_US)
thesis.degree.discipline: Engineering (en_US)
thesis.degree.grantor: Rice University (en_US)
thesis.degree.level: Doctoral (en_US)
thesis.degree.major: Statistical Learning (en_US)
thesis.degree.name: Doctor of Philosophy (en_US)
Files
Original bundle
Name: HU-DOCUMENT-2016.pdf
Size: 5.68 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.83 KB
Format: Plain Text

Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text