Statistical Machine Learning Methodology and Inference for Structured Variable Selection

Date
2018-02-01
Abstract

Structured variable selection is a powerful tool for modeling a wide range of real-world phenomena. In this work, we develop structured variable selection methodology for three problems. The first concerns problems with a predefined group structure, where our goal, in the context of predictive regression modeling, is to select at least one variable from each group. Because this problem is NP-hard, we propose its tightest convex relaxation: a composite penalty that combines the ℓ1 and ℓ2 norms (an ℓ1 norm within each group and a squared ℓ2 norm across groups). The resulting method, which we call the Exclusive Lasso, performs structured variable selection by ensuring that at least one variable is selected from each group.
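The exclusive lasso penalty described above can be written as Σ_g ‖β_g‖₁², which is an ℓ1 norm inside each group combined with a squared ℓ2 norm across groups. The Python sketch below is a minimal illustration and not the thesis's implementation. It assumes the objective (1/(2n))‖y − Xβ‖² + (λ/2) Σ_g ‖β_g‖₁² and solves it with proximal gradient descent. The function names `exclusive_lasso_penalty`, `prox_group`, and `exclusive_lasso` are hypothetical, as are the step size, the iteration count, and the synthetic data.

```python
import numpy as np


def exclusive_lasso_penalty(beta, groups):
    """Sum over groups of the squared within-group l1 norm."""
    return sum(np.sum(np.abs(beta[g])) ** 2 for g in groups)


def prox_group(v, lam):
    """Proximal operator of (lam / 2) * ||x||_1**2 for one group, evaluated at v."""
    a = np.sort(np.abs(v))[::-1]             # magnitudes in descending order
    if a[0] == 0:
        return np.zeros_like(v)
    k = np.arange(1, a.size + 1)
    s = np.cumsum(a) / (1 + lam * k)         # l1 norm of the solution if the top k entries are active
    # Largest active-set size consistent with the optimality conditions.
    # The largest entry always passes this test, so the output is never all zero.
    k_star = np.nonzero(a > lam * s)[0][-1]
    return np.sign(v) * np.maximum(np.abs(v) - lam * s[k_star], 0.0)


def exclusive_lasso(X, y, groups, lam, n_iter=1000):
    """Proximal gradient for (1/(2n))||y - X beta||^2 + (lam/2) sum_g ||beta_g||_1^2 (assumed scaling)."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2     # 1 / Lipschitz constant of the loss gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y)) / n   # gradient step on the squared-error loss
        for g in groups:
            beta[g] = prox_group(z[g], step * lam)     # groupwise proximal step
    return beta


# Hypothetical synthetic data: two groups of three predictors.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
beta_true = np.array([2.0, 0.0, 0.0, 0.0, -1.5, 0.0])
y = X @ beta_true + 0.1 * rng.standard_normal(50)
groups = [np.arange(0, 3), np.arange(3, 6)]
beta_hat = exclusive_lasso(X, y, groups, lam=1.0)
print(np.round(beta_hat, 3))
```

In the group proximal step, the largest-magnitude entry always satisfies the activity test. Each group therefore keeps at least one nonzero coefficient whenever its input is nonzero, which mirrors the selection property described above.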

In the second problem, we investigate the neurological response to speech by developing a method for brain decoding with electrocorticography (ECoG) data. ECoG measures brain activity across a range of frequencies, over time, and at multiple locations in the brain, which produces highly structured spatio-temporal data. Effective brain decoding depends on identifying the relevant features in these data. This motivates our new brain decoding method based on partial least squares, Regularized Higher-Order Partial Least Squares (RHOP). RHOP (pronounced "rope") organizes the data into a tensor and reduces its dimensionality by factoring it into three components: a sparse node factor that identifies important regions of the brain, a smooth time factor that identifies important time points, and a smooth frequency factor that identifies the frequencies carrying information about the stimuli presented to the patient.
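The factor structure described above can be illustrated roughly with a Python sketch that fits a single rank-one term d · u ∘ v ∘ w to a node × time × frequency array. Soft-thresholding makes the node factor sparse, and a second-difference penalty smooths the time and frequency factors. This is not RHOP. It ignores the response variable and so leaves out the partial least squares component entirely. The function names, penalty forms, tuning values, and SVD-based initialization are all hypothetical choices made for this example.

```python
import numpy as np


def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def normalize(x):
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x


def smoother(m, alpha):
    """(I + alpha * D^T D)^{-1}, where D is the (m-2) x m second-difference matrix."""
    D = np.diff(np.eye(m), n=2, axis=0)
    return np.linalg.inv(np.eye(m) + alpha * D.T @ D)


def leading_mode_vector(T, mode):
    """Leading left singular vector of the mode-`mode` unfolding (HOSVD-style start)."""
    unfolded = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
    return np.linalg.svd(unfolded, full_matrices=False)[0][:, 0]


def regularized_rank_one(T, lam_node=0.1, alpha_time=1.0, alpha_freq=1.0, n_iter=100):
    """Alternating updates for T ~ d * (u outer v outer w), with T indexed as (node, time, frequency).

    u is made sparse by soft-thresholding; v and w are smoothed by a second-difference penalty.
    """
    S_time = smoother(T.shape[1], alpha_time)
    S_freq = smoother(T.shape[2], alpha_freq)
    v, w = leading_mode_vector(T, 1), leading_mode_vector(T, 2)
    for _ in range(n_iter):
        u = normalize(soft_threshold(np.einsum("ijk,j,k->i", T, v, w), lam_node))
        if not u.any():
            raise ValueError("lam_node is too large: every node was thresholded to zero")
        v = normalize(S_time @ np.einsum("ijk,i,k->j", T, u, w))
        w = normalize(S_freq @ np.einsum("ijk,i,j->k", T, u, v))
    d = np.einsum("ijk,i,j,k->", T, u, v, w)  # scale of the rank-one term
    return d, u, v, w


# Hypothetical synthetic example: 3 active nodes, a smooth time course, a frequency bump.
rng = np.random.default_rng(1)
u0 = normalize(np.r_[np.ones(3), np.zeros(7)])
v0 = normalize(np.sin(np.linspace(0, np.pi, 30)))
w0 = normalize(np.exp(-((np.arange(20) - 10.0) ** 2) / 20))
T = 10 * np.einsum("i,j,k->ijk", u0, v0, w0) + 0.01 * rng.standard_normal((10, 30, 20))
d, u, v, w = regularized_rank_one(T)
print(round(d, 2), np.nonzero(u)[0])
```

The alternating updates follow the pattern of regularized power iterations. Each factor is updated in turn with the other two held fixed, then penalized and rescaled to unit norm.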

Lastly, we develop statistical tests for clustering that help determine whether a clustering assignment reflects random sampling variation or actual structure in the population. Although clustering is widely applied, few methods currently exist for inference on clustered data. We therefore develop new tests and statistics for inference after clustering with convex clustering. Our tests build on the geometric interpretation of Hotelling's T² test and allow us to evaluate the quality of a clustering assignment.
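Geometrically, the two-sample Hotelling's T² statistic is a scaled squared Mahalanobis distance between two group centroids. The Python sketch below computes the classical version, using a pooled covariance and an F-based p-value. It is not the thesis's test. Its purpose is to show why new methods are needed: when the groups themselves come from clustering the same data, the naive test reports a strong difference even though the sample is drawn from a single population. Here a crude split on the sign of one coordinate stands in for a clustering algorithm. The function name `hotelling_t2` and the simulated data are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def hotelling_t2(X1, X2):
    """Classical two-sample Hotelling's T^2 test with a pooled covariance estimate."""
    n1, p = X1.shape
    n2 = X2.shape[0]
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    S1 = np.atleast_2d(np.cov(X1, rowvar=False))
    S2 = np.atleast_2d(np.cov(X2, rowvar=False))
    pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    mahalanobis_sq = diff @ np.linalg.solve(pooled, diff)   # squared distance between centroids
    t2 = n1 * n2 / (n1 + n2) * mahalanobis_sq
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2    # T^2 rescaled to an F(p, n1+n2-p-1) statistic
    p_value = stats.f.sf(f_stat, p, n1 + n2 - p - 1)
    return t2, p_value


rng = np.random.default_rng(2)
# Two genuinely different populations.
A = rng.standard_normal((40, 2))
B = rng.standard_normal((40, 2)) + 3.0
t2_real, p_real = hotelling_t2(A, B)

# One population, "clustered" by the sign of its first coordinate (hypothetical stand-in for clustering).
Z = rng.standard_normal((80, 2))
labels = Z[:, 0] > 0
t2_null, p_null = hotelling_t2(Z[labels], Z[~labels])
print(p_real, p_null)
```

Both p-values come out small, but only the first difference reflects real population structure. The second comes from using the same data to form the groups and then to test them.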

Degree
Doctor of Philosophy
Type
Thesis
Keywords
variable selection, exclusive lasso, statistical inference, clustering, electrocorticography, tensor decomposition
Citation

Campbell, Frederick. "Statistical Machine Learning Methodology and Inference for Structured Variable Selection." (2018) Diss., Rice University. https://hdl.handle.net/1911/105522.

Rights
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.