Statistical Machine Learning Methodology and Inference for Structured Variable Selection
Abstract
Structured variable selection is a powerful tool for modeling a wide range of real-world phenomena. In this work, we develop methodology based on structured variable selection for three different problems. In the first, we consider problems with pre-defined group structure, where the goal is to select at least one variable from each group in the context of predictive regression modeling. This problem is NP-hard, but we propose the tightest convex relaxation: a composite penalty that combines the l1 and l2 norms. Our method, the Exclusive Lasso, performs structured variable selection by ensuring that at least one variable is selected from each group.
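As a rough illustration of such a composite penalty, the sketch below evaluates one commonly cited exclusive-lasso-style form: an l1 norm within each group, squared and summed across groups. The abstract does not state the exact form or scaling used in the dissertation, so the function name `exclusive_lasso_penalty`, the factor of 1/2, and the integer group encoding are assumptions made for illustration only.

```python
import numpy as np

def exclusive_lasso_penalty(beta, groups):
    """Assumed form: 0.5 * sum over groups of (l1 norm of that group's coefficients)^2."""
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    total = 0.0
    for g in np.unique(groups):
        # l1 norm inside the group encourages sparsity within the group
        l1_within = np.abs(beta[groups == g]).sum()
        # squaring and summing acts like a squared l2 norm across groups
        total += l1_within ** 2
    return 0.5 * total

# Hypothetical coefficients: two groups of two variables each
beta = [1.0, -2.0, 0.0, 3.0]
groups = [0, 0, 1, 1]
penalty = exclusive_lasso_penalty(beta, groups)
```

Under this assumed form, a group with coefficients (1, 1) contributes 2.0, while a group with coefficients (sqrt(2), 0), which has the same l2 norm, contributes only 1.0. Concentrating weight on fewer variables within a group is therefore cheaper, while no group is pushed to zero as a whole.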
In the second problem, we investigate the neurological response to speech by developing a method for the brain decoding problem with electrocorticography (ECoG) data. ECoG measures brain activity at a range of frequencies over time at multiple locations in the brain, resulting in highly structured spatio-temporal data. Effective brain decoding relies on identifying the relevant features in the data, which motivates us to propose a new method based on partial least squares called Regularized Higher-Order Partial Least Squares (RHOP). RHOP (pronounced "Rope") organizes the data into a tensor and reduces its dimensionality by factoring it into a sparse node factor that identifies important regions in the brain, a smooth time factor that identifies important time points, and a smooth frequency factor that identifies frequencies carrying information about the patient stimuli.
Lastly, we develop statistical tests for clustering that help determine whether a clustering assignment reflects actual structure in the population or merely random sampling variation. Clustering is widely applied, but there are currently few methods for inference on clustered data. We therefore develop new tests and statistics for inference after clustering with Convex Clustering. Our tests are based on the geometric interpretation of Hotelling's T-squared test and allow us to evaluate the quality of a clustering assignment.
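For orientation, the classical two-sample Hotelling's T-squared statistic that these tests build on can be sketched as follows. This is the textbook statistic, not the test developed in the dissertation: its usual F reference distribution assumes the group labels were fixed before the data were seen, which does not hold when the groups come from clustering the same data. The function name `hotelling_t2_two_sample` and the toy data are assumptions made for illustration.

```python
import numpy as np

def hotelling_t2_two_sample(x, y):
    """Classical two-sample Hotelling T^2 with pooled covariance (rows are observations)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, p = x.shape
    n2 = y.shape[0]
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled sample covariance of the two groups
    s1 = np.atleast_2d(np.cov(x, rowvar=False))
    s2 = np.atleast_2d(np.cov(y, rowvar=False))
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    # Squared Mahalanobis distance between the group means, scaled by sample sizes
    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(pooled, diff)
    # Rescaled statistic; F(p, n1 + n2 - p - 1) under the classical assumptions
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f_stat

# Hypothetical toy data: the second group is the first shifted by 2 along one axis
x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = x + np.array([2.0, 0.0])
t2, f_stat = hotelling_t2_two_sample(x, y)
```

Geometrically, the statistic measures the distance between the two group means relative to the within-group spread, so well-separated, tight groups give large values.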
Citation
Campbell, Frederick. "Statistical Machine Learning Methodology and Inference for Structured Variable Selection." (2018) Diss., Rice University. https://hdl.handle.net/1911/105522.