Statistical Machine Learning Methodology and Inference for Structured Variable Selection

Campbell, Frederick

dc.contributor.advisor	Allen, Genevera I	en_US
dc.creator	Campbell, Frederick	en_US
dc.date.accessioned	2019-05-16T20:32:07Z	en_US
dc.date.available	2019-05-16T20:32:07Z	en_US
dc.date.created	2017-12	en_US
dc.date.issued	2018-02-01	en_US
dc.date.submitted	December 2017	en_US
dc.date.updated	2019-05-16T20:32:07Z	en_US
dc.description.abstract	Structured variable selection is a powerful tool for modeling a wide range of real-world phenomena. In this work, we develop methodology based on structured variable selection for three different problems. In the first, we develop methodology for problems with pre-defined group structure. Our goal is to select at least one variable from each group in the context of predictive regression modeling. This problem is NP-hard, but we propose the tightest convex relaxation: a composite penalty that combines the l1 and l2 norms. Our so-called Exclusive Lasso method performs structured variable selection by ensuring that at least one variable is selected from each group. In the next problem, we investigate the neurological response to speech by developing a method for the brain decoding problem with electrocorticography (ECoG) data. Electrocorticography measures brain activity at a range of frequencies over time at multiple locations in the brain, resulting in highly structured spatial-temporal data. Effective brain decoding relies on identifying the relevant features in the data, which motivates us to propose a new method for brain decoding based on partial least squares, called Regularized Higher-Order Partial Least Squares (RHOP). Our method RHOP (pronounced "Rope") organizes the data into a tensor and reduces the dimensionality of the data by factoring it into a sparse node factor that identifies important regions in the brain, a smooth time factor that identifies important time points, and a smooth frequency factor that identifies frequencies carrying information about the patient stimuli. Lastly, we develop statistical tests for clustering that help determine whether the clustering assignment is due to random sampling variation or to actual structure in the population. Clustering is widely applied, but there are currently few methods for inference on clustered data. As a result, we develop new tests and statistics for inference after clustering with Convex Clustering. Our tests are based on the geometric interpretation of Hotelling's T-squared test and allow us to evaluate the quality of our clustering assignment.	en_US
dc.format.mimetype	application/pdf	en_US
dc.identifier.citation	Campbell, Frederick. "Statistical Machine Learning Methodology and Inference for Structured Variable Selection." (2018) Diss., Rice University. https://hdl.handle.net/1911/105522.	en_US
dc.identifier.uri	https://hdl.handle.net/1911/105522	en_US
dc.language.iso	eng	en_US
dc.rights	Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.	en_US
dc.subject	variable selection	en_US
dc.subject	exclusive lasso	en_US
dc.subject	statistical inference	en_US
dc.subject	clustering	en_US
dc.subject	electrocorticography	en_US
dc.subject	tensor decomposition	en_US
dc.title	Statistical Machine Learning Methodology and Inference for Structured Variable Selection	en_US
dc.type	Thesis	en_US
dc.type.material	Text	en_US
thesis.degree.department	Statistics	en_US
thesis.degree.discipline	Engineering	en_US
thesis.degree.grantor	Rice University	en_US
thesis.degree.level	Doctoral	en_US
thesis.degree.name	Doctor of Philosophy	en_US

Files

Original bundle

Name:: CAMPBELL-DOCUMENT-2017.pdf
Size:: 11.5 MB
Format:: Adobe Portable Document Format

License bundle

Name:: PROQUEST_LICENSE.txt
Size:: 5.85 KB
Format:: Plain Text

Name:: LICENSE.txt
Size:: 2.61 KB
Format:: Plain Text

Collections

Rice University Theses and Dissertations