Statistical Machine Learning Methodology and Inference for Structured Variable Selection

dc.contributor.advisor: Allen, Genevera I
dc.creator: Campbell, Frederick
dc.date.accessioned: 2019-05-16T20:32:07Z
dc.date.available: 2019-05-16T20:32:07Z
dc.date.created: 2017-12
dc.date.issued: 2018-02-01
dc.date.submitted: December 2017
dc.date.updated: 2019-05-16T20:32:07Z
dc.description.abstract: Structured variable selection is a powerful tool for modeling a wide range of real-world phenomena. In this work, we develop methodology based on structured variable selection for three different problems. In the first, we consider problems with a predefined group structure, where the goal is to select at least one variable from each group in the context of predictive regression modeling. This problem is NP-hard, so we propose its tightest convex relaxation: a composite penalty that combines the ℓ1 and ℓ2 norms. Our so-called Exclusive Lasso method performs structured variable selection by ensuring that at least one variable is selected from each group. In the second problem, we investigate the neurological response to speech by developing a method for brain decoding with electrocorticography (ECoG) data. Electrocorticography measures brain activity at a range of frequencies, over time, and at multiple locations in the brain, resulting in highly structured spatio-temporal data. Effective brain decoding depends on identifying the relevant features in these data, which motivates us to propose a new brain decoding method based on partial least squares, called Regularized Higher-Order Partial Least Squares (RHOP). RHOP (pronounced "Rope") organizes the data into a tensor and reduces its dimensionality by factoring it into a sparse node factor that identifies important regions of the brain, a smooth time factor that identifies important time points, and a smooth frequency factor that identifies frequencies carrying information about the stimuli presented to the patient. Lastly, we develop statistical tests for clustering that help determine whether a clustering assignment is due to random sampling variation or to actual structure in the population. Clustering is widely applied, but there are currently few methods for inference on clustered data.
As a result, we develop new tests and statistics for inference after clustering with convex clustering. Our tests are based on the geometric interpretation of Hotelling's T² test and allow us to evaluate the quality of our clustering assignment.
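As an illustration of two ideas in the abstract, the following is a minimal Python sketch, assuming NumPy is available. The first function computes an exclusive lasso penalty under the assumption that it takes the form Σ_g ‖β_g‖₁², an ℓ1 norm within each group combined by a squared ℓ2 norm across groups, which is the form commonly used in the exclusive lasso literature; the thesis may scale or weight it differently. The second function is the classical two-sample Hotelling's T² statistic with a pooled covariance, shown only as background: it is not the post-clustering test developed in the thesis. The function names and the integer group-label convention are hypothetical.

```python
import numpy as np


def exclusive_lasso_penalty(beta, groups):
    """Exclusive lasso penalty, assuming the form sum_g ||beta_g||_1^2.

    Within each group the coefficients are combined by an l1 norm; the
    per-group l1 norms are then combined by a squared l2 norm across groups.
    `groups` holds one integer group label per coefficient (a hypothetical
    input convention chosen for this sketch).
    """
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    return float(sum(np.abs(beta[groups == g]).sum() ** 2
                     for g in np.unique(groups)))


def hotelling_t2(x1, x2):
    """Classical two-sample Hotelling's T^2 with a pooled covariance.

    x1 and x2 have shapes (n1, p) and (n2, p), one observation per row.
    This is the textbook statistic, not the post-clustering test itself.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    # Pooled covariance from the two unbiased (ddof=1) sample covariances.
    s_pooled = ((n1 - 1) * np.cov(x1, rowvar=False)
                + (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
    s_pooled = np.atleast_2d(s_pooled)  # np.cov returns a scalar when p == 1
    quad = diff @ np.linalg.solve(s_pooled, diff)
    return float(n1 * n2 / (n1 + n2) * quad)


if __name__ == "__main__":
    groups = [0, 0, 1, 1]
    # Both vectors have the same lasso (l1) penalty, 2.0, but the exclusive
    # lasso penalizes putting both nonzeros in the same group more heavily.
    print(exclusive_lasso_penalty([1, 1, 0, 0], groups))  # 4.0
    print(exclusive_lasso_penalty([1, 0, 1, 0], groups))  # 2.0

    x1 = [[0, 0], [2, 0], [0, 2], [2, 2]]
    x2 = [[1, 1], [3, 1], [1, 3], [3, 3]]  # x1 shifted by 1 in each coordinate
    print(round(hotelling_t2(x1, x2), 6))  # 3.0
```

The first pair of calls shows why this penalty favors spreading nonzero coefficients across groups: both vectors have ℓ1 norm 2, but concentrating the nonzeros in one group squares their sum, doubling the penalty.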
dc.format.mimetype: application/pdf
dc.identifier.citation: Campbell, Frederick. "Statistical Machine Learning Methodology and Inference for Structured Variable Selection." (2018) Diss., Rice University. https://hdl.handle.net/1911/105522.
dc.identifier.uri: https://hdl.handle.net/1911/105522
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: variable selection
dc.subject: exclusive lasso
dc.subject: statistical inference
dc.subject: clustering
dc.subject: electrocorticography
dc.subject: tensor decomposition
dc.title: Statistical Machine Learning Methodology and Inference for Structured Variable Selection
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Statistics
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files
Original bundle: CAMPBELL-DOCUMENT-2017.pdf (11.5 MB, Adobe Portable Document Format)