Parametric classification and variable selection by the minimum integrated squared error criterion

dc.contributor.advisorScott, David W.en_US
dc.creatorChi, Eric C.en_US
dc.date.accessioned2013-03-08T00:33:06Zen_US
dc.date.available2013-03-08T00:33:06Zen_US
dc.date.issued2012en_US
dc.description.abstractThis thesis presents a robust solution to the classification and variable selection problem when the dimension of the data, or number of predictor variables, may greatly exceed the number of observations. When classifying objects given many measured attributes, the goal is to build a model that makes the most accurate predictions using only the most meaningful subset of the available measurements. The introduction of ℓ1-regularized model fitting has inspired many approaches that simultaneously perform model fitting and variable selection. If parametric models are employed, the standard approach is some form of regularized maximum likelihood estimation. While this is an asymptotically efficient procedure under very general conditions, it is not robust. Outliers can negatively impact both estimation and variable selection. Moreover, outliers can be very difficult to identify as the number of predictor variables becomes large. Minimizing the integrated squared error, or L2 error, while less efficient, has been shown to generate parametric estimators that are robust to a fair amount of contamination in several contexts. In this thesis, we present a novel robust parametric regression model for the binary classification problem based on L2 distance, the logistic L2 estimator (L2E). To perform simultaneous model fitting and variable selection among correlated predictors in the high-dimensional setting, an elastic net penalty is introduced. A fast computational algorithm for minimizing the elastic net penalized logistic L2E loss is derived, and results on the algorithm's global convergence properties are given. Through simulations we demonstrate the utility of the penalized logistic L2E at robustly recovering sparse models from high-dimensional data in the presence of outliers and inliers. Results on real genomic data are also presented.en_US
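For context on the criterion named in the abstract, the following is a minimal sketch of the generic minimum integrated squared error (L2E) objective and the elastic net penalty in their standard forms; the notation (f, theta, beta, lambda, alpha) is illustrative, and the exact logistic L2E loss derived in the thesis may be parameterized differently.

\hat{\theta}_{\mathrm{L2E}} = \arg\min_{\theta} \left[ \int f(x \mid \theta)^{2}\, dx \;-\; \frac{2}{n}\sum_{i=1}^{n} f(x_i \mid \theta) \right],
\qquad
P_{\lambda,\alpha}(\beta) = \lambda \left( \alpha \|\beta\|_{1} + \frac{1-\alpha}{2}\, \|\beta\|_{2}^{2} \right).

The first expression replaces the log-likelihood with the L2 distance between the fitted parametric density and the data, which is the source of the robustness to contamination described in the abstract; the second is the elastic net penalty, which induces sparsity while accommodating correlated predictors.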
dc.format.extent98 p.en_US
dc.format.mimetypeapplication/pdfen_US
dc.identifier.callnoTHESIS STAT. 2012 CHIen_US
dc.identifier.citationChi, Eric C. "Parametric classification and variable selection by the minimum integrated squared error criterion." (2012) Diss., Rice University. <a href="https://hdl.handle.net/1911/70219">https://hdl.handle.net/1911/70219</a>.en_US
dc.identifier.digitalChiEen_US
dc.identifier.urihttps://hdl.handle.net/1911/70219en_US
dc.language.isoengen_US
dc.rightsCopyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.en_US
dc.subjectPure sciencesen_US
dc.subjectParametric classificationen_US
dc.subjectVariable selectionen_US
dc.subjectError criterionen_US
dc.subjectLogistic regressionen_US
dc.subjectMinimum distance estimationen_US
dc.subjectMajorization-minimizationen_US
dc.subjectStatisticsen_US
dc.titleParametric classification and variable selection by the minimum integrated squared error criterionen_US
dc.typeThesisen_US
dc.type.materialTexten_US
thesis.degree.departmentStatisticsen_US
thesis.degree.disciplineEngineeringen_US
thesis.degree.grantorRice Universityen_US
thesis.degree.levelDoctoralen_US
thesis.degree.nameDoctor of Philosophyen_US
Files
Original bundle
Name: ChiE.pdf
Size: 5.9 MB
Format: Adobe Portable Document Format