Skewers, the Carnegie Classification, and the Hybrid Bootstrap

dc.contributor.advisor | Scott, David W | en_US
dc.creator | Kosar, Robert | en_US
dc.date.accessioned | 2019-05-16T20:53:17Z | en_US
dc.date.available | 2019-05-16T20:53:17Z | en_US
dc.date.created | 2017-12 | en_US
dc.date.issued | 2017-11-30 | en_US
dc.date.submitted | December 2017 | en_US
dc.date.updated | 2019-05-16T20:53:17Z | en_US
dc.description.abstract | Principal component analysis is an important statistical technique for dimension reduction and exploratory data analysis. However, it is not robust to outliers and may obfuscate important data structure such as clustering. We propose a version of principal component analysis based on the robust L2E method. The technique seeks to find the principal components of potentially highly non-spherical distribution components of a Gaussian mixture model. The algorithm requires neither specification of the number of clusters nor estimation of a full covariance matrix in order to run. The Carnegie classification is a decades-old (updated approximately every five years) taxonomy for research universities. However, it is based on questionable statistical methodology and suffers from a number of issues. We present a criticism of the Carnegie methodology, and offer two alternatives that are designed to be consistent with Carnegie's goals but also more statistically sound. We also present a visualization application where users can explore both the Carnegie system and our proposed systems. Preventing overfitting is an important topic in the field of machine learning, where it is common or even mundane to fit models with millions of parameters. One of the most popular algorithms for preventing overfitting is dropout. We present a drop-in replacement for dropout that offers superior performance on standard benchmark datasets and is relatively insensitive to hyperparameter choice. | en_US
dc.format.mimetype | application/pdf | en_US
dc.identifier.citation | Kosar, Robert. "Skewers, the Carnegie Classification, and the Hybrid Bootstrap." (2017) Diss., Rice University. https://hdl.handle.net/1911/105553. | en_US
dc.identifier.uri | https://hdl.handle.net/1911/105553 | en_US
dc.language.iso | eng | en_US
dc.rights | Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. | en_US
dc.subject | Clustering | en_US
dc.subject | Principal Component Analysis | en_US
dc.subject | University Rankings | en_US
dc.subject | Regularization | en_US
dc.title | Skewers, the Carnegie Classification, and the Hybrid Bootstrap | en_US
dc.type | Thesis | en_US
dc.type.material | Text | en_US
thesis.degree.department | Statistics | en_US
thesis.degree.discipline | Engineering | en_US
thesis.degree.grantor | Rice University | en_US
thesis.degree.level | Doctoral | en_US
thesis.degree.name | Doctor of Philosophy | en_US
Files
Original bundle
Name: KOSAR-DOCUMENT-2017.pdf
Size: 3.37 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.61 KB
Format: Plain Text