Ridge Regularization by Randomization in Linear Ensembles
Abstract
Ensemble methods that average over a collection of independent predictors, each fit to a random sample of both the examples and the features of the training data, hold a prominent place in machine learning; the random forest is the most popular example. Combining many such randomized predictors into an ensemble produces a highly robust predictor with excellent generalization properties; however, the specific effect of randomization on the behavior of ensemble methods has received little theoretical attention.
We study the case of ensembles of linear predictors, where each individual predictor is a linear predictor fit on a randomized sample of the data matrix. We first show by a straightforward argument that an ensemble of ordinary least squares predictors, each fit on a simple subsample of the data, can achieve the optimal ridge regression risk in a standard Gaussian data setting. We then significantly generalize this result, eliminating essentially all assumptions on the data by considering ensembles of linear random projections, or sketches, of the data; in doing so we reveal an asymptotic first-order equivalence between linear regression on sketched data and ridge regression. By extending this analysis to a second-order characterization, we show how large ensembles converge to ridge regression under quadratic metrics.
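The subsampling result described above can be illustrated with a minimal numerical sketch. The snippet below (an illustration of the general idea, not the dissertation's construction; the sample sizes, ensemble size, and ridge penalty are ad hoc choices) averages many ordinary least squares fits, each on a random subsample of the rows and columns of a Gaussian data matrix, and compares the averaged coefficients against an explicit ridge regression fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard Gaussian data setting: n examples, p features, linear signal plus noise.
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = X @ beta + rng.standard_normal(n)

def ols(A, b):
    # Least squares via the pseudoinverse (handles rank deficiency).
    return np.linalg.pinv(A) @ b

# Ensemble of OLS predictors, each fit on a random subsample of both
# examples and features. Subsample sizes are illustrative assumptions.
K = 500               # number of ensemble members
n_sub, p_sub = 120, 25
beta_ens = np.zeros(p)
for _ in range(K):
    rows = rng.choice(n, size=n_sub, replace=False)
    cols = rng.choice(p, size=p_sub, replace=False)
    b = np.zeros(p)
    b[cols] = ols(X[np.ix_(rows, cols)], y[rows])
    beta_ens += b / K   # average the (zero-padded) coefficient vectors

# Explicit ridge regression for comparison; the penalty here is an
# arbitrary illustrative value, not the theoretically matched one.
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

In this setting the ensemble-averaged coefficients track the ridge solution closely, which is the implicit-regularization effect the abstract refers to: averaging over random subsamples acts like an explicit ridge penalty.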
Citation
LeJeune, Daniel. "Ridge Regularization by Randomization in Linear Ensembles." (2022) Diss., Rice University. https://hdl.handle.net/1911/114188.