Kernel Sum Sketches for Large Scale Learning

Date
2023-01-03
Abstract

Kernel methods play a central role in machine learning and statistics, but algorithms for such methods scale poorly to large, high-dimensional datasets. Kernel sum computations are often the bottleneck, as they must aggregate all pairwise interactions between a query and each element of the dataset. Prior research has resulted in fast methods to approximate this sum with coresets, kernel approximations and adaptive sampling. However, existing methods still have prohibitively high memory and computation costs, especially for emerging applications in web-scale learning, genomics and streaming data. In my work, I have developed a compressed summary of the dataset, or sketch, that supports fast approximate sum queries for a special class of kernels. The sketch requires memory that is sub-linear in the data size and dimension, can be constructed in a single pass and comes with strong theoretical guarantees on the approximation error. In this thesis, I argue that kernel sum sketches are a new, useful tool for large-scale analysis and learning. I use the sketch to improve the resource-accuracy tradeoff by an order of magnitude for i) differentially private density estimation, linear regression and classification, ii) fast inverse propensity sampling and iii) memory-efficient near-neighbor search.
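
To make the kernel sum bottleneck concrete, the sketch below contrasts a brute-force kernel sum with a small count-based summary. It is an illustration under stated assumptions, not the construction from the thesis: this record does not say which "special class of kernels" is supported, so the example assumes the kernel is the collision probability of a locality-sensitive hash (here, concatenated sign random projections). The names `exact_kernel_sum`, `angular_kernel`, and `HashCountSketch` are hypothetical.

```python
import math
import random


def exact_kernel_sum(query, data, kernel):
    """Brute force: one kernel evaluation per dataset element."""
    return sum(kernel(query, x) for x in data)


def angular_kernel(x, y, bits):
    """Collision probability of `bits` concatenated sign random projections.

    ASSUMPTION: this kernel is chosen for illustration only.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    cos = max(-1.0, min(1.0, dot / norm))
    return (1.0 - math.acos(cos) / math.pi) ** bits


class HashCountSketch:
    """Hypothetical sketch: `rows` arrays of counters indexed by a hash code.

    Memory is rows * 2**bits counters, independent of the number of points
    added. Each point is seen once, so construction is a single pass.
    """

    def __init__(self, dim, rows, bits, seed=0):
        rng = random.Random(seed)
        # One independent set of `bits` random hyperplanes per row.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]
            for _ in range(rows)
        ]
        self.counts = [[0] * (2 ** bits) for _ in range(rows)]

    def _bucket(self, row, x):
        # Pack the sign of each projection into an integer bucket index.
        index = 0
        for plane in self.planes[row]:
            bit = sum(p * v for p, v in zip(plane, x)) >= 0
            index = (index << 1) | int(bit)
        return index

    def add(self, x):
        for row in range(len(self.counts)):
            self.counts[row][self._bucket(row, x)] += 1

    def query(self, q):
        # Each row's counter is an unbiased estimate of the sum of
        # collision probabilities; averaging rows reduces the variance.
        rows = len(self.counts)
        return sum(self.counts[r][self._bucket(r, q)] for r in range(rows)) / rows


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(500)]
    sketch = HashCountSketch(dim=8, rows=100, bits=3, seed=1)
    for point in data:
        sketch.add(point)
    q = [rng.gauss(0.0, 1.0) for _ in range(8)]
    print("exact :", exact_kernel_sum(q, data, lambda a, b: angular_kernel(a, b, 3)))
    print("sketch:", sketch.query(q))
```

In this toy version the query cost depends on the number of rows and hash bits rather than on the dataset size, which is the kind of tradeoff the abstract describes. The error guarantees and memory bounds claimed in the thesis are not reproduced here.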

Description
Degree
Doctor of Philosophy
Type
Thesis
Keywords
density estimation, sketching algorithms, data compression, streaming algorithms, randomized algorithms, machine learning, differential privacy, near-neighbor search
Citation

Coleman, Ben Ray. "Kernel Sum Sketches for Large Scale Learning." (2023) Diss., Rice University. https://hdl.handle.net/1911/114884.

Rights
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.