Kernel Sum Sketches for Large Scale Learning

dc.contributor.advisor: Shrivastava, Anshumali
dc.creator: Coleman, Ben Ray
dc.date.accessioned: 2023-05-24T20:42:19Z
dc.date.available: 2023-05-24T20:42:19Z
dc.date.created: 2022-08
dc.date.issued: 2023-01-03
dc.date.submitted: August 2022
dc.date.updated: 2023-05-24T20:42:20Z
dc.description.abstract: Kernel methods play a central role in machine learning and statistics, but algorithms for such methods scale poorly to large, high-dimensional datasets. Kernel sum computations are often the bottleneck, as they must aggregate all pairwise interactions between a query and each element of the dataset. Prior research has resulted in fast methods to approximate this sum with coresets, kernel approximations and adaptive sampling. However, existing methods still have prohibitively high memory and computation costs, especially for emerging applications in web-scale learning, genomics and streaming data. In my work, I have developed a compressed summary of the dataset, or sketch, that supports fast approximate sum queries for a special class of kernels. The sketch requires memory that is sub-linear in the data size and dimension, can be constructed in a single pass and comes with strong theoretical guarantees on the approximation error. In this thesis, I argue that kernel sum sketches are a new, useful tool for large-scale analysis and learning. I use the sketch to improve the resource-accuracy tradeoff by an order of magnitude for i) differentially private density estimation, linear regression and classification, ii) fast inverse propensity sampling and iii) memory-efficient near-neighbor search.
dc.format.mimetype: application/pdf
dc.identifier.citation: Coleman, Ben Ray. "Kernel Sum Sketches for Large Scale Learning." (2023) Diss., Rice University. https://hdl.handle.net/1911/114884.
dc.identifier.uri: https://hdl.handle.net/1911/114884
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: density estimation
dc.subject: sketching algorithms
dc.subject: data compression
dc.subject: streaming algorithms
dc.subject: randomized algorithms
dc.subject: machine learning
dc.subject: differential privacy
dc.subject: near-neighbor search
dc.title: Kernel Sum Sketches for Large Scale Learning
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Electrical and Computer Engineering
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
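The abstract does not spell out how the sketch is built. As a rough illustration only, assume the "special class of kernels" consists of kernels whose value k(x, q) equals the collision probability of a locality-sensitive hash (this is an assumption, not something the record states). A kernel sum sketch can then be built from R rows of hash-indexed counters: each inserted point increments one counter per row in a single pass, and a query averages the counters it hashes to, which is an unbiased estimate of the sum of k(x_i, q) over the data. The class `LSHKernelSumSketch`, the helper `srp_kernel`, and the use of signed random projections are hypothetical choices for this sketch, not the thesis's code.

```python
import math
import random


class LSHKernelSumSketch:
    """Hypothetical LSH-counter kernel sum sketch (illustration only).

    Assumption: the hash is `bits` concatenated signed random projections,
    so the implied kernel is (1 - angle(x, q) / pi) ** bits.
    """

    def __init__(self, dim, rows, bits, seed=0):
        rng = random.Random(seed)
        # One independent set of `bits` Gaussian hyperplanes per row.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]
            for _ in range(rows)
        ]
        # State is rows * 2**bits counters plus the hyperplanes; no point is kept.
        self.counts = [[0] * (2 ** bits) for _ in range(rows)]

    def _bucket(self, row, x):
        # Concatenate the sign bits of <w, x> for each hyperplane w in this row.
        code = 0
        for w in self.planes[row]:
            dot = sum(wi * xi for wi, xi in zip(w, x))
            code = (code << 1) | int(dot >= 0.0)
        return code

    def add(self, x):
        # Single pass: each point touches exactly one counter per row.
        for r in range(len(self.counts)):
            self.counts[r][self._bucket(r, x)] += 1

    def query(self, q):
        # Each row's hit counter has expectation sum_i k(x_i, q); average the rows.
        rows = len(self.counts)
        return sum(self.counts[r][self._bucket(r, q)] for r in range(rows)) / rows


def srp_kernel(x, q, bits):
    """Exact collision probability of `bits` signed random projections."""
    dot = sum(a * b for a, b in zip(x, q))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in q))
    cos = max(-1.0, min(1.0, dot / norms))
    return (1.0 - math.acos(cos) / math.pi) ** bits


# Usage: compare the sketch's estimate with the exact kernel sum.
rng = random.Random(1)
dim, bits = 8, 2
data = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(100)]
query = [rng.gauss(0.0, 1.0) for _ in range(dim)]

sketch = LSHKernelSumSketch(dim, rows=2000, bits=bits, seed=7)
for point in data:
    sketch.add(point)

exact = sum(srp_kernel(point, query, bits) for point in data)
estimate = sketch.query(query)
print(f"exact={exact:.2f} estimate={estimate:.2f}")
```

In this toy version the counters do not grow with the number of inserted points (only their values do), which is the property the abstract highlights; the stored hyperplanes still scale with the dimension, so this sketch does not by itself reproduce the sub-linear-in-dimension claim.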
Files
Original bundle
Name: COLEMAN-DOCUMENT-2022.pdf
Size: 9.85 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text