Resource-Efficient Machine Learning via Count-Sketches and Locality-Sensitive Hashing (LSH)
Abstract
Machine learning problems are increasing in complexity, and models are growing correspondingly larger to handle them (e.g., large-scale transformer networks for language modeling). The increase in the number of input features, model size, and output classification space is straining our limited computational resources.
Given vast amounts of data and limited computational resources, how do we scale machine learning algorithms to gain meaningful insights? Randomized algorithms are an essential tool in our algorithmic toolbox for solving these challenges. These algorithms achieve significant reductions in computational cost or memory usage in exchange for some approximation error. They work because most large-scale datasets follow a power-law distribution, where a small subset of the data contains most of the information. Therefore, we can avoid wasting computational resources by focusing only on the most relevant items.
In this thesis, we explore how to use locality-sensitive hashing (LSH) and the count-sketch data structure to address the computational and memory challenges in four distinct areas. (1) The LSH Sampling algorithm uses the LSH data structure as an adaptive sampler. We demonstrate this LSH Sampling approach by accurately estimating the partition function in large-output spaces. (2) MISSION is a large-scale feature extraction algorithm that uses the count-sketch data structure to store a compressed representation of the entire feature space. (3) The Count-Sketch Optimizer is an algorithm for minimizing the memory footprint of popular first-order gradient optimizers (e.g., Adam, Adagrad, Momentum). (4) Finally, we show the usefulness of our compressed memory optimizer by efficiently training a synthetic question generator, which uses large-scale transformer networks to generate high-quality, human-readable question-answer pairs.
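For readers unfamiliar with the count-sketch data structure mentioned above, the following is a minimal textbook-style sketch in Python. It is not taken from the thesis and is not the implementation behind MISSION or the Count-Sketch Optimizer. The class name `CountSketch`, the parameters `depth` and `width`, and the use of salted BLAKE2b as the hash family are all assumptions chosen for illustration.

```python
import hashlib
from statistics import median


class CountSketch:
    """Illustrative count-sketch: a fixed-size table that approximates
    per-key sums. Names and hash choice are assumptions, not the thesis code."""

    def __init__(self, depth=5, width=256):
        self.depth = depth  # number of independent hash rows
        self.width = width  # number of buckets per row
        self.table = [[0.0] * width for _ in range(depth)]

    def _bucket_and_sign(self, key, row):
        # Derive one bucket index and one +/-1 sign per (key, row) pair.
        # The row index is used as the salt so each row hashes differently.
        digest = hashlib.blake2b(
            key.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        value = int.from_bytes(digest, "little")
        sign = 1.0 if value & 1 else -1.0
        bucket = (value >> 1) % self.width
        return bucket, sign

    def update(self, key, delta=1.0):
        # Add the signed update to one bucket in every row.
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(key, row)
            self.table[row][bucket] += sign * delta

    def query(self, key):
        # Undo the sign in each row and take the median across rows,
        # which limits the influence of colliding keys.
        estimates = []
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(key, row)
            estimates.append(sign * self.table[row][bucket])
        return median(estimates)


sketch = CountSketch(depth=5, width=256)
sketch.update("feature_42", 3.0)
estimate = sketch.query("feature_42")
```

The table holds `depth * width` numbers no matter how many distinct keys are inserted, which is the source of the memory savings. With a single key in the sketch there are no collisions, so the estimate above is exact; with many keys, each query returns an approximation whose error depends on `width`, `depth`, and how skewed the data is.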
Citation
Spring, Ryan Daniel. "Resource-Efficient Machine Learning via Count-Sketches and Locality-Sensitive Hashing (LSH)." (2020) Diss., Rice University. https://hdl.handle.net/1911/108402.