Resource-Efficient Machine Learning via Count-Sketches and Locality-Sensitive Hashing (LSH)

dc.contributor.advisor: Shrivastava, Anshumali
dc.creator: Spring, Ryan Daniel
dc.date.accessioned: 2020-04-27T19:24:37Z
dc.date.available: 2020-04-27T19:24:37Z
dc.date.created: 2020-05
dc.date.issued: 2020-04-24
dc.date.submitted: May 2020
dc.date.updated: 2020-04-27T19:24:38Z
dc.description.abstract: Machine learning problems are increasing in complexity, so models are growing correspondingly larger to handle these datasets (e.g., large-scale transformer networks for language modeling). The increase in the number of input features, model size, and output classification space is straining our limited computational resources. Given vast amounts of data and limited computational resources, how do we scale machine learning algorithms to gain meaningful insights? Randomized algorithms are an essential tool in our algorithmic toolbox for solving these challenges. These algorithms achieve significant improvements in computational cost or memory usage by incurring some approximation error. They work because most large-scale datasets follow a power-law distribution in which a small subset of the data contains most of the information. Therefore, we can avoid wasting computational resources by focusing only on the most relevant items. In this thesis, we explore how to use locality-sensitive hashing (LSH) and the count-sketch data structure to address the computational and memory challenges in four distinct areas: (1) The LSH Sampling algorithm uses the LSH data structure as an adaptive sampler; we demonstrate this approach by accurately estimating the partition function in large output spaces. (2) MISSION is a large-scale feature-extraction algorithm that uses the count-sketch data structure to store a compressed representation of the entire feature space. (3) The Count-Sketch Optimizer minimizes the memory footprint of popular first-order gradient optimizers (e.g., Adam, Adagrad, Momentum). (4) Finally, we show the usefulness of our compressed-memory optimizer by efficiently training a synthetic question generator, which uses large-scale transformer networks to generate high-quality, human-readable question-answer pairs.
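The count-sketch data structure mentioned in the abstract can be illustrated with a minimal sketch. This is a generic textbook-style implementation, not the thesis's code; the class and parameter names (`CountSketch`, `d`, `w`) are hypothetical, and the universal hash construction is one of several standard choices. Each key is mapped, by each of `d` hash rows, to one of `w` buckets with a random ±1 sign; the median of the signed bucket reads is an unbiased estimate of the key's accumulated value.

```python
import numpy as np

class CountSketch:
    """Minimal count-sketch: d independent hash rows of w buckets each.
    Illustrative only; parameter names and hash construction are assumptions."""

    def __init__(self, d=5, w=1 << 10, seed=0):
        rng = np.random.default_rng(seed)
        self.d, self.w = d, w
        self.table = np.zeros((d, w))
        self.p = 2**31 - 1  # Mersenne prime for universal hashing
        # One (a, b) pair per row for (a*key + b) mod p hashing.
        self.a = rng.integers(1, self.p, size=d)
        self.b = rng.integers(0, self.p, size=d)

    def _bucket(self, key):
        # Bucket index per row, vectorized over the d rows.
        return (self.a * key + self.b) % self.p % self.w

    def _sign(self, key):
        # Crude ±1 sign hash from the parity of a second universal hash.
        return 1 - 2 * (((self.a * key + 3 * self.b) % self.p) & 1)

    def update(self, key, value):
        # Add the signed value into one bucket per row (sketches are linear).
        self.table[np.arange(self.d), self._bucket(key)] += self._sign(key) * value

    def query(self, key):
        # Median of the signed bucket reads estimates the key's total value.
        return np.median(self._sign(key) * self.table[np.arange(self.d), self._bucket(key)])
```

With no colliding keys the estimate is exact; under collisions the median over rows keeps the error bounded, which is what lets a fixed-size table summarize a much larger feature or parameter space.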
dc.format.mimetype: application/pdf
dc.identifier.citation: Spring, Ryan Daniel. "Resource-Efficient Machine Learning via Count-Sketches and Locality-Sensitive Hashing (LSH)." (2020) Diss., Rice University. https://hdl.handle.net/1911/108402.
dc.identifier.uri: https://hdl.handle.net/1911/108402
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Deep Learning
dc.subject: Machine Learning
dc.subject: Locality-Sensitive Hashing
dc.subject: Count-Sketch
dc.subject: Stochastic Optimization
dc.subject: Natural Language Processing
dc.subject: Question Answering
dc.subject: Question Generation
dc.subject: Meta-genomics
dc.subject: Feature Selection
dc.subject: Mutual Information
dc.subject: Importance Sampling
dc.subject: Partition
dc.title: Resource-Efficient Machine Learning via Count-Sketches and Locality-Sensitive Hashing (LSH)
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files

Original bundle
- SPRING-DOCUMENT-2020.pdf (5.7 MB, Adobe Portable Document Format)

License bundle
- PROQUEST_LICENSE.txt (5.84 KB, Plain Text)
- LICENSE.txt (2.6 KB, Plain Text)