Resource-Efficient Machine Learning via Count-Sketches and Locality-Sensitive Hashing (LSH)

dc.contributor.advisor: Shrivastava, Anshumali
dc.creator: Spring, Ryan Daniel
dc.date.accessioned: 2020-04-27T19:24:37Z
dc.date.available: 2020-04-27T19:24:37Z
dc.date.created: 2020-05
dc.date.issued: 2020-04-24
dc.date.submitted: May 2020
dc.date.updated: 2020-04-27T19:24:38Z
dc.description.abstract: Machine learning problems are increasing in complexity, and models are growing correspondingly larger to handle these datasets (e.g., large-scale transformer networks for language modeling). The growth in the number of input features, model size, and output classification space is straining our limited computational resources. Given vast amounts of data and limited computational resources, how do we scale machine learning algorithms to gain meaningful insights? Randomized algorithms are an essential tool in our algorithmic toolbox for solving these challenges. They achieve significant improvements in computational cost or memory usage by incurring some approximation error. They work because most large-scale datasets follow a power-law distribution in which a small subset of the data contains most of the information, so we can avoid wasting computational resources by focusing only on the most relevant items. In this thesis, we explore how to use locality-sensitive hashing (LSH) and the count-sketch data structure to address the computational and memory challenges in four distinct areas. (1) The LSH Sampling algorithm uses the LSH data structure as an adaptive sampler; we demonstrate this approach by accurately estimating the partition function in large output spaces. (2) MISSION is a large-scale feature-extraction algorithm that uses the count-sketch data structure to store a compressed representation of the entire feature space. (3) The Count-Sketch Optimizer minimizes the memory footprint of popular first-order gradient optimizers (e.g., Adam, Adagrad, Momentum). (4) Finally, we show the usefulness of our compressed-memory optimizer by efficiently training a synthetic question generator, which uses large-scale transformer networks to generate high-quality, human-readable question-answer pairs. (An illustrative count-sketch example is given after the metadata record below.)
dc.format.mimetype: application/pdf
dc.identifier.citation: Spring, Ryan Daniel. "Resource-Efficient Machine Learning via Count-Sketches and Locality-Sensitive Hashing (LSH)." (2020) Diss., Rice University. https://hdl.handle.net/1911/108402.
dc.identifier.uri: https://hdl.handle.net/1911/108402
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Deep Learning
dc.subject: Machine Learning
dc.subject: Locality-Sensitive Hashing
dc.subject: Count-Sketch
dc.subject: Stochastic Optimization
dc.subject: Natural Language Processing
dc.subject: Question Answering
dc.subject: Question Generation
dc.subject: Meta-genomics
dc.subject: Feature Selection
dc.subject: Mutual Information
dc.subject: Importance Sampling
dc.subject: Partition
dc.title: Resource-Efficient Machine Learning via Count-Sketches and Locality-Sensitive Hashing (LSH)
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
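
Illustrative example: the count-sketch data structure named in the abstract is the compressed table behind both MISSION and the Count-Sketch Optimizer. The following is a minimal Python sketch under assumed names and parameters (CountSketch, depth, width, an MD5-based hash), not the implementation described in the thesis; it only shows the general idea of storing signed counts in a small fixed-size table and recovering approximate values by a median of signed estimates.

# Minimal, illustrative count-sketch (not the thesis code): a depth x width
# array of counters with per-row bucket and sign hashes. Each update adds a
# signed value to one counter per row; a query returns the median of the
# signed counter readings, giving a low-memory approximate count.
import hashlib
import statistics

class CountSketch:
    def __init__(self, depth=5, width=2**16):
        self.depth = depth          # number of independent hash rows
        self.width = width          # counters per row
        self.table = [[0.0] * width for _ in range(depth)]

    def _hash(self, key, row):
        h = hashlib.md5(f"{row}:{key}".encode()).hexdigest()
        bucket = int(h[:8], 16) % self.width        # which counter in this row
        sign = 1 if int(h[8], 16) % 2 == 0 else -1  # +/-1 sign hash
        return bucket, sign

    def update(self, key, value=1.0):
        # e.g., accumulate a feature count or a gradient component
        for row in range(self.depth):
            bucket, sign = self._hash(key, row)
            self.table[row][bucket] += sign * value

    def query(self, key):
        # median of the signed estimates across rows
        estimates = []
        for row in range(self.depth):
            bucket, sign = self._hash(key, row)
            estimates.append(sign * self.table[row][bucket])
        return statistics.median(estimates)

# Usage: track counts for a feature space far larger than the table itself.
cs = CountSketch()
for token in ["ACGT", "TTGA", "ACGT", "GGCC", "ACGT"]:
    cs.update(token)
print(round(cs.query("ACGT")))   # approximately 3

The memory cost is fixed at depth * width counters regardless of how many distinct keys are inserted, which is why the same structure can compress either a massive feature space or an optimizer's auxiliary state.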
Files

Original bundle
Name: SPRING-DOCUMENT-2020.pdf
Size: 5.7 MB
Format: Adobe Portable Document Format

License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text

Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text