Some Rare LSH Gems for Large-scale Machine Learning

dc.contributor.advisor: Shrivastava, Anshumali (en_US)
dc.creator: Luo, Chen (en_US)
dc.date.accessioned: 2020-01-21T19:13:40Z (en_US)
dc.date.available: 2020-01-21T19:13:40Z (en_US)
dc.date.created: 2019-08 (en_US)
dc.date.issued: 2020-01-17 (en_US)
dc.date.submitted: August 2019 (en_US)
dc.date.updated: 2020-01-21T19:13:40Z (en_US)
dc.description.abstract: Locality Sensitive Hashing (LSH) is an algorithm for approximate nearest neighbor (ANN) search in high-dimensional space. In this thesis, instead of using LSH as an ANN tool, we investigate the possibility of using LSH to address the computational and memory challenges in large-scale machine learning tasks. We show some rare 'gems' of locality-sensitive hashing that can shed important light on large-scale learning systems. We first show the power of LSH for high-speed anomaly detection. Anomaly detection is one of the frequent and important subroutines deployed in large-scale data processing applications. Although it is a well-studied topic, existing techniques for unsupervised anomaly detection require storing significant amounts of data, which is prohibitive from memory, latency, and privacy perspectives, especially for small mobile devices, which have ultra-low memory budgets and limited computational power. We introduce the ACE (Arrays of (locality-sensitive) Count Estimators) algorithm, which can be much faster than most state-of-the-art unsupervised anomaly detection algorithms with a very low memory requirement. Second, we present a novel sampler view of LSH and propose using LSH to scale up split-merge MCMC inference. Split-merge MCMC (Markov Chain Monte Carlo) is one of the essential and popular variants of MCMC for problems in which an MCMC state consists of an unknown number of components. It is well known that state-of-the-art methods for split-merge MCMC do not scale well. Here, we leverage some unique properties of weighted MinHash, a popular LSH scheme, to design a novel class of split-merge proposals that are significantly more informative than random sampling yet efficient to compute. Finally, we show a practical use of LSH for indoor navigation tasks. In this work, we developed the first camera-based (privacy-preserving) indoor mobile positioning system, CaPSuLe, which does not involve any communication (or data transfer) with any other device or the cloud. The system needs only 78.9 MB of memory and can localize a mobile device with 92.11% accuracy at very high speed. The ability to run the complete system on the mobile device eliminates the need for the cloud, making CaPSuLe a privacy-preserving localization algorithm by design, as it does not require any communication. (en_US)
dc.format.mimetype: application/pdf (en_US)
dc.identifier.citation: Luo, Chen. "Some Rare LSH Gems for Large-scale Machine Learning." (2020) Diss., Rice University. https://hdl.handle.net/1911/107980. (en_US)
dc.identifier.uri: https://hdl.handle.net/1911/107980 (en_US)
dc.language.iso: eng (en_US)
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. (en_US)
dc.subject: Locality Sensitive Hashing (en_US)
dc.subject: Large Scale Machine Learning (en_US)
dc.title: Some Rare LSH Gems for Large-scale Machine Learning (en_US)
dc.type: Thesis (en_US)
dc.type.material: Text (en_US)
thesis.degree.department: Computer Science (en_US)
thesis.degree.discipline: Engineering (en_US)
thesis.degree.grantor: Rice University (en_US)
thesis.degree.level: Doctoral (en_US)
thesis.degree.name: Doctor of Philosophy (en_US)
Files
Original bundle
Name: LUO-DOCUMENT-2019.pdf
Size: 1.89 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text