Xu, Zhaozhuo. "Randomized Algorithms for Mega-AI Models." Diss., Rice University, 2023. https://hdl.handle.net/1911/115237

Over the past few years, we have witnessed remarkable accomplishments in machine learning (ML) models due to increases in their size. However, growth in model size has outpaced upgrades to hardware and network bandwidth, making it difficult to train these Mega-AI models within current system infrastructures. Additionally, the shift toward training ML models on user devices, in light of global data privacy protection trends, has constrained hardware resources, exacerbating the tension between effectiveness and efficiency. Moreover, current ML algorithms and systems exhibit an accuracy-efficiency trade-off: reducing computation and memory usage results in accuracy losses during both training and inference.

This thesis aims to demonstrate algorithmic advances that improve this trade-off in training Mega-AI models. Rather than relying on big data, we propose a focus on good data and sparse models: models with many parameters, only a subset of which is activated during training for efficiency. We frame the pursuit of good data and activated parameters as an information retrieval problem, and develop hashing algorithms and data structures that maintain training accuracy while improving efficiency.

The thesis begins with work on data sparsity and presents a hash-based sampling algorithm for Mega-AI models that adaptively selects data samples during training. We also demonstrate how this approach improves the machine teaching algorithm, with 425.12x speedups and 99.76% energy savings on edge devices. We then discuss our recent success in model sparsity and present a provably efficient hashing algorithm that adaptively selects and updates a subset of parameters during training.
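Both the adaptive data sampling and the adaptive parameter selection described above treat "what to train on next" as a retrieval problem answered by locality-sensitive hashing. The abstract does not give the thesis's actual data structure, so the following is only a minimal sketch of the standard building block, random-hyperplane (SimHash) LSH; the class name `SimHashTable` and its parameters are illustrative, not the thesis's API.

```python
import numpy as np

class SimHashTable:
    """Minimal random-hyperplane (SimHash) LSH table.

    Vectors with high cosine similarity tend to share a bucket code,
    so a query retrieves promising candidates (samples or neurons)
    without scanning the full collection.
    """

    def __init__(self, dim, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))  # random hyperplanes
        self.buckets = {}

    def _code(self, v):
        # One sign bit per hyperplane -> an n_bits-bit bucket code.
        return tuple(bool(x) for x in (self.planes @ v > 0))

    def insert(self, key, v):
        self.buckets.setdefault(self._code(v), []).append(key)

    def query(self, v):
        # Keys hashed to the same bucket as v (possibly an empty list).
        return self.buckets.get(self._code(v), [])
```

In an adaptive-selection loop one would index sample embeddings (or rows of a weight matrix) and query with the current activation or gradient direction to pick the subset to process; using several tables with independent seeds is the usual way to raise recall.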
We also introduce methods to bridge the accuracy decline of sparse Mega-AI models in the post-training process. Finally, we present DRAGONN, a system that uses hashing algorithms to achieve near-optimal communication for sparse, distributed ML. To demonstrate the utility of these scalable and sustainable ML algorithms, we apply them to personalized education, seismic imaging, and bioinformatics. In particular, we show how modifying the ML algorithm can reduce seismic processing time from 10 months to 10 minutes.
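The abstract does not spell out DRAGONN's mechanism, so as a hedged illustration of how hashing can shrink communication for sparse gradients, here is a count-sketch-style compressor: each coordinate is hashed into one of a fixed number of buckets, so the message size is independent of model dimension. The function names and the count-sketch choice are our assumptions, not the thesis's actual design.

```python
import numpy as np

def sketch_gradient(grad, n_buckets, seed=0):
    """Hash each coordinate into a bucket and accumulate signed values,
    producing a fixed-size message regardless of gradient dimension."""
    rng = np.random.default_rng(seed)
    bucket = rng.integers(0, n_buckets, size=grad.size)  # index hash h(i)
    sign = rng.choice([-1.0, 1.0], size=grad.size)       # sign hash s(i)
    sketch = np.zeros(n_buckets)
    np.add.at(sketch, bucket, sign * grad)  # collisions cancel in expectation
    return sketch

def unsketch(sketch, dim, n_buckets, seed=0):
    """Approximate recovery: each coordinate reads back its bucket.
    Exact for a coordinate whose bucket holds no other nonzero entry."""
    rng = np.random.default_rng(seed)  # same seed regenerates both hashes
    bucket = rng.integers(0, n_buckets, size=dim)
    sign = rng.choice([-1.0, 1.0], size=dim)
    return sign * sketch[bucket]
```

Because real gradients in this setting are sparse, most buckets receive few nonzero entries, so the receiver's estimate is accurate for the heavy coordinates while only `n_buckets` floats cross the network.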