Distributed Machine Learning Scale Out with Algorithms and Systems

dc.contributor.advisor: Jermaine, Chris
dc.creator: Yuan, Binhang
dc.date.accessioned: 2020-12-08T15:26:01Z
dc.date.available: 2020-12-08T15:26:01Z
dc.date.created: 2020-12
dc.date.issued: 2020-12-04
dc.date.submitted: December 2020
dc.date.updated: 2020-12-08T15:26:02Z
dc.description.abstract: Machine learning (ML) is ubiquitous and has powered the recent success of artificial intelligence. However, the state of affairs with respect to distributed ML is far from ideal. TensorFlow and PyTorch simply crash when an operation's inputs and outputs cannot fit on a GPU for model parallelism, or when a model cannot fit on a single machine for data parallelism. TensorFlow code that works reasonably well on a single machine with eight GPUs procured from a cloud provider often runs slower on two machines totaling sixteen GPUs. In this thesis, I propose solutions at both the algorithm and the system level in order to scale out distributed ML. At the algorithm level, I propose a new method for distributed neural network learning, called independent subnet training (IST). In IST, at each iteration, a neural network is decomposed into a set of subnetworks of the same depth as the original network, each of which is trained locally before the various subnets are exchanged and the process is repeated. IST has many advantages, including reduced communication volume and frequency, an implicit extension to model parallelism, and lower memory requirements at each compute site. At the system level, I believe that proper computational and implementation abstractions will allow for the construction of self-configuring, declarative ML systems, especially when the goal is to execute tensor operations for ML in a distributed environment or partitioned across multiple AI accelerators (ASICs). To this end, I first introduce a tensor relational algebra (TRA), which is expressive enough to encode any tensor operation that can be written in Einstein notation, and then consider how TRA expressions can be rewritten into an implementation algebra (IA) that enables effective implementation in a distributed environment, as well as how expressions in the IA can be optimized. An empirical study shows that the optimized implementations provided by the IA can match or even outperform carefully engineered HPC and ML systems for large-scale tensor manipulations and ML workflows in distributed clusters.
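The abstract describes IST concretely enough to sketch. Below is a minimal illustration, not the thesis's actual implementation: it decomposes a two-layer network into subnets by partitioning the hidden neurons (so every subnet keeps the original depth), runs a local SGD step per subnet, and then writes the updated pieces back before repartitioning. The function and variable names (make_subnets, local_sgd_step, and so on) are assumptions made for illustration, and all subnets train on the same batch here purely to keep the example self-contained; in a real deployment each compute site would use its own local data.

    # Minimal IST sketch for a two-layer MLP (illustrative only; names and the
    # shared training batch are simplifying assumptions, not the thesis's code).
    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_hidden, d_out, n_workers = 8, 16, 4, 2

    # Full-network parameters: x -> relu(x @ W1) @ W2
    W1 = rng.normal(scale=0.1, size=(d_in, d_hidden))
    W2 = rng.normal(scale=0.1, size=(d_hidden, d_out))

    def make_subnets(W1, W2, n_workers, rng):
        # Partition the hidden neurons; every subnet keeps the full depth.
        parts = np.array_split(rng.permutation(d_hidden), n_workers)
        return [(p, W1[:, p].copy(), W2[p, :].copy()) for p in parts]

    def local_sgd_step(W1_s, W2_s, X, Y, lr=0.1):
        # One local SGD step on a subnet under mean squared error.
        H = np.maximum(X @ W1_s, 0.0)            # hidden activations
        G = 2.0 * (H @ W2_s - Y) / len(X)        # output-layer error
        gW2 = H.T @ G
        gW1 = X.T @ ((G @ W2_s.T) * (H > 0.0))   # backprop through the ReLU
        return W1_s - lr * gW1, W2_s - lr * gW2

    X = rng.normal(size=(32, d_in))
    Y = rng.normal(size=(32, d_out))

    for _ in range(5):                            # a few IST rounds
        for idx, W1_s, W2_s in make_subnets(W1, W2, n_workers, rng):
            W1_s, W2_s = local_sgd_step(W1_s, W2_s, X, Y)   # train locally
            W1[:, idx], W2[idx, :] = W1_s, W2_s             # exchange/reassemble

The TRA's expressiveness claim refers to operations writable in Einstein notation; NumPy's einsum (used here only as a familiar stand-in for that notation, not as the TRA itself) shows the class of operations in question:

    import numpy as np
    A, B = np.ones((4, 5)), np.ones((5, 6))
    C = np.einsum('ij,jk->ik', A, B)   # matrix multiply in Einstein notation
    assert C.shape == (4, 6)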
dc.format.mimetype: application/pdf
dc.identifier.citation: Yuan, Binhang. "Distributed Machine Learning Scale Out with Algorithms and Systems." (2020) Diss., Rice University. https://hdl.handle.net/1911/109631.
dc.identifier.uri: https://hdl.handle.net/1911/109631
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Distributed Machine Learning
dc.subject: Distributed Database System
dc.title: Distributed Machine Learning Scale Out with Algorithms and Systems
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files
Original bundle (1 file)
  Name: YUAN-DOCUMENT-2020.pdf
  Size: 2.45 MB
  Format: Adobe Portable Document Format
License bundle (2 files)
  Name: PROQUEST_LICENSE.txt
  Size: 5.84 KB
  Format: Plain Text
  Name: LICENSE.txt
  Size: 2.61 KB
  Format: Plain Text