Distributed Machine Learning Scale Out with Algorithms and Systems

dc.contributor.advisor: Jermaine, Chris
dc.creator: Yuan, Binhang
dc.date.accessioned: 2020-12-08T15:26:01Z
dc.date.available: 2020-12-08T15:26:01Z
dc.date.created: 2020-12
dc.date.issued: 2020-12-04
dc.date.submitted: December 2020
dc.date.updated: 2020-12-08T15:26:02Z
dc.description.abstract: Machine learning (ML) is ubiquitous and has powered the recent success of artificial intelligence. However, the state of affairs with respect to distributed ML is far from ideal. TensorFlow and PyTorch simply crash when an operation’s inputs and outputs cannot fit on a GPU for model parallelism, or when a model cannot fit on a single machine for data parallelism. TensorFlow code that works reasonably well on a single machine with eight GPUs procured from a cloud provider often runs slower on two machines totaling sixteen GPUs. In this thesis, I propose solutions at both the algorithm and system levels to scale out distributed ML. At the algorithm level, I propose a new method for distributed neural network training, called independent subnet training (IST). In each IST iteration, the neural network is decomposed into a set of subnetworks of the same depth as the original network; each subnetwork is trained locally before the subnets are exchanged and the process is repeated. IST has many advantages, including reduced communication volume and frequency, an implicit extension to model parallelism, and a lower memory requirement at each compute site. At the system level, I believe that proper computational and implementation abstractions will allow for the construction of self-configuring, declarative ML systems, especially when the goal is to execute tensor operations for ML in a distributed environment or to partition them across multiple AI accelerators (ASICs). To this end, I first introduce a tensor relational algebra (TRA), which is expressive enough to encode any tensor operation that can be written in Einstein notation, and then consider how TRA expressions can be rewritten into an implementation algebra (IA) that enables effective implementation in a distributed environment, as well as how expressions in the IA can be optimized. An empirical study shows that the optimized implementation provided by the IA can match or even outperform carefully engineered HPC or ML systems for large-scale tensor manipulations and ML workflows on distributed clusters.
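To make the IST decomposition concrete, the following is a minimal, single-process NumPy simulation of the idea described in the abstract: in every round, the hidden neurons of a two-layer MLP are randomly partitioned among simulated workers, each worker runs a few local SGD steps on its own thinner subnet of the same depth, and the trained slices are written back into the full model before the next partition is drawn. This is an illustrative sketch only; the model sizes, worker count, local-step count, and learning rate are assumptions rather than values from the thesis, and real multi-machine communication and other implementation details of IST are omitted.

import numpy as np

rng = np.random.default_rng(0)

# Toy regression data and a two-layer MLP: x -> relu(x @ W1) @ W2
d_in, d_hidden, d_out, n = 16, 64, 1, 512
X = rng.normal(size=(n, d_in))
y = np.sin(X @ rng.normal(size=(d_in, d_out)))

W1 = rng.normal(scale=0.1, size=(d_in, d_hidden))
W2 = rng.normal(scale=0.1, size=(d_hidden, d_out))

def local_sgd(W1_s, W2_s, Xb, yb, steps=5, lr=0.05):
    """A few plain SGD steps on one worker's subnet (hidden-neuron slice)."""
    for _ in range(steps):
        H = np.maximum(Xb @ W1_s, 0.0)      # forward: hidden activations
        pred = H @ W2_s
        err = pred - yb                     # gradient of squared-error loss
        gW2 = H.T @ err / len(Xb)
        gH = (err @ W2_s.T) * (H > 0)       # backprop through ReLU
        gW1 = Xb.T @ gH / len(Xb)
        W1_s -= lr * gW1
        W2_s -= lr * gW2
    return W1_s, W2_s

n_workers, n_rounds = 4, 200
for r in range(n_rounds):
    # Decompose: randomly partition the hidden neurons among workers, so
    # every worker trains a thinner network of the same depth as the original.
    parts = np.array_split(rng.permutation(d_hidden), n_workers)
    batches = np.array_split(rng.permutation(n), n_workers)
    for idx, rows in zip(parts, batches):
        # Fancy indexing copies the slice, so local training does not touch
        # the global model until the trained slice is written back.
        W1_s, W2_s = local_sgd(W1[:, idx], W2[idx, :], X[rows], y[rows])
        # Exchange: write the locally trained subnet back into the full model.
        W1[:, idx] = W1_s
        W2[idx, :] = W2_s
    if r % 50 == 0:
        loss = np.mean((np.maximum(X @ W1, 0.0) @ W2 - y) ** 2)
        print(f"round {r}: loss {loss:.4f}")

Because each worker only ever holds its own neuron slice, per-round communication is limited to exchanging those slices, and no single site needs the full set of hidden units in memory, which is the intuition behind the communication and memory advantages claimed in the abstract.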
dc.format.mimetype: application/pdf
dc.identifier.citation: Yuan, Binhang. "Distributed Machine Learning Scale Out with Algorithms and Systems." (2020) Diss., Rice University. https://hdl.handle.net/1911/109631.
dc.identifier.uri: https://hdl.handle.net/1911/109631
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Distributed Machine Learning
dc.subject: Distributed Database System
dc.title: Distributed Machine Learning Scale Out with Algorithms and Systems
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files

Original bundle (1 file)
Name: YUAN-DOCUMENT-2020.pdf
Size: 2.45 MB
Format: Adobe Portable Document Format

License bundle (2 files)
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text

Name: LICENSE.txt
Size: 2.61 KB
Format: Plain Text