Distributed Machine Learning Scale Out with Algorithms and Systems

dc.contributor.advisor: Jermaine, Chris
dc.creator: Yuan, Binhang
dc.date.accessioned: 2020-12-08T15:26:01Z
dc.date.available: 2020-12-08T15:26:01Z
dc.date.created: 2020-12
dc.date.issued: 2020-12-04
dc.date.submitted: December 2020
dc.date.updated: 2020-12-08T15:26:02Z
dc.description.abstract: Machine learning (ML) is ubiquitous and has powered the recent success of artificial intelligence. However, the state of affairs with respect to distributed ML is far from ideal. TensorFlow and PyTorch simply crash when an operation's inputs and outputs cannot fit on a GPU for model parallelism, or when a model cannot fit on a single machine for data parallelism. TensorFlow code that works reasonably well on a single machine with eight GPUs procured from a cloud provider often runs slower on two machines totaling sixteen GPUs. In this thesis, I propose solutions at both the algorithm and the system level in order to scale out distributed ML. At the algorithm level, I propose a new method for distributed neural network learning, called independent subnet training (IST). In IST, at each iteration, a neural network is decomposed into a set of subnetworks of the same depth as the original network, each of which is trained locally before the various subnets are exchanged and the process is repeated. IST has many advantages, including reduced communication volume and frequency, an implicit extension to model parallelism, and lower memory requirements at each compute site. At the system level, I believe that proper computational and implementation abstractions will allow for the construction of self-configuring, declarative ML systems, especially when the goal is to execute tensor operations for ML in a distributed environment or partitioned across multiple AI accelerators (ASICs). To this end, I first introduce a tensor relational algebra (TRA), which is expressive enough to encode any tensor operation that can be written in Einstein notation, and then consider how TRA expressions can be rewritten into an implementation algebra (IA) that enables effective implementation in a distributed environment, as well as how expressions in the IA can be optimized. An empirical study shows that the optimized implementations provided by the IA can match or even outperform carefully engineered HPC and ML systems for large-scale tensor manipulations and ML workflows in distributed clusters.
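The abstract describes IST concretely enough to sketch. Below is a minimal illustration, not the thesis's actual implementation: it decomposes a two-layer network into subnets by partitioning the hidden neurons (so every subnet keeps the original depth), runs a local SGD step per subnet, and then writes the updated pieces back before repartitioning. The function and variable names (make_subnets, local_sgd_step, and so on) are assumptions made for illustration, and all subnets train on the same batch here purely to keep the example self-contained; in a real deployment each compute site would use its own local data.

    # Minimal IST sketch for a two-layer MLP (illustrative only; names and the
    # shared training batch are simplifying assumptions, not the thesis's code).
    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_hidden, d_out, n_workers = 8, 16, 4, 2

    # Full-network parameters: x -> relu(x @ W1) @ W2
    W1 = rng.normal(scale=0.1, size=(d_in, d_hidden))
    W2 = rng.normal(scale=0.1, size=(d_hidden, d_out))

    def make_subnets(W1, W2, n_workers, rng):
        # Partition the hidden neurons; every subnet keeps the full depth.
        parts = np.array_split(rng.permutation(d_hidden), n_workers)
        return [(p, W1[:, p].copy(), W2[p, :].copy()) for p in parts]

    def local_sgd_step(W1_s, W2_s, X, Y, lr=0.1):
        # One local SGD step on a subnet under mean squared error.
        H = np.maximum(X @ W1_s, 0.0)            # hidden activations
        G = 2.0 * (H @ W2_s - Y) / len(X)        # output-layer error
        gW2 = H.T @ G
        gW1 = X.T @ ((G @ W2_s.T) * (H > 0.0))   # backprop through the ReLU
        return W1_s - lr * gW1, W2_s - lr * gW2

    X = rng.normal(size=(32, d_in))
    Y = rng.normal(size=(32, d_out))

    for _ in range(5):                            # a few IST rounds
        for idx, W1_s, W2_s in make_subnets(W1, W2, n_workers, rng):
            W1_s, W2_s = local_sgd_step(W1_s, W2_s, X, Y)   # train locally
            W1[:, idx], W2[idx, :] = W1_s, W2_s             # exchange/reassemble

The TRA's expressiveness claim refers to operations writable in Einstein notation; NumPy's einsum (used here only as a familiar stand-in for that notation, not as the TRA itself) shows the class of operations in question:

    import numpy as np
    A, B = np.ones((4, 5)), np.ones((5, 6))
    C = np.einsum('ij,jk->ik', A, B)   # matrix multiply in Einstein notation
    assert C.shape == (4, 6)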
dc.format.mimetype: application/pdf
dc.identifier.citation: Yuan, Binhang. "Distributed Machine Learning Scale Out with Algorithms and Systems." (2020) Diss., Rice University. https://hdl.handle.net/1911/109631.
dc.identifier.uri: https://hdl.handle.net/1911/109631
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Distributed Machine Learning
dc.subject: Distributed Database System
dc.title: Distributed Machine Learning Scale Out with Algorithms and Systems
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files
Original bundle (1 file)
  Name: YUAN-DOCUMENT-2020.pdf
  Size: 2.45 MB
  Format: Adobe Portable Document Format
License bundle (2 files)
  Name: PROQUEST_LICENSE.txt
  Size: 5.84 KB
  Format: Plain Text
  Name: LICENSE.txt
  Size: 2.61 KB
  Format: Plain Text