Declarative Machine Learning with Einsummable

dc.contributor.advisor: Jermaine, Chris M
dc.creator: Bourgeois, Daniel Christopher
dc.date.accessioned: 2024-08-30T15:58:35Z
dc.date.available: 2024-08-30T15:58:35Z
dc.date.created: 2024-08
dc.date.issued: 2024-07-23
dc.date.submitted: August 2024
dc.date.updated: 2024-08-30T15:58:35Z
dc.description.abstract: Modern tensor-based machine learning (ML) systems such as PyTorch and TensorFlow deliver high performance but have significant limitations for large-scale ML. These systems require a programmer to manually decompose ML computations so that they can run on multiple machines. Not only is this challenging for end users, but moving from one hardware setup to the next requires rewriting substantial amounts of code. We introduce a new end-to-end ML system called "Einsummable" that automatically decomposes computations to match the available hardware. Unlike existing systems, we are guided by one fundamental design principle: at all costs, the user may only say what they want to compute, not how it is to be computed. Instead of painstakingly building a "model parallel" or "data parallel" implementation, a user of Einsummable needs only express their computation in the Einsummable language. To make Einsummable a reality, we designed the Einsummable language for users to interact with, producing what we call EinGraphs. The Einsummable language is built on extended Einstein summation notation, familiar to many ML practitioners. Our language is expressive enough to represent state-of-the-art generative ML models, including Llama. In addition, we support automatic differentiation. On the other end of the abstraction spectrum, we created a compute-graph specification for machines to execute, called TaskGraphs. TaskGraphs are designed to be executed by distributed, asynchronous compute engines. For our experiments, we built a distributed CPU execution engine, scaling to 32 machines, each with 64 processors. Even though we targeted CPU clusters, the TaskGraph abstraction is also suitable for clusters of GPUs. Most importantly, given hardware parameters, we compile EinGraphs into TaskGraphs without user intervention. The discovered TaskGraph solution may very well include the common model or data parallel solutions. Our main algorithm for this is called EinDecomp, which decomposes EinGraphs so that the computation exposes enough parallelism to keep all processors busy without introducing undue communication burden.
dc.format.mimetype: application/pdf
dc.identifier.citation: Bourgeois, Daniel Christopher. Declarative Machine Learning with Einsummable. (2024). PhD diss., Rice University. https://hdl.handle.net/1911/117775
dc.identifier.uri: https://hdl.handle.net/1911/117775
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Machine Learning
dc.subject: Deep Learning
dc.subject: Automatic Parallelism
dc.subject: Distributed Computing
dc.title: Declarative Machine Learning with Einsummable
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Statistics
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
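
For readers unfamiliar with the notation the abstract builds on, the following is a minimal sketch using NumPy's einsum. It is an illustration only, not Einsummable's actual API or the EinDecomp algorithm (both are defined in the thesis, not here); the chunk count and worker framing are hypothetical. It shows the two ideas the abstract combines: Einstein summation notation declares what to compute, and one such computation can be decomposed into independent pieces, the kind of parallelism EinDecomp is said to automate.

    # Minimal sketch (not Einsummable's actual API) of einsum notation
    # and of decomposing one einsum into parallel sub-tasks.
    import numpy as np

    # 1. Einstein summation notation: "ij,jk->ik" says *what* to compute
    #    (a matrix multiply) without saying *how* it is to be computed.
    A = np.random.rand(512, 256)
    B = np.random.rand(256, 128)
    C = np.einsum("ij,jk->ik", A, B)

    # 2. A decomposition in the spirit of EinDecomp (hypothetical, for
    #    illustration): split the "i" dimension into chunks so that
    #    independent workers could each compute one output slice --
    #    a simple "data parallel" execution plan.
    chunks = np.array_split(A, 4, axis=0)  # 4 hypothetical workers
    partials = [np.einsum("ij,jk->ik", a, B) for a in chunks]
    C_parallel = np.concatenate(partials, axis=0)

    assert np.allclose(C, C_parallel)  # same answer, different plan

The point of the sketch is the abstract's design principle: both versions state the same einsum, and the split into chunks is an execution decision that a system, not the user, can make.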
Files
Original bundle:
BOURGEOIS-DOCUMENT-2024.pdf (2.28 MB, Adobe Portable Document Format)
License bundle:
PROQUEST_LICENSE.txt (5.85 KB, Plain Text)
LICENSE.txt (2.98 KB, Plain Text)