Optimizing Compiler Heuristics with Machine Learning

dc.contributor.advisor: Mellor-Crummey, John
dc.creator: Grubisic, Dejan
dc.date.accessioned: 2024-05-22T15:57:54Z
dc.date.available: 2024-05-22T15:57:54Z
dc.date.created: 2024-05
dc.date.issued: 2024-04-17
dc.date.submitted: May 2024
dc.date.updated: 2024-05-22T15:57:54Z
dc.description.abstract: Compiler technology is crucial for enhancing the performance and efficiency of modern software. The complexity of novel computer architectures, the ever-evolving software landscape, and the ever-growing scale of computation have made manual optimization techniques increasingly difficult and time-consuming. To address this, machine learning (ML) can recognize intricate patterns and automatically tailor code generation and optimization strategies to specific hardware configurations, significantly enhancing program performance. This thesis demonstrates these ideas. First, we showcase the use of reinforcement learning in optimizing tensor computations with LoopTune. LoopTune optimizes the tensor traversal order while using the ultra-fast, lightweight code generator LoopNest to perform hardware-specific optimizations. With a novel graph-based representation and action space, LoopTune speeds up LoopNest by 3.2x, generating code an order of magnitude faster than TVM, 2.8x faster than MetaSchedule, and 1.08x faster than AutoTVM, consistently performing at the level of the hand-tuned library NumPy. Second, we pioneer the use of large language models (LLMs) in compiler optimization. Our model generates optimizations in seconds, achieving a 3.0% improvement over the compiler in reducing instruction counts and outperforming two state-of-the-art baselines that require thousands of compilations. Moreover, the model shows surprisingly strong code reasoning abilities, generating compilable code 91% of the time and perfectly emulating the output of the compiler 70% of the time. Third, we evaluate feedback-directed LLMs that use compiler feedback collected at inference time to improve generated code. We evaluate three feedback formats with varying degrees of information, all of which outperform the original model, by 0.11%, 0.4%, and 0.53%. We further combine this approach with temperature-based sampling and iterative compilation. Sampling techniques show superior performance, reaching 98% of the autotuner's gain over the compiler given a budget of 100 samples. Fourth, we present Priority Sampling, a simple deterministic LLM sampling technique that produces unique samples ordered by the model's confidence. Priority Sampling outperforms Nucleus Sampling for any number of samples, reducing code size further than the original model and achieving a 5% reduction over -Oz instead of 2.87%. Moreover, it outperforms the autotuner used to generate labels for training the original model within just 30 samples.
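The abstract describes Priority Sampling only at a high level: a deterministic procedure that yields unique samples ordered by the model's confidence. A simplified best-first sketch of that idea is below; it is not the thesis's implementation. The `next_token_probs` function is a hypothetical toy stand-in for an LLM's next-token distribution, and expanding the full frontier (rather than the thesis's augmented search tree) is an assumption made here to keep the example short.

```python
import heapq
import math

def next_token_probs(prefix):
    # Hypothetical toy "model": tiny vocabulary, "<eos>" ends a sample.
    # Stands in for an LLM softmax over next tokens given the prefix.
    if len(prefix) >= 3:
        return {"<eos>": 1.0}
    return {"a": 0.6, "b": 0.3, "<eos>": 0.1}

def priority_sampling(k):
    """Return up to k unique completed samples, in descending order of
    sequence probability. Deterministic: no random draws are made."""
    # Min-heap keyed on negative log-probability of the prefix.
    heap = [(0.0, ())]
    samples = []
    while heap and len(samples) < k:
        neg_logp, prefix = heapq.heappop(heap)
        if prefix and prefix[-1] == "<eos>":
            # Completed sample: popped in order of model confidence.
            samples.append(("".join(prefix[:-1]), math.exp(-neg_logp)))
            continue
        # Extend the most probable unfinished prefix by every token.
        for tok, p in next_token_probs(prefix).items():
            heapq.heappush(heap, (neg_logp - math.log(p), prefix + (tok,)))
    return samples

samples = priority_sampling(5)
```

Because extending a prefix can only lower its probability, completed sequences pop off the heap already sorted by confidence, and each sample is unique by construction; this is the property the abstract attributes to Priority Sampling.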
dc.format.mimetype: application/pdf
dc.identifier.citation: Grubisic, Dejan. Optimizing Compiler Heuristics with Machine Learning. (2024). PhD diss., Rice University. https://hdl.handle.net/1911/116179
dc.identifier.uri: https://hdl.handle.net/1911/116179
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: machine learning
dc.subject: compilers
dc.subject: large language models
dc.subject: reinforcement learning
dc.title: Optimizing Compiler Heuristics with Machine Learning
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files

Original bundle:
- GRUBISIC-DOCUMENT-2024.pdf (8.39 MB, Adobe Portable Document Format)

License bundle:
- PROQUEST_LICENSE.txt (5.84 KB, Plain Text)
- LICENSE.txt (2.98 KB, Plain Text)