Optimizing Compiler Heuristics with Machine Learning
Date
2024
Authors
Grubisic, Dejan
Abstract
Compiler technology is crucial for the performance and efficiency of modern software. The complexity of new computer architectures, the ever-evolving software landscape, and the ever-growing scale of computation have made manual tuning of optimization heuristics increasingly difficult and time-consuming. Machine learning (ML) addresses this by recognizing intricate patterns and automatically tailoring code generation and optimization strategies to specific hardware, significantly improving program performance. This thesis demonstrates these ideas through four contributions.
First, we showcase the use of reinforcement learning to optimize tensor computations with LoopTune. LoopTune optimizes the tensor traversal order while using the ultra-fast, lightweight code generator LoopNest to perform hardware-specific optimizations. With a novel graph-based representation and action space, LoopTune speeds up LoopNest by 3.2x, generating code an order of magnitude faster than TVM, 2.8x faster than MetaSchedule, and 1.08x faster than AutoTVM, consistently performing at the level of the hand-tuned library NumPy. A toy sketch of this search structure follows.
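The search structure can be pictured with the toy Python sketch below. It is not the LoopTune or LoopNest API: the state is the traversal order of the loop nest for C[i][j] += A[i][k] * B[k][j], an action swaps two adjacent loops, and the reward is measured performance. measure_gflops() returns made-up numbers standing in for LoopNest compiling the loop nest and timing it on real hardware, and the greedy loop stands in for the learned reinforcement-learning policy.

# Toy sketch of loop-order search (hypothetical; not LoopTune/LoopNest code).
def measure_gflops(order):
    # Stand-in for generating code with the given traversal order and timing it.
    simulated = {"ijk": 9.0, "ikj": 14.0, "jik": 8.5,
                 "jki": 3.0, "kij": 11.0, "kji": 2.5}   # illustrative values only
    return simulated["".join(order)]

def swap_adjacent(order, pos):
    # Action: swap two adjacent loops in the traversal order.
    o = list(order)
    o[pos], o[pos + 1] = o[pos + 1], o[pos]
    return tuple(o)

def greedy_search(order, steps=10):
    # Greedy stand-in for the RL policy: take the adjacent swap with the best reward.
    best = measure_gflops(order)
    for _ in range(steps):
        candidates = [swap_adjacent(order, p) for p in range(len(order) - 1)]
        nxt = max(candidates, key=measure_gflops)
        if measure_gflops(nxt) <= best:
            break                      # no adjacent swap improves the reward
        order, best = nxt, measure_gflops(nxt)
    return order, best

print(greedy_search(("j", "k", "i")))  # climbs from "jki" toward a faster order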
Second, we pioneer the use of large language models (LLMs) in compiler optimization. Our model generates optimizations in seconds, achieving a 3.0% improvement in reducing instruction counts over the compiler and outperforming two state-of-the-art baselines that require thousands of compilations. Moreover, the model shows surprisingly strong code-reasoning abilities, generating compilable code 91% of the time and perfectly emulating the compiler's output 70% of the time.
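The evaluation of such a model can be pictured with the hypothetical harness below (not the thesis's actual code): predict_passes() stands in for a single LLM inference over the unoptimized LLVM-IR, count_instructions() stands in for compiling with a given pass list and counting the resulting instructions, and the metric is the instruction-count reduction relative to the compiler's -Oz baseline. The pass names and numbers are illustrative only.

# Hypothetical evaluation sketch for the LLM pass-ordering model.
def predict_passes(ir):
    # Stand-in for one LLM inference over the input IR; choice is illustrative.
    return ["mem2reg", "simplifycfg", "instcombine"]

def count_instructions(ir, passes):
    # Stand-in for compiling `ir` with `passes` and counting instructions.
    return 120 if passes == ["Oz"] else 116            # illustrative numbers

def improvement_over_Oz(ir):
    baseline = count_instructions(ir, ["Oz"])
    model = count_instructions(ir, predict_passes(ir))
    return (baseline - model) / baseline               # fraction of instructions removed

print(f"{improvement_over_Oz('<llvm-ir module>'):.1%}")  # 3.3% on this toy input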
Third, we evaluate feedback-directed LLMs that use compiler feedback collected at inference time to improve the generated code. We evaluate three feedback formats with varying degrees of information, all of which outperform the original model, by 0.11%, 0.4%, and 0.53%, respectively. We further combine this approach with temperature-based sampling and iterative compilation. The sampling techniques show superior performance, reaching 98% of the autotuner's performance over the compiler given a budget of 100 samples. A sketch of both strategies follows.
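The two inference-time strategies can be sketched as below. This is a hypothetical illustration: generate(), compiler_feedback(), and instruction_count() are stand-ins for the LLM call and the compiler, and all numbers are synthetic.

# Hypothetical sketch of feedback-directed generation vs. temperature sampling.
import random

def generate(prompt, temperature, seed):
    # Stand-in for one LLM sample (e.g. a predicted pass list or optimized IR).
    return f"<candidate {seed} at T={temperature} for {len(prompt)}-char prompt>"

def instruction_count(candidate):
    # Stand-in for compiling the candidate and counting instructions.
    return 100 - random.Random(candidate).randint(0, 5)

def compiler_feedback(candidate):
    # Stand-in for diagnostics fed back into the prompt (errors, counts, mismatches).
    return f"feedback: instruction count = {instruction_count(candidate)}"

def feedback_directed(prompt, rounds=3):
    # Generate, compile, append the compiler feedback to the prompt, regenerate.
    best = generate(prompt, temperature=0.0, seed=0)
    for r in range(1, rounds):
        prompt += "\n" + compiler_feedback(best)
        cand = generate(prompt, temperature=0.0, seed=r)
        if instruction_count(cand) < instruction_count(best):
            best = cand
    return best

def temperature_sampling(prompt, budget=100):
    # Draw independent samples at non-zero temperature and keep the best
    # verified one (iterative compilation over the sample budget).
    samples = (generate(prompt, temperature=0.8, seed=s) for s in range(budget))
    return min(samples, key=instruction_count)

print(instruction_count(feedback_directed("<llvm-ir>")))
print(instruction_count(temperature_sampling("<llvm-ir>")))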
Fourth, we present Priority Sampling, a simple deterministic LLM sampling technique that produces unique samples ordered by the model's confidence. Priority Sampling outperforms Nucleus Sampling for any number of samples, reducing code size further than the original model and achieving a 5% reduction over -Oz versus the original model's 2.87%. Moreover, with just 30 samples it outperforms the autotuner that was used to generate the labels for training the original model.
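Priority Sampling can be pictured as a deterministic best-first expansion of the model's token tree: the first sample is the greedy decoding, and each later sample resumes greedy decoding from the most probable token that no earlier sample has explored, so samples come out unique and ordered by confidence. The toy sketch below illustrates this; next_token_probs() is a made-up stand-in for the LLM's next-token distribution, and details of the actual implementation are omitted.

# Simplified sketch of Priority Sampling over a toy next-token distribution.
import heapq, math

def next_token_probs(prefix):
    # Stand-in for the LLM's softmax over the vocabulary, keyed by prefix.
    table = {
        (): {"A": 0.6, "B": 0.3, "C": 0.1},
        ("A",): {"x": 0.7, "y": 0.3},
        ("B",): {"x": 0.5, "y": 0.5},
        ("C",): {"x": 0.9, "y": 0.1},
    }
    return table.get(prefix, {"<eos>": 1.0})

def greedy_complete(prefix, frontier, max_len=4):
    # Greedily extend `prefix`, pushing every unexplored alternative token
    # (keyed by its probability) onto the shared priority queue.
    seq = list(prefix)
    while len(seq) < max_len:
        probs = next_token_probs(tuple(seq))
        best = max(probs, key=probs.get)
        for tok, p in probs.items():
            if tok != best and tok != "<eos>":
                heapq.heappush(frontier, (-math.log(p), tuple(seq) + (tok,)))
        if best == "<eos>":
            break
        seq.append(best)
    return tuple(seq)

def priority_sampling(k):
    # Deterministic: sample 1 is the greedy decoding; later samples branch off
    # at the most confident unexplored token, so all samples are unique.
    frontier, samples = [], []
    samples.append(greedy_complete((), frontier))
    while frontier and len(samples) < k:
        _, prefix = heapq.heappop(frontier)
        samples.append(greedy_complete(prefix, frontier))
    return samples

print(priority_sampling(3))  # e.g. [('A', 'x'), ('A', 'y'), ('B', 'x')]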
Description
Advisor
Degree
Doctor of Philosophy
Type
Thesis
Keywords
Citation
Grubisic, Dejan. Optimizing Compiler Heuristics with Machine Learning. (2024). PhD diss., Rice University. https://hdl.handle.net/1911/116179