Autotuning Memory-intensive Software for Node Architectures

dc.contributor.advisor: Mellor-Crummey, John
dc.contributor.committeeMember: Cooper, Keith
dc.contributor.committeeMember: Sarkar, Vivek
dc.creator: Wei, Lai
dc.date.accessioned: 2016-02-05T21:49:56Z
dc.date.available: 2016-02-05T21:49:56Z
dc.date.created: 2015-05
dc.date.issued: 2015-05-13
dc.date.submitted: May 2015
dc.date.updated: 2016-02-05T21:49:56Z
dc.description.abstract: Scientific computing plays an important role in scientific research, and supercomputers are built to support the computational needs of large-scale scientific applications. Achieving high performance on today's supercomputers is difficult, in large part because of the complexity of the node architectures, which include wide-issue instruction-level parallelism, SIMD operations, multiple cores, multiple threads per core, and a deep memory hierarchy. In addition, growth in compute performance has outpaced growth in memory bandwidth, making memory bandwidth a scarce resource. Various optimization methods, including tiling and prefetching, have been proposed to make better use of the memory hierarchy. However, because of architectural differences, code hand-tuned for one architecture is not necessarily efficient on others. For that reason, autotuning is often used to tailor high-performance code to different architectures. Common practice is to develop a parametric code generator that produces code according to different optimization parameters and then to pick the best among the resulting implementation alternatives for a given architecture. In this thesis, we use tensor transposition, a generalization of matrix transposition, as a motivating example to study the problem of autotuning memory-intensive codes for complex memory hierarchies. We developed a framework that produces optimized parallel tensor transposition code for node architectures. The framework has two components: a rule-based code generation and transformation system that generates code according to specified optimization parameters, and an autotuner that uses static analysis along with empirical autotuning to pick the best implementation scheme. In this work, we studied how to prune the autotuning search space and how to perform run-time code selection using hardware performance counters. Despite the complex memory access patterns of tensor transposition, experiments on two very different architectures show that our approach achieves more than 80% of the bandwidth of optimized memory copies when transposing most tensors. Our results show that autotuning is key to achieving peak application performance for memory-intensive codes across different node architectures.
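For readers unfamiliar with the techniques the abstract names, the following is a minimal illustrative sketch, not taken from the thesis: a tiled matrix transpose in C, the two-dimensional special case of tensor transposition. The tile size TILE is a hypothetical optimization parameter of the kind the framework's autotuner would search over.

    /* Minimal sketch (not the thesis's code): tiled transpose of an
     * n-by-m row-major matrix, the 2-D special case of tensor
     * transposition. TILE is a hypothetical tuning parameter. */
    #include <stddef.h>

    #define TILE 32  /* hypothetical tile size; an autotuner would vary this */

    /* out[j][i] = in[i][j]. Tiling keeps both the reads of `in` and the
     * strided writes of `out` within cache-sized blocks, making better
     * use of the memory hierarchy than a naive doubly nested loop. */
    void transpose_tiled(size_t n, size_t m,
                         const double *in, double *out)
    {
        for (size_t ii = 0; ii < n; ii += TILE)
            for (size_t jj = 0; jj < m; jj += TILE)
                for (size_t i = ii; i < ii + TILE && i < n; i++)
                    for (size_t j = jj; j < jj + TILE && j < m; j++)
                        out[j * n + i] = in[i * m + j];
    }

A parametric code generator in the spirit the abstract describes would emit variants of such a kernel for different tile sizes, loop orders, and prefetching schemes, and the autotuner would then select the best-performing variant for the target architecture.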
dc.format.mimetype: application/pdf
dc.identifier.citation: Wei, Lai. "Autotuning Memory-intensive Software for Node Architectures." (2015) Master's Thesis, Rice University. https://hdl.handle.net/1911/88422.
dc.identifier.uri: https://hdl.handle.net/1911/88422
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: autotuning
dc.subject: memory-intensive software
dc.subject: tensor transposition
dc.subject: hardware performance counters
dc.subject: memory hierarchy
dc.title: Autotuning Memory-intensive Software for Node Architectures
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Masters
thesis.degree.name: Master of Science
Files

Original bundle:
- WEI-DOCUMENT-2015.pdf (5.87 MB, Adobe Portable Document Format)

License bundle:
- PROQUEST_LICENSE.txt (5.84 KB, Plain Text)
- LICENSE.txt (2.6 KB, Plain Text)