Code Generation for Extreme Scale Parallel Systems

Srinivasa Murthy, Karthik

Code Generation for Extreme Scale Parallel Systems

Files

SRINIVASAMURTHY-DOCUMENT-2017.pdf (7.36 MB)

Date

2017-03-08

Authors

Srinivasa Murthy, Karthik

Abstract

Power consumption and fabrication limitations are increasingly playing significant roles in the design of extreme scale parallel systems. These factors are influencing system designers to support higher on-node computing capability via throughput-optimized processors instead of latency-optimized processors. However, the inter- and intra-processor communication capabilities on such systems are not increasing at the same rate as the on-node computing capability. Consequently, achieving high performance requires careful orchestration of both single- and multiprocessor parallelism. This thesis shows that compiler technology and expressive programming model constructs can help applications more effectively exploit both forms of parallelism.

Compilers play an important role in harnessing short vector parallelism supported by cores in modern processors. Over last ten years, vector widths have increased dramatically from the 64-bit vectors supported by Intel's Pentium MMX processor to the 512-bit vectors supported by Intel's Knights Corner processor. However, the vectorization capabilities of state-of-the-art compilers are still immature, failing in the presence of complex control flow and data dependencies. This thesis presents compiler transformations that enable efficient vector parallelism in the presence of common kinds of complex dependencies.

To enable efficient multiprocessor parallelism, this thesis develops compiler technology to support sophisticated algorithms that minimize interprocessor communication. The class of .5D communication-avoiding algorithms was developed to address the inter-processor communication bottleneck. Mapping these algorithms to complex architectures efficiently is tedious for even expert programmers. To address this issue, this thesis presents the Maunam compiler, which generates efficient parallel code from a high-level, global-view sketch of a .5D algorithm that is expressed using symbolic data sizes and numbers of processors.

To mitigate the cost of communication for multiprocessor parallelism, this thesis develops a novel compiler transformation to overlap communication with computation for systolic computations. Additionally, to aid effective management of the completion of non-blocking communication, this thesis presents two synchronization constructs, cofence and distributed phasers.

Advisor

Mellor-Crummey, John M

Degree

Doctor of Philosophy

Type

Thesis

Keywords

.5D algorithms, compiler, polyhedral model, cofence, phasers, modulo operations, collectives, code generation, vectorization

Citation

Srinivasa Murthy, Karthik. "Code Generation for Extreme Scale Parallel Systems." (2017) Diss., Rice University. https://hdl.handle.net/1911/96060.

Rights

Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.

Citable link to this page

https://hdl.handle.net/1911/96060

Collections

Rice University Theses and Dissertations

Full item page