Efficient optimization of memory accesses in parallel programs

dc.contributor.advisorSarkar, Vivek
dc.creatorBarik, Rajkishore
dc.date.accessioned2011-07-25T02:05:49Z
dc.date.available2011-07-25T02:05:49Z
dc.date.issued2010
dc.description.abstractThe power, frequency, and memory wall problems have caused a major shift in mainstream computing by introducing processors that contain multiple low power cores. As multi-core processors are becoming ubiquitous, software trends in both parallel programming languages and dynamic compilation have added new challenges to program compilation for multi-core processors. This thesis proposes a combination of high-level and low-level compiler optimizations to address these challenges. The high-level optimizations introduced in this thesis include new approaches to May-Happen-in-Parallel analysis and Side-Effect analysis for parallel programs and a novel parallelism-aware Scalar Replacement for Load Elimination transformation. A new Isolation Consistency (IC) memory model is described that permits several scalar replacement transformation opportunities compared to many existing memory models. The low-level optimizations include a novel approach to register allocation that retains the compile time and space efficiency of Linear Scan, while delivering runtime performance superior to both Linear Scan and Graph Coloring. The allocation phase is modeled as an optimization problem on a Bipartite Liveness Graph (BLG) data structure. The assignment phase focuses on reducing the number of spill instructions by using register-to-register move and exchange instructions wherever possible. Experimental evaluations of our scalar replacement for load elimination transformation in the Jikes RVM dynamic compiler show decreases in dynamic counts for getfield operations of up to 99.99%, and performance improvements of up to 1.76x on 1 core, and 1.39x on 16 cores, when compared with the load elimination algorithm available in Jikes RVM. A prototype implementation of our BLG register allocator in Jikes RVM demonstrates runtime performance improvements of up to 3.52x relative to Linear Scan on an x86 processor. When compared to Graph Coloring register allocator in the GCC compiler framework, our allocator resulted in an execution time improvement of up to 5.8%, with an average improvement of 2.3% on a POWER5 processor. With the experimental evaluations combined with the foundations presented in this thesis, we believe that the proposed high-level and low-level optimizations are useful in addressing some of the new challenges emerging in the optimization of parallel programs for multi-core architectures.
dc.format.mimetypeapplication/pdf
dc.identifier.callnoTHESIS COMP. SCI. 2010 BARIK
dc.identifier.citationBarik, Rajkishore. "Efficient optimization of memory accesses in parallel programs." (2010) Diss., Rice University. <a href="https://hdl.handle.net/1911/62060">https://hdl.handle.net/1911/62060</a>.
dc.identifier.urihttps://hdl.handle.net/1911/62060
dc.language.isoeng
dc.rightsCopyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subjectComputer science
dc.subjectApplied sciences
dc.titleEfficient optimization of memory accesses in parallel programs
dc.typeThesis
dc.type.materialText
thesis.degree.departmentComputer Science
thesis.degree.disciplineEngineering
thesis.degree.grantorRice University
thesis.degree.levelDoctoral
thesis.degree.nameDoctor of Philosophy
Files
Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
3421163.PDF
Size:
5.85 MB
Format:
Adobe Portable Document Format