Efficient optimization of memory accesses in parallel programs

Barik, Rajkishore

dc.contributor.advisor	Sarkar, Vivek	en_US
dc.creator	Barik, Rajkishore	en_US
dc.date.accessioned	2011-07-25T02:05:49Z	en_US
dc.date.available	2011-07-25T02:05:49Z	en_US
dc.date.issued	2010	en_US
dc.description.abstract	The power, frequency, and memory wall problems have caused a major shift in mainstream computing by introducing processors that contain multiple low-power cores. As multi-core processors become ubiquitous, software trends in both parallel programming languages and dynamic compilation have added new challenges to program compilation for multi-core processors. This thesis proposes a combination of high-level and low-level compiler optimizations to address these challenges. The high-level optimizations introduced in this thesis include new approaches to May-Happen-in-Parallel analysis and Side-Effect analysis for parallel programs, and a novel parallelism-aware Scalar Replacement for Load Elimination transformation. A new Isolation Consistency (IC) memory model is described that permits more scalar replacement transformation opportunities than many existing memory models. The low-level optimizations include a novel approach to register allocation that retains the compile-time and space efficiency of Linear Scan while delivering runtime performance superior to both Linear Scan and Graph Coloring. The allocation phase is modeled as an optimization problem on a Bipartite Liveness Graph (BLG) data structure. The assignment phase focuses on reducing the number of spill instructions by using register-to-register move and exchange instructions wherever possible. Experimental evaluations of our scalar replacement for load elimination transformation in the Jikes RVM dynamic compiler show decreases in dynamic counts for getfield operations of up to 99.99%, and performance improvements of up to 1.76x on 1 core and 1.39x on 16 cores, when compared with the load elimination algorithm available in Jikes RVM. A prototype implementation of our BLG register allocator in Jikes RVM demonstrates runtime performance improvements of up to 3.52x relative to Linear Scan on an x86 processor. 
When compared to the Graph Coloring register allocator in the GCC compiler framework, our allocator resulted in an execution time improvement of up to 5.8%, with an average improvement of 2.3%, on a POWER5 processor. Given these experimental evaluations and the foundations presented in this thesis, we believe that the proposed high-level and low-level optimizations are useful in addressing some of the new challenges emerging in the optimization of parallel programs for multi-core architectures.	en_US
dc.format.mimetype	application/pdf	en_US
dc.identifier.callno	THESIS COMP. SCI. 2010 BARIK	en_US
dc.identifier.citation	Barik, Rajkishore. "Efficient optimization of memory accesses in parallel programs." (2010) Diss., Rice University. https://hdl.handle.net/1911/62060.	en_US
dc.identifier.uri	https://hdl.handle.net/1911/62060	en_US
dc.language.iso	eng	en_US
dc.rights	Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.	en_US
dc.subject	Computer science	en_US
dc.subject	Applied sciences	en_US
dc.title	Efficient optimization of memory accesses in parallel programs	en_US
dc.type	Thesis	en_US
dc.type.material	Text	en_US
thesis.degree.department	Computer Science	en_US
thesis.degree.discipline	Engineering	en_US
thesis.degree.grantor	Rice University	en_US
thesis.degree.level	Doctoral	en_US
thesis.degree.name	Doctor of Philosophy	en_US

Files

Original bundle

Name: 3421163.PDF
Size: 5.85 MB
Format: Adobe Portable Document Format

Collections

Rice University Theses and Dissertations
Center for Research Computing