Compiling for Software Distributed-Shared Memory Systems

dc.contributor.author: Zhang, Kai
dc.date.accessioned: 2017-08-02T22:02:47Z
dc.date.available: 2017-08-02T22:02:47Z
dc.date.issued: 2000-04-03
dc.date.note: April 3, 2000
dc.description.abstract: In this thesis, we explore the use of software distributed shared memory (SDSM) as a target communication layer for parallelizing compilers. For SDSM to be effective for this purpose, it must efficiently support both regular and irregular communication patterns. Previous studies have demonstrated techniques that enable SDSM to achieve performance competitive with hand-coded message passing for irregular applications. Here, we explore how to effectively exploit compiler-derived knowledge of sharing and communication patterns for regular access patterns to improve their performance on SDSM systems. We introduce two novel optimization techniques: compiler-restricted consistency, which reduces the cost of false sharing, and compiler-managed communication buffers, which, used together with compiler-restricted consistency, reduce the cost of fragmentation. We focus on regular applications with wavefront computation and tightly-coupled sharing due to carried data dependences. Previous studies of regular applications have all focused on loosely-coupled parallelism, for which it is easier to achieve good performance. We describe point-to-point synchronization primitives we have developed that facilitate the parallelization of this type of application on SDSM. Along with other compiler-assisted SDSM optimizations such as compiler-controlled eager update, our integrated compiler and run-time support provides speedups for wavefront computations on SDSM that rival those achieved previously only for loosely synchronous applications. For example, we achieve a speedup of 11 on 16 processors for the SOR benchmark (a tightly-coupled wavefront computation) at a 4Kx4K problem size, which compares favorably with the speedup of 14 on 16 processors that we obtain for Red-Black SOR (a loosely-coupled computation) at the same problem size under the same hardware and software environment.
With the NAS-BT application benchmark at the Class A problem size, the compiler and runtime optimizations described here improve the speedup on SDSM from 4 to 10 on 16 processors.
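The wavefront dependence structure named in the abstract can be illustrated with a minimal sketch (an illustration of the general technique, not code from the report): when cell (i, j) depends on its upper and left neighbors, all cells on the same anti-diagonal are independent and may be updated concurrently, with point-to-point producer/consumer synchronization replacing a full barrier between steps.

```python
# Hypothetical sketch of wavefront scheduling for SOR-style updates.
# Cell (i, j) depends on (i-1, j) and (i, j-1), so every cell on the
# anti-diagonal d = i + j can be computed in parallel once diagonal
# d - 1 is finished.
def wavefront_schedule(n):
    """Return, per anti-diagonal, the cells of an n x n grid that
    may be updated concurrently."""
    diagonals = []
    for d in range(2 * n - 1):
        cells = [(i, d - i) for i in range(n) if 0 <= d - i < n]
        diagonals.append(cells)
    return diagonals

# Example: a 3x3 grid has 5 diagonals; the middle one holds
# (0, 2), (1, 1), (2, 0), which are mutually independent.
```

In an SDSM parallelization of this pattern, each diagonal would form one parallel step, and a producer would signal only the specific consumer waiting on its boundary row or column rather than synchronizing all processors.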
dc.format.extent: 80 pp
dc.identifier.citation: Zhang, Kai. "Compiling for Software Distributed-Shared Memory Systems." (2000) https://hdl.handle.net/1911/96274.
dc.identifier.digital: TR00-356
dc.identifier.uri: https://hdl.handle.net/1911/96274
dc.language.iso: eng
dc.rights: You are granted permission for the noncommercial reproduction, distribution, display, and performance of this technical report in any format, but this permission is only for a period of forty-five (45) days from the most recent time that you verified that this technical report is still available from the Computer Science Department of Rice University under terms that include this permission. All other rights are reserved by the author(s).
dc.title: Compiling for Software Distributed-Shared Memory Systems
dc.type: Technical report
dc.type.dcmi: Text
Files
Original bundle: TR00-356.pdf (2.9 MB, Adobe Portable Document Format)