Locality Transformations of Computation and Data for Portable Performance

dc.contributor.advisorSarkar, Viveken_US
dc.contributor.committeeMemberMellor-Crummey, Johnen_US
dc.contributor.committeeMemberWarburton, Timothyen_US
dc.creatorSharma, Kamal Gopalen_US
dc.date.accessioned2016-01-27T17:21:12Zen_US
dc.date.available2016-01-27T17:21:12Zen_US
dc.date.created2015-05en_US
dc.date.issued2014-08-21en_US
dc.date.submittedMay 2015en_US
dc.date.updated2016-01-27T17:21:12Zen_US
dc.description.abstractRecently, multi-cores chips have become omnipresent in computer systems ranging from high-end servers to mobile phones. A variety of multi-core architectures have been developed which vary in the number of cores on a single die, cache hierarchies within a chip and interconnect across chips. This diversity of architectures is likely to continue for the foreseeable future. With these architectural variations, performance tuning of an application becomes a major challenge for the programmer. Code optimizations developed for one architecture may have a different impact, even negative, across other architectures due to the differences in tuning parameters. This problem is compounded when scaling from a single node to multiple nodes in a cluster. Our thesis is that significant performance benefits can be obtained from locality transformations of computation and data that require minimal programmer effort. We establish this thesis by addressing three specific problems — tile size selection, data layout optimization and automatic selection of distribution functions. Tile size selection and data layout optimization improve intra-node performance, whereas automatic selection of distribution functions also enhances inter-node performance. Loop tiling is a well-known locality optimization technique that can increase cache reuse at different levels of a memory hierarchy. Choosing tile sizes for different loop nests is a non-trivial task for a programmer due to the large parameter search space and different architectural parameters for different platforms. In this dissertation, we present analytical bounds for tile size selection. Our approach uses different architectural parameters such as cache size and TLB size to limit the tile size search space, and also includes an automated algorithm to select optimized tile sizes. Another important locality optimization is data layout transformation, which improves cache performance by restructuring program data. An important challenge for a programmer is to select an efficient data layout for a given application and target platform. In the past, selecting a data layout has been limited by the cost of rewriting applications for different layouts and by the effort required to find an optimized layout for a given platform. In this work, we present an automated tool for implementing different layouts for a given program. Using our approach, a programmer can experiment with different layouts across varying platforms for efficient performance. We also provide an automatic algorithm to select an optimized data layout for a given program and its representative executions. When mapping an application onto multiple nodes, one challenge for scalable performance is how to distribute computation and data across the nodes. In this work, we introduce an algorithm to automatically select a task/data distribution function for a set of nodes. We build this solution on Intel CnC’s distributed runtime system. Overall, our approaches for the three problems provide automated methods for locality optimization that enable portable performance while requiring minimal programmer effort.en_US
dc.format.mimetypeapplication/pdfen_US
dc.identifier.citationSharma, Kamal Gopal. "Locality Transformations of Computation and Data for Portable Performance." (2014) Diss., Rice University. <a href="https://hdl.handle.net/1911/88159">https://hdl.handle.net/1911/88159</a>.en_US
dc.identifier.urihttps://hdl.handle.net/1911/88159en_US
dc.language.isoengen_US
dc.rightsCopyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.en_US
dc.subjectLocality Transformationsen_US
dc.subjectCache Optimizationen_US
dc.subjectTile size selectionen_US
dc.subjectData Layout Optimizationen_US
dc.subjectPerformanceen_US
dc.subjectDistributed Function Selectionen_US
dc.titleLocality Transformations of Computation and Data for Portable Performanceen_US
dc.typeThesisen_US
dc.type.materialTexten_US
thesis.degree.departmentComputer Scienceen_US
thesis.degree.disciplineEngineeringen_US
thesis.degree.grantorRice Universityen_US
thesis.degree.levelDoctoralen_US
thesis.degree.nameDoctor of Philosophyen_US
Files
Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
SHARMA-DOCUMENT-2015.pdf
Size:
6.28 MB
Format:
Adobe Portable Document Format
License bundle
Now showing 1 - 2 of 2
No Thumbnail Available
Name:
PROQUEST_LICENSE.txt
Size:
5.84 KB
Format:
Plain Text
Description:
No Thumbnail Available
Name:
LICENSE.txt
Size:
2.61 KB
Format:
Plain Text
Description: