Performance Analysis of Program Executions on Modern Parallel Architectures

dc.contributor.advisor: Mellor-Crummey, John [en_US]
dc.contributor.committeeMember: Sarkar, Vivek [en_US]
dc.contributor.committeeMember: Varman, Peter [en_US]
dc.contributor.committeeMember: Browne, James [en_US]
dc.creator: Liu, Xu [en_US]
dc.date.accessioned: 2016-01-07T22:18:30Z [en_US]
dc.date.available: 2016-01-07T22:18:30Z [en_US]
dc.date.created: 2014-12 [en_US]
dc.date.issued: 2014-07-25 [en_US]
dc.date.submitted: December 2014 [en_US]
dc.date.updated: 2016-01-07T22:18:31Z [en_US]
dc.description.abstract: Parallel architectures have become common in supercomputers, data centers, and mobile chips. Parallel architectures usually have complex features: many hardware threads, deep memory hierarchies, and non-uniform memory access (NUMA). Programs designed without careful consideration of these features may perform poorly on such architectures. First, multi-threaded programs can suffer from performance degradation caused by imbalanced workload, overuse of synchronization, and parallel overhead. Second, parallel programs may suffer from long latency to main memory. Third, in a NUMA system, memory accesses can be remote rather than local. Without a NUMA-aware design, a threaded program may incur many costly remote accesses and imbalanced memory requests to NUMA domains. Performance tools can help us take full advantage of the power of parallel architectures by providing insight into where and why a program fails to obtain top performance. This dissertation addresses the difficulty of obtaining insights about performance bottlenecks in parallel programs using lightweight measurement techniques. This dissertation makes four contributions. First, it describes a novel performance analysis method for OpenMP programs, which can identify root causes of performance losses. Second, it presents a data-centric analysis method that associates performance metrics with data objects. This data-centric analysis can identify both a program's problematic memory accesses and the associated variables; this information can help an application developer optimize programs for better locality. Third, this dissertation discusses the development of a lightweight method that collects memory reuse distance to guide cache locality optimization. Finally, it describes a lightweight profiling method that can help pinpoint performance losses in programs on NUMA architectures and provide guidance about how to transform the program to improve performance.
To validate the utility of these methods, I implemented them in HPCToolkit, a state-of-the-art profiler developed at Rice University. I used the extended HPCToolkit to study several parallel programs. Guided by the performance insights provided by the new techniques introduced in this dissertation, I optimized all of these programs and was able to obtain non-trivial improvements to their performance. The measurement overhead incurred by these new analysis methods is very small in both runtime and memory. [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Liu, Xu. "Performance Analysis of Program Executions on Modern Parallel Architectures." (2014) Diss., Rice University. https://hdl.handle.net/1911/87790. [en_US]
dc.identifier.uri: https://hdl.handle.net/1911/87790 [en_US]
dc.language.iso: eng [en_US]
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. [en_US]
dc.subject: OpenMP [en_US]
dc.subject: data locality [en_US]
dc.subject: performance [en_US]
dc.title: Performance Analysis of Program Executions on Modern Parallel Architectures [en_US]
dc.type: Thesis [en_US]
dc.type.material: Text [en_US]
thesis.degree.department: Computer Science [en_US]
thesis.degree.discipline: Engineering [en_US]
thesis.degree.grantor: Rice University [en_US]
thesis.degree.level: Doctoral [en_US]
thesis.degree.name: Doctor of Philosophy [en_US]
Files
Original bundle
Name: LIU-DOCUMENT-2014.pdf
Size: 4.36 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text