Software Support for Efficient Use of Modern Computer Architectures

dc.contributor.advisor: Mellor-Crummey, John [en_US]
dc.contributor.committeeMember: Sarkar, Vivek [en_US]
dc.contributor.committeeMember: Varman, Peter [en_US]
dc.contributor.committeeMember: Iancu, Costin [en_US]
dc.creator: Chabbi, Milind Mohan [en_US]
dc.date.accessioned: 2016-01-06T20:58:03Z [en_US]
dc.date.available: 2016-01-06T20:58:03Z [en_US]
dc.date.created: 2015-12 [en_US]
dc.date.issued: 2015-08-14 [en_US]
dc.date.submitted: December 2015 [en_US]
dc.date.updated: 2016-01-06T20:58:03Z [en_US]
dc.description.abstract: Parallelism is ubiquitous in modern computer architectures. Heterogeneity of CPU cores and deep memory hierarchies make modern architectures difficult to program efficiently. Achieving top performance on supercomputers is difficult due to complex hardware, software, and their interactions. Production software systems fail to achieve top performance on modern architectures broadly due to three main causes: resource idleness, parallel overhead, and data movement overhead. This dissertation presents novel and effective performance analysis tools, adaptive runtime systems, and architecture-aware algorithms to understand and address these problems. Many future high performance systems will employ traditional multicore CPUs augmented with accelerators such as GPUs. One of the biggest concerns for accelerated systems is how to make best use of both CPU and GPU resources. Resource idleness arises in a parallel program due to insufficient parallelism and load imbalance, among other causes. To assess systemic resource idleness arising in GPU-accelerated architectures, we developed efficient profiling and tracing capabilities. We introduce CPU-GPU blame shifting--a novel technique to pinpoint and quantify the causes of resource idleness in GPU-accelerated architectures. Parallel overheads arise due to synchronization constructs such as barriers and locks used in parallel programs. We developed a new technique to identify and eliminate redundant barriers at runtime in Partitioned Global Address Space programs. In addition, we developed a set of novel mutual exclusion algorithms that exploit locality in the memory hierarchy to improve performance on Non-Uniform Memory Access architectures. In modern architectures, inefficient or unnecessary memory accesses can severely degrade program performance. To pinpoint and quantify wasteful memory operations, we developed a fine-grain execution-monitoring framework. We extended this framework and demonstrated the feasibility of attributing fine-grain execution metrics to source and data in their contexts for long running programs--a task previously thought to be infeasible. Together, the solutions described in this dissertation were employed to gain insights into the performance of a collection of important programs, both parallel and serial. The insights we gained enabled us to improve the performance of many of these programs by a significant margin. Software for future systems will benefit from the techniques described in this dissertation. [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Chabbi, Milind Mohan. "Software Support for Efficient Use of Modern Computer Architectures." (2015) Diss., Rice University. https://hdl.handle.net/1911/87730. [en_US]
dc.identifier.uri: https://hdl.handle.net/1911/87730 [en_US]
dc.language.iso: eng [en_US]
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. [en_US]
dc.subject: performance analysis [en_US]
dc.subject: resource idleness [en_US]
dc.subject: blame shifting [en_US]
dc.subject: heterogeneous architectures [en_US]
dc.subject: GPU [en_US]
dc.subject: dynamic analysis [en_US]
dc.subject: barrier elision [en_US]
dc.subject: PGAS [en_US]
dc.subject: NWChem [en_US]
dc.subject: mutual exclusion [en_US]
dc.subject: locks [en_US]
dc.subject: MCS lock [en_US]
dc.subject: hierarchical MCS lock [en_US]
dc.subject: HMCS lock [en_US]
dc.subject: Power7 [en_US]
dc.subject: Power8 [en_US]
dc.subject: SGI UV 1000 [en_US]
dc.subject: Adaptive HMCS [en_US]
dc.subject: AHMCS [en_US]
dc.subject: fast path [en_US]
dc.subject: contention management [en_US]
dc.subject: hysteresis [en_US]
dc.subject: hardware transactional memory [en_US]
dc.subject: DeadSpy [en_US]
dc.subject: dead writes [en_US]
dc.subject: Pin [en_US]
dc.subject: fine-grained monitoring [en_US]
dc.subject: CCTLib [en_US]
dc.subject: fine-grained execution monitoring [en_US]
dc.title: Software Support for Efficient Use of Modern Computer Architectures [en_US]
dc.type: Thesis [en_US]
dc.type.material: Text [en_US]
thesis.degree.department: Computer Science [en_US]
thesis.degree.discipline: Engineering [en_US]
thesis.degree.grantor: Rice University [en_US]
thesis.degree.level: Doctoral [en_US]
thesis.degree.name: Doctor of Philosophy [en_US]
Files
Name: CHABBI-DOCUMENT-2015.pdf
Size: 6.32 MB
Format: Adobe Portable Document Format