Techniques for Measurement, Analysis, and Optimization of HPC Communication Performance

dc.contributor.advisor | Mellor-Crummey, John M. | en_US
dc.creator | Taffet, Philip Adam | en_US
dc.date.accessioned | 2021-08-16T18:14:58Z | en_US
dc.date.available | 2021-08-16T18:14:58Z | en_US
dc.date.created | 2021-08 | en_US
dc.date.issued | 2021-07-21 | en_US
dc.date.submitted | August 2021 | en_US
dc.date.updated | 2021-08-16T18:14:58Z | en_US
dc.description.abstract | Inter-node communication is a critical component of tightly coupled applications running on parallel high performance computing systems. Surveys of high performance computing benchmarks and applications show that most applications spend at least 20% of their execution time communicating, and some spend more than 50%. Thus, inter-node communication performance is important to the overall performance of parallel applications. Furthermore, as the scale of parallelism increases, communicating efficiently becomes more important and typically more difficult. Application developers often cannot address communication performance issues on their own, either because they lack useful diagnostic information or because the issues stem from system-level problems such as poor routing. This dissertation describes several techniques for measuring, analyzing, and optimizing the communication performance of parallel applications running on a supercomputer with a fat tree interconnect, each of which can help improve application communication performance. First, I describe a sampling-based monitoring technique that uses a small amount of performance-related data in each packet to reconstruct quantitative estimates of traffic and congestion, correlated with both application contexts and individual links. Using this information, the technique can distinguish among problems with an application's communication pattern, problems with its mapping onto a parallel system, and outside interference. Second, I propose an approach for generating optimized, traffic-aware routes on a statically routed network. The core of this approach is a combination of linear programming formulations for the optimal static routing problem. Third, I propose a technique that uses compressed sensing to reconstruct application traffic patterns from switch counters and other system-level information.
The second and third contributions, combined to form a system called CoGARFrSN, use measurements of communication traffic to produce static routes that reduce congestion, in effect turning a statically routed network into a coarse-grained adaptively routed network. For several communication motifs, experiments with a network simulator show that CoGARFrSN routes often yield a 4-7x speedup over the traffic-oblivious static routing strategy typically used in fat trees, and sometimes perform significantly better than fine-grained hardware adaptive routing. | en_US
dc.format.mimetype | application/pdf | en_US
dc.identifier.citation | Taffet, Philip Adam. "Techniques for Measurement, Analysis, and Optimization of HPC Communication Performance." (2021) Diss., Rice University. https://hdl.handle.net/1911/111185. | en_US
dc.identifier.doi | https://doi.org/10.25611/FMS4-TY78 | en_US
dc.identifier.uri | https://hdl.handle.net/1911/111185 | en_US
dc.language.iso | eng | en_US
dc.rights | Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. | en_US
dc.subject | HPC | en_US
dc.subject | high performance computing | en_US
dc.subject | networking | en_US
dc.subject | InfiniBand | en_US
dc.subject | Omni-Path | en_US
dc.subject | static routing | en_US
dc.subject | networks | en_US
dc.subject | compressed sensing | en_US
dc.subject | compressive sensing | en_US
dc.subject | reservoir sampling | en_US
dc.subject | integer linear programming | en_US
dc.subject | optimization | en_US
dc.title | Techniques for Measurement, Analysis, and Optimization of HPC Communication Performance | en_US
dc.type | Thesis | en_US
dc.type.material | Text | en_US
thesis.degree.department | Computer Science | en_US
thesis.degree.discipline | Engineering | en_US
thesis.degree.grantor | Rice University | en_US
thesis.degree.level | Doctoral | en_US
thesis.degree.name | Doctor of Philosophy | en_US
Files
Original bundle: TAFFET-DOCUMENT-2021.pdf (6.07 MB, Adobe Portable Document Format)
License bundle: PROQUEST_LICENSE.txt (5.84 KB, Plain Text); LICENSE.txt (2.61 KB, Plain Text)