Mellor-Crummey, John M.2021-08-162021-08-162021-082021-07-21August 202Taffet, Philip Adam. "Techniques for Measurement, Analysis, and Optimization of HPC Communication Performance." (2021) Diss., Rice University. <a href="https://hdl.handle.net/1911/111185">https://hdl.handle.net/1911/111185</a>.https://hdl.handle.net/1911/111185Inter-node communication is a critical component of tightly coupled applications running on parallel high performance computing systems. Surveys of high performance computing benchmarks and applications show that most applications spend at least 20% of their execution time communicating, and some spend more than 50%. Thus, inter-node communication performance is important to the overall performance of parallel applications. Furthermore, as the scale of parallelism increases, communicating efficiently becomes more important and typically more difficult. Application developers often cannot address communication performance issues on their own, whether because of a lack of useful diagnostic information, or because they stem from system-level issues such as poor routing. This dissertation describes several techniques for measuring, analyzing, and optimizing communication performance for parallel applications running on a supercomputer with a fat tree interconnect, all of which can aid in improving communication performance of applications. First, I describe a sampling-based monitoring technique that uses a small amount of performance-related data in each packet to reconstruct quantitative estimates of traffic and congestion correlated with both application contexts and individual links. Using this information, it can distinguish between problems with an application's communication pattern, its mapping onto a parallel system, and outside interference. Second, I propose an approach for generating optimized, traffic-aware routes on a statically routed network. The core of this approach is a combination of linear programming formulations for the optimal static routing problem. Third, I propose a technique for reconstructing application traffic patterns via compressed sensing from switch counters and other system-level information. The second and third contributions, combined to form a system called CoGARFrSN, use measures of communication traffic to produce better static routes that reduce congestion, which can be used effectively to turn a statically routed network into a coarse-grained adaptively routed network. Experiments with a network simulator show that CoGARFrSN routes often result in a 4-7x speedup over the traffic-oblivious static routing strategy typically used in fat trees for several communication motifs, and CoGARFrSN routes sometimes even perform significantly better than fine-grained hardware adaptive routing.application/pdfengCopyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.HPChigh performance computingnetworkingInfiniBandOmni-Pathstatic routingnetworkscompressed sensingcompressive sensingreservoir samplinginteger linear programmingoptimizationTechniques for Measurement, Analysis, and Optimization of HPC Communication PerformanceThesis2021-08-16https://doi.org/10.25611/FMS4-TY78