Understanding Congestion in High Performance Interconnection Networks Using Sampling

Taffet, Philip Adam
Advisor: Mellor-Crummey, John M
Master’s Thesis, Rice University, May 2018

Citation: Taffet, Philip Adam. "Understanding Congestion in High Performance Interconnection Networks Using Sampling." (2018) Master’s Thesis, Rice University. https://hdl.handle.net/1911/103413.

Abstract: The computational needs of many applications outstrip the capabilities of a single compute node. Using multiple nodes requires communication, and slow communication often limits application performance. To improve communication performance, developers need tools that help them understand how their application’s communication patterns interact with the network, especially when those interactions cause congestion. Because communication performance is difficult to reason about analytically and simulation is costly, measurement-based approaches are needed. This thesis describes a new sampling-based technique for collecting information about the path a packet takes and the congestion it encounters. Simulation experiments show that this strategy can distinguish problems with an application's communication patterns, problems with its mapping onto a parallel system, and outside interference. We describe a variant of this scheme that requires only 5-6 bits of information in a monitored packet, making it practical for use in next-generation networks.

Keywords: network congestion; reservoir sampling; probabilistic; high performance computing; network simulation; InfiniBand; interconnection networks; communication performance; Omni-Path; MPI

Type: Thesis
Format: application/pdf
Language: English
Handle: https://hdl.handle.net/1911/103413
DOI: https://doi.org/10.25611/0pec-yd57
Dates: 2018-05 (issued); 2018-04-30 (submitted); 2018-11-26 (added to repository)

Rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
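The keyword list names reservoir sampling as part of the technique. As a rough illustration of how a packet could carry one uniformly chosen hop of its path (where it went and how congested that point was), here is a minimal Python sketch assuming a size-one reservoir: the k-th switch a packet traverses overwrites the packet's single sample slot with probability 1/k. The `Sample` and `Packet` layouts, the use of queue depth as the congestion metric, and all function names are hypothetical; this is not the thesis's implementation, and it does not model the 5-6 bit variant.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Sample:
    """Hypothetical record carried in the packet header."""
    switch_id: int    # which switch on the path was sampled
    queue_depth: int  # congestion observed there (hypothetical metric)


@dataclass
class Packet:
    hops_seen: int = 0               # switches traversed so far
    sample: Optional[Sample] = None  # single reservoir slot


def traverse_switch(pkt: Packet, switch_id: int, queue_depth: int,
                    rng: random.Random) -> None:
    """One size-one reservoir sampling step, performed at each switch.

    The k-th switch replaces the stored sample with probability 1/k, so after
    n hops each switch on the path is equally likely (1/n) to be recorded.
    """
    pkt.hops_seen += 1
    if rng.randrange(pkt.hops_seen) == 0:  # true with probability 1/k
        pkt.sample = Sample(switch_id, queue_depth)


def route(path: Sequence[int], depths: Sequence[int],
          rng: random.Random) -> Packet:
    """Send one packet along `path`; depths[i] is the queue depth at path[i]."""
    pkt = Packet()
    for sw, depth in zip(path, depths):
        traverse_switch(pkt, sw, depth, rng)
    return pkt


if __name__ == "__main__":
    rng = random.Random(42)
    path = [10, 11, 12, 13]  # hypothetical switch IDs
    depths = [0, 3, 17, 1]   # switch 12 is the congested one
    counts = {sw: 0 for sw in path}
    for _ in range(40_000):
        counts[route(path, depths, rng).sample.switch_id] += 1
    print(counts)  # each switch should be sampled roughly 10,000 times
```

The appeal of this design is that the per-packet state stays constant (a hop counter plus one sample slot) no matter how long the path is, while aggregating samples over many packets still covers every hop evenly.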