Automated Diagnosis of Scalability Losses in Parallel Applications

dc.contributor.advisor: Mellor-Crummey, John
dc.creator: Wei, Lai
dc.date.accessioned: 2020-02-26T13:40:35Z
dc.date.available: 2020-02-26T13:40:35Z
dc.date.created: 2018-08
dc.date.issued: 2020-02-25
dc.date.submitted: August 2018
dc.date.updated: 2020-02-26T13:40:35Z
dc.description.abstract: Each generation of supercomputers is more powerful than the last, in an attempt to keep up with the growing ambition of scientific inquiry. Despite improvements in computational power, however, the performance of many parallel applications has failed to scale. Many factors degrade the parallel performance of applications. The need to understand application behavior and pinpoint causes of inefficiency has led to the development of a broad array of tools for measuring and analyzing application performance. These tools generally focus on collecting measurements, attributing them to program source code, and presenting them; responsibility for analyzing and interpreting the performance data falls to application developers. Profiles generated by performance tools can usually identify the presence of scalability losses, whereas time series data are generally necessary to pinpoint their root causes. However, manually analyzing time series data can be difficult for executions with a large number of processes, long running times, and deep call chains. To address this problem, we developed an automated framework that analyzes time series of call path samples to present users with a performance diagnosis of parallel executions. Our framework incurs much lower time and space overhead than prior tools that analyze performance using instrumentation-based traces. Its automated diagnosis indicates the symptoms, severity, and causes of scalability losses found in a parallel execution. To support a broad array of parallel applications, our automated analysis applies to both SPMD (single program, multiple data) and MPMD (multiple program, multiple data) applications, under both flat and hierarchical parallel models. We demonstrate the effectiveness of our framework by applying it to time-series measurements of three scientific codes.
dc.format.mimetype: application/pdf
dc.identifier.citation: Wei, Lai. "Automated Diagnosis of Scalability Losses in Parallel Applications." (2020) Diss., Rice University. https://hdl.handle.net/1911/108077.
dc.identifier.uri: https://hdl.handle.net/1911/108077
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: performance
dc.subject: automated diagnosis
dc.subject: scalability losses
dc.subject: sample-based time series data
dc.title: Automated Diagnosis of Scalability Losses in Parallel Applications
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
Files
Original bundle
Name: WEI-DOCUMENT-2018.pdf
Size: 4.67 MB
Format: Adobe Portable Document Format