Refining HPCToolkit for application performance analysis at exascale

dc.citation.firstpage: 612 [en_US]
dc.citation.issueNumber: 6 [en_US]
dc.citation.journalTitle: The International Journal of High Performance Computing Applications [en_US]
dc.citation.lastpage: 632 [en_US]
dc.citation.volumeNumber: 38 [en_US]
dc.contributor.author: Adhianto, Laksono [en_US]
dc.contributor.author: Anderson, Jonathon [en_US]
dc.contributor.author: Barnett, Robert Matthew [en_US]
dc.contributor.author: Grbic, Dragana [en_US]
dc.contributor.author: Indic, Vladimir [en_US]
dc.contributor.author: Krentel, Mark [en_US]
dc.contributor.author: Liu, Yumeng [en_US]
dc.contributor.author: Milaković, Srđan [en_US]
dc.contributor.author: Phan, Wileam [en_US]
dc.contributor.author: Mellor-Crummey, John [en_US]
dc.date.accessioned: 2024-11-20T15:52:03Z [en_US]
dc.date.available: 2024-11-20T15:52:03Z [en_US]
dc.date.issued: 2024 [en_US]
dc.description.abstract: As part of the US Department of Energy’s Exascale Computing Project (ECP), Rice University has been refining its HPCToolkit performance tools to better support measurement and analysis of applications executing on exascale supercomputers. To efficiently collect performance measurements of GPU-accelerated applications, HPCToolkit employs novel non-blocking data structures to communicate performance measurements between tool threads and application threads. To attribute performance information in detail to source lines, loop nests, and inlined call chains, HPCToolkit performs parallel analysis of large CPU and GPU binaries involved in the execution of an exascale application to rapidly recover mappings between machine instructions and source code. To analyze terabytes of performance measurements gathered during executions at exascale, HPCToolkit employs distributed-memory parallelism, multithreading, sparse data structures, and out-of-core streaming analysis algorithms. To support interactive exploration of profiles up to terabytes in size, HPCToolkit’s hpcviewer graphical user interface uses out-of-core methods to visualize performance data. The result of these efforts is that HPCToolkit now supports collection, analysis, and presentation of profiles and traces of GPU-accelerated applications at exascale. These improvements have enabled HPCToolkit to efficiently measure, analyze and explore terabytes of performance data for executions using as many as 64K MPI ranks and 64K GPU tiles on ORNL’s Frontier supercomputer. HPCToolkit’s support for measurement and analysis of GPU-accelerated applications has been employed to study a collection of open-science applications developed as part of ECP. This paper reports on these experiences, which provided insight into opportunities for tuning applications, strengths and weaknesses of HPCToolkit itself, as well as unexpected behaviors in executions at exascale. [en_US]
dc.identifier.citation: Adhianto, L., Anderson, J., Barnett, R. M., Grbic, D., Indic, V., Krentel, M., Liu, Y., Milaković, S., Phan, W., & Mellor-Crummey, J. (2024). Refining HPCToolkit for application performance analysis at exascale. The International Journal of High Performance Computing Applications, 38(6), 612–632. https://doi.org/10.1177/10943420241277839 [en_US]
dc.identifier.digital: adhianto-et-al-2024-refining-hpctoolkit-for-application-performance-analysis-at-exascale [en_US]
dc.identifier.doi: https://doi.org/10.1177/10943420241277839 [en_US]
dc.identifier.uri: https://hdl.handle.net/1911/118052 [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Sage [en_US]
dc.rights: Except where otherwise noted, this work is licensed under a Creative Commons Attribution-NonCommercial (CC BY-NC) license. Permission to reuse, publish, or reproduce the work beyond the terms of the license or beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by-nc/4.0/ [en_US]
dc.subject.keyword: Software performance [en_US]
dc.subject.keyword: performance analysis [en_US]
dc.subject.keyword: supercomputers [en_US]
dc.subject.keyword: exascale [en_US]
dc.subject.keyword: graphics processing units [en_US]
dc.title: Refining HPCToolkit for application performance analysis at exascale [en_US]
dc.type: Journal article [en_US]
dc.type.dcmi: Text [en_US]
dc.type.publication: publisher version [en_US]
Files
Original bundle
Name: adhianto-et-al-2024-refining-hpctoolkit-for-application-performance-analysis-at-exascale.pdf
Size: 2.08 MB
Format: Adobe Portable Document Format