Refining HPCToolkit for application performance analysis at exascale

Abstract

As part of the US Department of Energy’s Exascale Computing Project (ECP), Rice University has been refining its HPCToolkit performance tools to better support measurement and analysis of applications executing on exascale supercomputers. To efficiently collect performance measurements of GPU-accelerated applications, HPCToolkit employs novel non-blocking data structures to communicate performance measurements between tool threads and application threads. To attribute performance information in detail to source lines, loop nests, and inlined call chains, HPCToolkit performs parallel analysis of large CPU and GPU binaries involved in the execution of an exascale application to rapidly recover mappings between machine instructions and source code. To analyze terabytes of performance measurements gathered during executions at exascale, HPCToolkit employs distributed-memory parallelism, multithreading, sparse data structures, and out-of-core streaming analysis algorithms. To support interactive exploration of profiles up to terabytes in size, HPCToolkit’s hpcviewer graphical user interface uses out-of-core methods to visualize performance data. The result of these efforts is that HPCToolkit now supports collection, analysis, and presentation of profiles and traces of GPU-accelerated applications at exascale. These improvements have enabled HPCToolkit to efficiently measure, analyze, and explore terabytes of performance data for executions using as many as 64K MPI ranks and 64K GPU tiles on ORNL’s Frontier supercomputer. HPCToolkit’s support for measurement and analysis of GPU-accelerated applications has been employed to study a collection of open-science applications developed as part of ECP. This paper reports on these experiences, which provided insight into opportunities for tuning applications, strengths and weaknesses of HPCToolkit itself, and unexpected behaviors in executions at exascale.

Type
Journal article
Citation

Adhianto, L., Anderson, J., Barnett, R. M., Grbic, D., Indic, V., Krentel, M., Liu, Y., Milaković, S., Phan, W., & Mellor-Crummey, J. (2024). Refining HPCToolkit for application performance analysis at exascale. The International Journal of High Performance Computing Applications, 38(6), 612–632. https://doi.org/10.1177/10943420241277839

Rights
Except where otherwise noted, this work is licensed under a Creative Commons Attribution-NonCommercial (CC BY-NC) license. Permission to reuse, publish, or reproduce the work beyond the terms of the license or beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.