Browsing by Author "Paul, Sri Raj"
Now showing 1 - 3 of 3
Item: Mapping High Level Parallel Programming Models to Asynchronous Many-Task (AMT) Runtimes (2018-12-20)
Authors: Paul, Sri Raj; Sarkar, Vivek

Asynchronous Many-Task (AMT) runtimes have recently been proposed as a promising software foundation for managing the increasing complexity of node architectures in current and future extreme-scale computing systems because of their ability to express fine-grained parallelism, to decouple computation and data from underlying machine resources, to support resilience, and to deliver scalable performance. The Open Community Runtime (OCR) is a community-led effort to explore AMT runtime principles that can support a broad range of higher-level programming models. The Habanero C/C++ library (HClib) is a library-based AMT runtime and programming interface that focuses on lightweight task creation/termination and flexible synchronization. Unlike other AMT runtimes, both OCR and HClib include first-class support for event-driven task execution, which can help hide communication latencies and reduce the number of blocking operations performed. In this thesis, we focus on the problem of mapping high-level parallel programming models to AMT runtimes. As an exemplar of modern Partitioned Global Address Space (PGAS) parallel programming models, we show how Chapel programs can be efficiently mapped onto OCR and HClib, and how Legion, a data-centric parallel programming model, can be mapped onto OCR. Next, we show how PGAS and event-driven execution models can be synergistically integrated in a unique combination of server-side JavaScript and HClib, yielding new levels of programming productivity for high performance computing. Finally, we show how the promise of supporting resilience in AMT runtimes can be realized through programming model extensions to HClib. All these contributions are accompanied by performance evaluations of prototype implementations. Our results show that AMT runtimes can support high-level parallel programming models with comparable or improved performance relative to existing runtimes, while also providing the potential for improved resilience.
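The abstract highlights HClib's lightweight task creation/termination and flexible synchronization. As a rough illustration (not drawn from the thesis itself), the sketch below shows async/finish task parallelism in HClib's C++ interface; the header name and the exact launch/async/finish signatures follow the library's public examples and may differ between versions.

    #include <cstdio>
    #include "hclib_cpp.h"  // HClib C++ interface; header name per the library's examples

    int main(int argc, char **argv) {
        // launch starts the runtime and runs the lambda as the root task.
        hclib::launch([]() {
            // finish blocks until every task spawned (transitively) in its scope completes.
            hclib::finish([]() {
                for (int i = 0; i < 4; i++) {
                    // async creates a lightweight task for the runtime's workers to execute.
                    hclib::async([i]() { printf("task %d\n", i); });
                }
            });
            printf("all tasks finished\n");
        });
        return 0;
    }

Event-driven execution, which the abstract singles out, goes a step further: rather than blocking in a finish scope, a task can register a continuation on a promise or future, which is what lets the runtime hide communication latency and reduce blocking operations.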
Item: Performance Analysis and Optimization of a Hybrid Distributed Reverse Time Migration Application (2016-02-12)
Authors: Paul, Sri Raj; Mellor-Crummey, John

To fully exploit emerging processor architectures, programs will need to employ threaded parallelism within a node and message passing across nodes. Today, MPI+OpenMP is the preferred programming model for this task. However, tuning MPI+OpenMP programs for clusters is difficult. Performance tools can help users identify bottlenecks and uncover opportunities for improvement. Applications that analyze seismic data employ scalable parallel systems to produce timely results. This thesis describes our experience applying performance tools to gain insight into an MPI+OpenMP code that performs Reverse Time Migration (RTM) on seismic data, and assesses the capabilities of available tools for analyzing the performance of a sophisticated application that employs both message-passing and threaded parallelism. The tools provided us with insights into the effectiveness of the domain decomposition strategy, the use of threaded parallelism, and functional unit utilization in individual cores. By applying insights obtained from Rice University's HPCToolkit and hardware performance counters, we were able to improve the performance of the RTM code by roughly 30 percent.

Item: Performance Analysis and Optimization of a Hybrid Seismic Imaging Application (Elsevier, 2016)
Authors: Paul, Sri Raj; Araya-Polo, Mauricio; Mellor-Crummey, John; Hohl, Detlef

Applications that process seismic data are computationally expensive and, therefore, employ scalable parallel systems to produce timely results. Here we describe our experience using performance analysis tools to gain insight into an MPI+OpenMP code developed by Shell that performs Reverse Time Migration on a cluster to produce models of the subsurface. Tuning MPI+OpenMP programs for modern platforms is difficult, and, therefore, assistance is required from performance analysis tools. These tools provided us with insights into the effectiveness of the domain decomposition strategy, the use of threaded parallelism, and functional unit utilization in individual cores. By applying insights obtained from Rice University's HPCToolkit and hardware performance counters, we were able to improve the performance of Shell's prototype distributed-memory Reverse Time Migration code by roughly 30 percent.
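Both Reverse Time Migration entries center on the hybrid MPI+OpenMP model: message passing implements the domain decomposition across nodes, while threaded parallelism does the stencil work within each node. A minimal sketch of that structure appears below; the grid size, 1-D decomposition, and stencil coefficients are invented for illustration and are not taken from either publication.

    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char **argv) {
        // Request a threading level so OpenMP threads can coexist with MPI.
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank, nranks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        // 1-D domain decomposition: each rank owns one slab of the grid
        // (assumes N is divisible by nranks, for simplicity).
        const int N = 1 << 20;
        const int chunk = N / nranks;
        std::vector<double> curr(chunk, 1.0), next(chunk, 0.0);

        // Threaded parallelism within the node: OpenMP threads split the slab.
        #pragma omp parallel for
        for (int i = 1; i < chunk - 1; i++) {
            next[i] = 0.5 * curr[i] + 0.25 * (curr[i - 1] + curr[i + 1]);
        }

        // A real RTM time step would exchange halo cells with neighboring
        // ranks here (e.g., via MPI_Sendrecv) before the next iteration.
        printf("rank %d of %d updated %d cells\n", rank, nranks, chunk - 2);

        MPI_Finalize();
        return 0;
    }

This split is exactly what the performance tools examined: HPCToolkit attributes time across MPI ranks and OpenMP threads, and hardware performance counters reveal how well the inner loop uses each core's functional units.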