Browsing by Author "Araya-Polo, Mauricio"
Now showing 1 - 2 of 2
Item
Accelerating High-Order Stencils on GPUs (IEEE, 2020)
Sai, Ryuichi; Mellor-Crummey, John; Meng, Xiaozhu; Araya-Polo, Mauricio; Meng, Jie
While implementation strategies for low-order stencils on GPUs have been well studied in the literature, not all of those techniques work well for high-order stencils, such as the ones used in seismic imaging. In this paper, we study practical seismic imaging computations on GPUs using high-order stencils on large domains with meaningful boundary conditions. We manually crafted a collection of implementations of a 25-point seismic modeling stencil in CUDA, along with code to apply the boundary conditions. We evaluated our stencil code shapes, memory hierarchy usage, data-fetching patterns, and other performance attributes. We conducted an empirical evaluation of these stencils using several mature and emerging tools and discuss our quantitative findings. Some of our implementations achieved twice the performance of a proprietary code developed in C and mapped to GPUs using OpenACC. Additionally, several of our implementations exhibit excellent performance portability.

Item
Performance Analysis and Optimization of a Hybrid Seismic Imaging Application (Elsevier, 2016)
Paul, Sri Raj; Araya-Polo, Mauricio; Mellor-Crummey, John; Hohl, Detlef
Applications that process seismic data are computationally expensive and therefore employ scalable parallel systems to produce timely results. Here we describe our experience using performance analysis tools to gain insight into an MPI+OpenMP code developed by Shell that performs Reverse Time Migration on a cluster to produce models of the subsurface. Tuning MPI+OpenMP programs for modern platforms is difficult, so assistance from performance analysis tools is valuable. These tools gave us insight into the effectiveness of the domain decomposition strategy, the use of threaded parallelism, and functional unit utilization in individual cores. By applying insights obtained from Rice University's HPCToolkit and hardware performance counters, we improved the performance of Shell's prototype distributed-memory Reverse Time Migration code by roughly 30 percent.
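
For context on the first item: a 25-point star-shaped stencil corresponds to an 8th-order finite-difference approximation in 3-D. The following is a minimal CUDA sketch of such a kernel, not the authors' implementation; the kernel name, coefficient layout, and the simplification of boundary handling to skipping a halo region are all illustrative assumptions.

    // Hypothetical 25-point (8th-order, star-shaped) 3-D stencil kernel.
    // Names and coefficients are assumptions, not the paper's code.
    #define RADIUS 4

    __global__ void stencil25(const float* __restrict__ in,
                              float* __restrict__ out,
                              const float* __restrict__ coef, // coef[0..RADIUS]
                              int nx, int ny, int nz)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;
        int k = blockIdx.z * blockDim.z + threadIdx.z;
        // Skip the halo; a real seismic code applies boundary conditions here.
        if (i < RADIUS || i >= nx - RADIUS ||
            j < RADIUS || j >= ny - RADIUS ||
            k < RADIUS || k >= nz - RADIUS) return;

        size_t sx = 1, sy = nx, sz = (size_t)nx * ny;
        size_t idx = k * sz + j * sy + i * sx;
        float acc = coef[0] * in[idx];
        // Accumulate the 24 symmetric neighbors, 8 along each axis.
        for (int r = 1; r <= RADIUS; ++r) {
            acc += coef[r] * (in[idx - r * sx] + in[idx + r * sx]);
            acc += coef[r] * (in[idx - r * sy] + in[idx + r * sy]);
            acc += coef[r] * (in[idx - r * sz] + in[idx + r * sz]);
        }
        out[idx] = acc;
    }

A naive global-memory kernel like this is only a baseline; the variants the paper evaluates differ precisely in the attributes listed in the abstract, such as code shape, memory hierarchy usage, and data-fetching patterns.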
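
For the second item, the following is a minimal sketch of the general MPI+OpenMP structure such hybrid codes share: a 1-D domain decomposition with nonblocking halo exchange between ranks and a threaded interior update. Shell's Reverse Time Migration code is proprietary, so every name, the decomposition, and the toy update below are assumptions.

    // Hypothetical MPI+OpenMP skeleton: each rank owns a subdomain plus
    // halos, exchanges halos each time step, and updates its interior
    // with OpenMP threads. Illustrative only.
    #include <mpi.h>
    #include <omp.h>
    #include <utility>
    #include <vector>

    static void exchange_halos(std::vector<float>& a, int left, int right,
                               int halo, MPI_Comm comm)
    {
        int n = (int)a.size();
        MPI_Request req[4];
        MPI_Irecv(a.data(),            halo, MPI_FLOAT, left,  0, comm, &req[0]);
        MPI_Irecv(a.data() + n - halo, halo, MPI_FLOAT, right, 1, comm, &req[1]);
        MPI_Isend(a.data() + halo,         halo, MPI_FLOAT, left,  1, comm, &req[2]);
        MPI_Isend(a.data() + n - 2 * halo, halo, MPI_FLOAT, right, 0, comm, &req[3]);
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    }

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);
        MPI_Comm comm = MPI_COMM_WORLD;
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
        int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

        const int halo = 4, interior = 1 << 20;
        std::vector<float> cur(interior + 2 * halo, 0.0f), nxt(cur);

        for (int step = 0; step < 100; ++step) {
            exchange_halos(cur, left, right, halo, comm);
            // Threaded interior update; the real code applies an RTM
            // wave-propagation stencil here.
            #pragma omp parallel for
            for (int i = halo; i < interior + halo; ++i)
                nxt[i] = 0.5f * (cur[i - 1] + cur[i + 1]);
            std::swap(cur, nxt);
        }
        MPI_Finalize();
        return 0;
    }

The three aspects the abstract names map directly onto this structure: the domain decomposition governs the subdomain sizes and halo traffic, the OpenMP region governs threaded parallelism, and functional unit utilization is a property of the per-core update loop.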