Accelerating High-Order Stencils on GPUs

Sai, Ryuichi; Mellor-Crummey, John; Meng, Xiaozhu; Araya-Polo, Mauricio; Meng, Jie

Files

PMBS_Wiley_Special_Edition.pdf (818.13 KB)

Date

2020

Publisher

IEEE

Abstract

While implementation strategies for low-order stencils on GPUs have been well-studied in the literature, not all of the techniques work well for high-order stencils, such as those used for seismic imaging. In this paper, we study practical seismic imaging computations on GPUs using high-order stencils on large domains with meaningful boundary conditions. We manually crafted a collection of implementations of a 25-point seismic modeling stencil in CUDA along with code to apply the boundary conditions. We evaluated our stencil code shapes, memory hierarchy usage, data-fetching patterns, and other performance attributes. We conducted an empirical evaluation of these stencils using several mature and emerging tools and discuss our quantitative findings. Some of our implementations achieved twice the performance of a proprietary code developed in C and mapped to GPUs using OpenACC. Additionally, several of our implementations have excellent performance portability.

Type

Journal article

Citation

Sai, Ryuichi, Mellor-Crummey, John, Meng, Xiaozhu, et al. "Accelerating High-Order Stencils on GPUs." 2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), (2020) IEEE: 86-108. https://doi.org/10.1109/PMBS51919.2020.00014.

Published Version

https://doi.org/10.1109/PMBS51919.2020.00014

Rights

This is an author's post-print. The published article is copyrighted by IEEE.

Citable link to this page

https://hdl.handle.net/1911/113182

Collections

Faculty Publications
Computer Science Publications
