Wave Equation Based Stencil Optimizations on a Multi-core CPU
Date
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Wave propagation stencil kernels are engines of seismic imaging algo- rithms. These kernels are both compute- and memory-intensive. This work targets improving the performance of wave equation based stencil code parallelized by OpenMP on a multi-core CPU. To achieve this goal, we explored two techniques: improving vectorization by using hardware SIMD technology, and reducing memory traffic to mitigate the bottle- neck caused by limited memory bandwidth. We show that with loop interchange, memory alignment, and compiler hints, both icc and gcc compilers can provide fully-vectorized stencil code of any order with per- formance comparable to that of SIMD intrinsic code. To reduce cache misses, we present three methods in the context of OpenMP paralleliza- tion: rearranging loop structure, blocking thread accesses, and temporal loop blocking. Our results demonstrate that fully-vectorized high-order stencil code will be about 2X faster if implemented with either of the first two methods, and fully-vectorized low-order stencil code will be about 1.2X faster if implemented with the combination of the last two methods. Our final best-performing code achieves 20%∼30% of peak GFLOPs/sec, depending on stencil order and compiler.
Description
Advisor
Degree
Type
Keywords
Citation
Zhou, Muhong. "Wave Equation Based Stencil Optimizations on a Multi-core CPU." (2014) Master’s Thesis, Rice University. https://hdl.handle.net/1911/88412.