Wave Equation Based Stencil Optimizations on a Multi-core CPU

Author: Zhou, Muhong
Advisor: Symes, William W.
Type: Thesis (Master's Thesis, Rice University)
Dates: 2014-11-04; 2014-12; December 2; 2016-02-05
Citation: Zhou, Muhong. "Wave Equation Based Stencil Optimizations on a Multi-core CPU." (2014) Master's Thesis, Rice University. https://hdl.handle.net/1911/88412.
Handle: https://hdl.handle.net/1911/88412

Abstract:
Wave propagation stencil kernels are the computational engines of seismic imaging algorithms. These kernels are both compute- and memory-intensive. This work aims to improve the performance of wave equation based stencil code parallelized with OpenMP on a multi-core CPU. To achieve this goal, we explored two techniques: improving vectorization by using hardware SIMD instructions, and reducing memory traffic to mitigate the bottleneck caused by limited memory bandwidth. We show that with loop interchange, memory alignment, and compiler hints, both the icc and gcc compilers can generate fully vectorized stencil code of any order, with performance comparable to that of SIMD intrinsic code. To reduce cache misses, we present three methods in the context of OpenMP parallelization: rearranging the loop structure, blocking thread accesses, and temporal loop blocking. Our results show that fully vectorized high-order stencil code runs about 2X faster when implemented with either of the first two methods, and fully vectorized low-order stencil code runs about 1.2X faster when implemented with the combination of the last two methods. Our final best-performing code achieves 20% to 30% of peak GFLOP/s, depending on stencil order and compiler.

Format: application/pdf
Language: English (eng)
Rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
Keywords: seismic modeling; acoustic wave propagation; high performance computing; SIMD; cache optimizations; OpenMP parallelization; stencil optimization
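The vectorization recipe named in the abstract (loop interchange so the innermost loop is unit-stride, aligned arrays, and compiler hints) can be illustrated with a minimal C sketch. This is not code from the thesis: the 2D constant-density acoustic model, the 4th-order central-difference weights, the 64-byte alignment, and the names alloc_field and step are all assumptions chosen for illustration. OpenMP threads share the outer z loop, while the inner x loop walks contiguous memory so that the simd hint can map it onto SIMD lanes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed values for illustration only. */
#define ALIGN 64 /* byte alignment for array starts (one cache line) */
#define R 2      /* stencil half-width; R = 2 gives 4th order in space */

/* Hypothetical helper: zeroed float array whose first element is
   ALIGN-byte aligned. Returns NULL on failure. */
float *alloc_field(size_t n) {
    size_t bytes = n * sizeof(float);
    /* C11 aligned_alloc expects a size that is a multiple of the alignment. */
    bytes = (bytes + ALIGN - 1) / ALIGN * ALIGN;
    float *p = aligned_alloc(ALIGN, bytes);
    if (p != NULL)
        memset(p, 0, bytes);
    return p;
}

/* Hypothetical kernel: one leapfrog step of the 2D constant-density
   acoustic wave equation on an nz-by-nx row-major grid (x contiguous),
       u_next = 2*u - u_prev + c2 * Lap(u),
   where c2 = (c*dt/h)^2 per point and Lap is a 4th-order central
   difference. Points within R of the grid edge are not written. */
void step(int nz, int nx, const float *restrict u,
          const float *restrict u_prev, float *restrict u_next,
          const float *restrict c2) {
    /* 4th-order second-derivative weights: (-1, 16, -30, 16, -1) / 12 */
    static const float w[R + 1] = {-30.0f / 12.0f, 16.0f / 12.0f, -1.0f / 12.0f};
    assert(nz > 2 * R && nx > 2 * R);

    /* Threads split the outer, strided z loop ... */
    #pragma omp parallel for schedule(static)
    for (int iz = R; iz < nz - R; iz++) {
        /* ... while the inner x loop is unit-stride. The simd hint, together
           with restrict, tells the compiler these iterations may run in
           SIMD lanes. */
        #pragma omp simd
        for (int ix = R; ix < nx - R; ix++) {
            size_t i = (size_t)iz * (size_t)nx + (size_t)ix;
            float lap = 2.0f * w[0] * u[i]; /* center term from both axes */
            for (size_t k = 1; k <= R; k++)
                lap += w[k] * (u[i - k] + u[i + k] +
                               u[i - k * (size_t)nx] + u[i + k * (size_t)nx]);
            u_next[i] = 2.0f * u[i] - u_prev[i] + c2[i] * lap;
        }
    }
}
```

With gcc, -fopenmp enables both pragmas (-fopenmp-simd enables only the simd hint); without such a flag the pragmas are ignored and the kernel still runs correctly, serially and without the hint. The cache optimizations listed in the abstract (loop restructuring, thread blocking, temporal blocking) would reorganize the loops around a kernel like this and are not shown here.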