Efficient Selection of Vector Instructions using Dynamic Programming

Barik, Rajkishore; Sarkar, Vivek; Zhao, Jisheng

dc.contributor.author	Barik, Rajkishore	en_US
dc.contributor.author	Sarkar, Vivek	en_US
dc.contributor.author	Zhao, Jisheng	en_US
dc.date.accessioned	2017-08-02T22:03:08Z	en_US
dc.date.available	2017-08-02T22:03:08Z	en_US
dc.date.issued	2010-06-17	en_US
dc.date.note	June 17, 2010	en_US
dc.description.abstract	Accelerating program performance via SIMD vector units is very common in modern processors, as evidenced by the use of SSE, MMX, VSE, and VSX SIMD instructions in multimedia, scientific, and embedded applications. To take full advantage of the vector capabilities, a compiler needs to generate efficient vector code automatically. However, most commercial and open-source compilers fall short of using the full potential of vector units, and only generate vector code for simple innermost loops. In this paper, we present the design and implementation of an auto-vectorization framework in the backend of a dynamic compiler that not only generates optimized vector code but is also well integrated with the instruction scheduler and register allocator. The framework includes a novel compile-time efficient dynamic programming-based vector instruction selection algorithm for straight-line code that expands opportunities for vectorization in the following ways: (1) scalar packing explores opportunities of packing multiple scalar variables into short vectors; (2) judicious use of shuffle and horizontal vector operations, when possible; and (3) algebraic reassociation expands opportunities for vectorization by algebraic simplification. We report performance results on the impact of auto-vectorization on a set of standard numerical benchmarks using the Jikes RVM dynamic compilation environment. Our results show performance improvement of up to 57.71% on an Intel Xeon processor, compared to non-vectorized execution, with a modest increase in compile time in the range from 0.87% to 9.992%. An investigation of the SIMD parallelization performed by v11.1 of the Intel Fortran Compiler (IFC) on three benchmarks shows that our system achieves speedup with vectorization in all three cases and IFC does not. 
Finally, a comparison of our approach with an implementation of the Superword Level Parallelization (SLP) algorithm from [21] shows that our approach yields a performance improvement of up to 13.78% relative to SLP.	en_US
dc.format.extent	24 pp	en_US
dc.identifier.citation	Barik, Rajkishore, Sarkar, Vivek and Zhao, Jisheng. "Efficient Selection of Vector Instructions using Dynamic Programming." (2010) https://hdl.handle.net/1911/96387.	en_US
dc.identifier.digital	TR10-07	en_US
dc.identifier.uri	https://hdl.handle.net/1911/96387	en_US
dc.language.iso	eng	en_US
dc.rights	You are granted permission for the noncommercial reproduction, distribution, display, and performance of this technical report in any format, but this permission is only for a period of forty-five (45) days from the most recent time that you verified that this technical report is still available from the Computer Science Department of Rice University under terms that include this permission. All other rights are reserved by the author(s).	en_US
dc.title	Efficient Selection of Vector Instructions using Dynamic Programming	en_US
dc.type	Technical report	en_US
dc.type.dcmi	Text	en_US

Files

Original bundle

Name: TR10-07.pdf
Size: 759.25 KB
Format: Adobe Portable Document Format

Collections

Computer Science Technical Reports
Center for Research Computing