
### Browsing by Author "Yi, Qing"

Now showing 1 - 3 of 3

Item: Transforming Complex Loop Nests For Locality (2002-02-19)

Kennedy, Ken; Yi, Qing

Because of the increasing gap between the speeds of processors and standard memory chips, many compiler techniques have been developed to enhance locality of applications. This paper focuses on automatically optimizing complicated loop structures, for which existing techniques are either ineffective or require too much computation time to be practical for a commercial compiler. Building on traditional unimodular transformations on perfectly nested loops, we have developed a novel transformation called dependence hoisting. This transformation facilitates fusion of a set of arbitrarily nested loops at the outermost position of a code segment containing these loops. This transformation is especially useful when the loops to be fused are nested inside one another and when some loops cannot be legally distributed before fusion. We have also developed a transformation framework called computation slicing which applies dependence hoisting to block arbitrary loop nests for better locality. In terms of both asymptotic complexity, which is comparable to that of standard unimodular loop transformations, and actual running time, as measured in our experimental results, computation slicing should be efficient enough for inclusion in commercial production compilers. We have implemented the framework as a Fortran source-to-source translator. Our implementation has successfully blocked four numerical benchmark kernels: Cholesky, QR, LU factorization without pivoting, and LU factorization with partial pivoting. The automatically blocked benchmarks achieved performance improvements similar to those attained by manually blocked programs in LAPACK.
The automatic blocking of QR and LU with partial pivoting is a notable achievement because these benchmarks include loop nests that are considered difficult; to our knowledge, no previous compiler implementation has completely automated the blocking of QR and LU with pivoting. This fact indicates that our technique can in practice match or exceed the effectiveness of many general loop transformation frameworks.

Item: Transforming Complex Loop Nests for Locality (2002-04-01)

Yi, Qing

Over the past 20 years, increases in processor speed have dramatically outstripped performance increases for standard memory chips. To bridge this gap, compilers must optimize applications so that data fetched into caches are reused before being displaced. Existing compiler techniques can efficiently optimize simple loop structures such as sequences of perfectly nested loops. However, on more complicated structures, existing techniques are either ineffective or require too much computation time to be practical for a commercial compiler. This thesis develops the following novel techniques to optimize complex loop structures both effectively and inexpensively for better locality.

- Extended dependence representation: a matrix representation that incorporates dependence relations between iterations of arbitrarily nested loops.
- Transitive dependence analysis algorithm: a new algorithm that improves the time complexity of existing transitive dependence analysis algorithms.
- Dependence hoisting: a new loop transformation technique that permits the direct fusion and interchange of arbitrarily nested loops. The transformation is inexpensive and can be incorporated into most commercial compilers.
- Computation slicing: a framework that systematically applies dependence hoisting to optimize arbitrary loop structures for better locality.
- Recursion transformation: the first compiler work that automatically transforms loop structures into recursive form to exploit locality simultaneously at multiple levels of the memory hierarchy.

Both the computation slicing framework and recursion transformation have been implemented and applied to successfully optimize a collection of benchmarks. In particular, the slicing framework has successfully blocked four linear algebra kernels: Cholesky, QR, LU factorization without pivoting, and LU with partial pivoting. The auto-blocked versions have achieved performance improvements similar to those attained by manually blocked programs in LAPACK [7]. The automatic blocking of QR and pivoting LU is a notable achievement because these kernels include loop nests that are considered difficult; to our knowledge, few previous compiler implementations have completely automated the blocking of the loop nests in these kernels. These facts indicate that, although its cost is much lower than that of existing, more general transformation frameworks [34, 42, 2, 36, 49], the computation slicing framework can in practice match or exceed the effectiveness of these general frameworks.

Item: Transforming complex loop nests for locality (2002)

Yi, Qing; Kennedy, Ken

Over the past 20 years, increases in processor speed have dramatically outstripped performance increases for standard memory chips. To bridge this gap, compilers must optimize applications so that data fetched into caches are reused before being displaced. Existing compiler techniques can efficiently optimize simple loop structures such as sequences of perfectly nested loops. However, on more complicated structures, existing techniques are either ineffective or require too much computation time to be practical for a commercial compiler. This thesis develops the following novel techniques to optimize complex loop structures both effectively and inexpensively for better locality.

- Extended dependence representation: a matrix representation that incorporates dependence relations between iterations of arbitrarily nested loops.
- Transitive dependence analysis algorithm: a new algorithm that improves the time complexity of existing transitive dependence analysis algorithms.
- Dependence hoisting: a new loop transformation technique that permits the direct fusion and interchange of arbitrarily nested loops. The transformation is inexpensive and can be incorporated into most commercial compilers.
- Computation slicing: a framework that systematically applies dependence hoisting to optimize arbitrary loop structures for better locality.
- Recursion transformation: the first compiler work that automatically transforms loop structures into recursive form to exploit locality simultaneously at multiple levels of the memory hierarchy.

Both the computation slicing framework and recursion transformation have been implemented and applied to successfully optimize a collection of benchmarks. In particular, the slicing framework has successfully blocked four linear algebra kernels: Cholesky, QR, LU factorization without pivoting, and LU with partial pivoting. The auto-blocked versions have achieved performance improvements similar to those attained by manually blocked programs in LAPACK [7]. The automatic blocking of QR and pivoting LU is a notable achievement because these kernels include loop nests that are considered difficult; to our knowledge, few previous compiler implementations have completely automated the blocking of the loop nests in these kernels. These facts indicate that, although its cost is much lower than that of existing, more general transformation frameworks [34, 42, 2, 36, 49], the computation slicing framework can in practice match or exceed the effectiveness of these general frameworks.
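
For readers unfamiliar with what "blocking" a kernel means in the abstracts above, the following is a minimal hand-written sketch of one of the four kernels, Cholesky, in unblocked and blocked form. It is not output of the authors' Fortran translator and does not implement dependence hoisting or computation slicing; the function names `cholesky_unblocked` and `cholesky_blocked` and the block-size parameter `bs` are hypothetical, chosen only for illustration. It shows the kind of loop restructuring that is done by hand in blocked library code: the work is regrouped so that each block of columns is reused while it is likely still in cache.

```python
import math


def cholesky_unblocked(a):
    """Return lower-triangular L with L * L^T == a (a symmetric positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        l[j][j] = math.sqrt(a[j][j] - sum(l[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l


def cholesky_blocked(a, bs):
    """Same factorization, restructured by hand into column blocks of width bs.

    Illustrative sketch only: a right-looking blocked variant, not the
    code produced by the framework described in the abstracts.
    """
    n = len(a)
    w = [row[:] for row in a]  # working copy; lower triangle becomes L
    for kb in range(0, n, bs):
        ke = min(kb + bs, n)
        # 1. Factor the diagonal block w[kb:ke][kb:ke] with the unblocked loops.
        for j in range(kb, ke):
            w[j][j] = math.sqrt(w[j][j] - sum(w[j][k] ** 2 for k in range(kb, j)))
            for i in range(j + 1, ke):
                s = w[i][j] - sum(w[i][k] * w[j][k] for k in range(kb, j))
                w[i][j] = s / w[j][j]
        # 2. Triangular solve for the panel below the diagonal block.
        for i in range(ke, n):
            for j in range(kb, ke):
                s = w[i][j] - sum(w[i][k] * w[j][k] for k in range(kb, j))
                w[i][j] = s / w[j][j]
        # 3. Update the trailing lower triangle using the finished panel.
        for i in range(ke, n):
            for j in range(ke, i + 1):
                w[i][j] -= sum(w[i][k] * w[j][k] for k in range(kb, ke))
    # Zero the strict upper triangle so the result is exactly L.
    return [[w[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]


example = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
print(cholesky_blocked(example, 2))  # [[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]
```

Both functions compute the same factor; only the order in which the updates are applied differs. Pure Python will not show the cache effect itself, so the sketch is meant to convey the structure of a blocked loop nest rather than to demonstrate the performance gains reported in the abstracts.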