Browsing by Author "Harvey, Timothy J."
Item
A simple, fast dominance algorithm (2006-01-11)
Cooper, Keith D.; Harvey, Timothy J.; Kennedy, Ken
The problem of finding the dominators in a control-flow graph has a long history in the literature. The original algorithms suffered from a large asymptotic complexity but were easy to understand. Subsequent work improved the time bound, but generally sacrificed both simplicity and ease of implementation. This paper returns to a simple formulation of dominance as a global data-flow problem. Some insights into the nature of dominance lead to an implementation of an O(N²) algorithm that runs faster, in practice, than the classic Lengauer-Tarjan algorithm, which has a time bound of O(E ∗ log(N)). We compare the algorithm to Lengauer-Tarjan because it is the best known and most widely used of the fast algorithms for dominance. Working from the same implementation insights, we also rederive (from earlier work on control dependence by Ferrante et al.) a method for calculating dominance frontiers that we show is faster than the original algorithm by Cytron et al. The aim of this paper is not to present a new algorithm but, rather, to make an argument based on empirical evidence that algorithms with discouraging asymptotic complexities can be faster in practice than those more commonly employed. We show that, in some cases, careful engineering of simple algorithms can overcome theoretical advantages, even when problems grow beyond realistic sizes. Further, we argue that the algorithms presented herein are intuitive and easily implemented, making them excellent teaching tools.
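The data-flow formulation the abstract describes lends itself to a compact implementation: keep one immediate-dominator guess per node and intersect the guesses of already-processed predecessors in reverse-postorder sweeps until a fixed point. Below is a minimal Python sketch in that spirit; the graph representation, node numbering, and helper names are illustrative assumptions, not the paper's code.

    def compute_idoms(succ, entry):
        # Number the nodes in reverse postorder so that, when we visit a
        # node, at least one predecessor has already been processed.
        order, seen = [], {entry}
        def dfs(n):
            for s in succ.get(n, []):
                if s not in seen:
                    seen.add(s)
                    dfs(s)
            order.append(n)
        dfs(entry)
        order.reverse()
        rpo = {n: i for i, n in enumerate(order)}
        pred = {n: [] for n in order}
        for n in order:
            for s in succ.get(n, []):
                pred[s].append(n)

        idom = {entry: entry}  # the entry node dominates itself

        def intersect(a, b):
            # Walk the two dominator-tree paths upward until they meet.
            while a != b:
                while rpo[a] > rpo[b]:
                    a = idom[a]
                while rpo[b] > rpo[a]:
                    b = idom[b]
            return a

        changed = True
        while changed:
            changed = False
            for n in order:
                if n == entry:
                    continue
                ps = [p for p in pred[n] if p in idom]
                new = ps[0]
                for p in ps[1:]:
                    new = intersect(new, p)
                if idom.get(n) != new:
                    idom[n] = new
                    changed = True
        return idom

    # A diamond-shaped graph: the entry immediately dominates every node.
    cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
    print(compute_idoms(cfg, "A"))  # every idom is 'A'

On reducible graphs of realistic size the outer loop settles in very few sweeps, which is the engineering observation behind the paper's empirical claim.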
Item
ACME: Adaptive Compilation Made Efficient/Easy (2005-06-17)
Cooper, Keith D.; Grosul, Alexander; Harvey, Timothy J.; Reeves, Steven W.; Subramanian, Devika; Torczon, Linda
Research over the past five years has shown that significant performance improvements are possible using adaptive compilation. An adaptive compiler uses a compile-execute-analyze feedback loop to guide a series of compilations towards some performance goal, such as minimizing execution time. Despite its ability to improve performance, adaptive compilation has not seen widespread use because of two obstacles: the complexity inherent in a feedback-driven adaptive system makes it difficult to build and hard to use, and the large amount of time that the system needs to perform the many compilations and executions prohibits most users from adopting these techniques. We have developed a technique called virtual execution to decrease the time requirements for adaptive compilation. Virtual execution runs the program a single time and preserves information that allows us to accurately predict performance with different optimization sequences. This technology significantly reduces the time required by our adaptive compiler. In conjunction with this performance boost, we have developed a graphical user interface (GUI) that provides a controlled view of the compilation process. It limits the amount of information that the user must provide to get started by supplying appropriate defaults. At the same time, it lets the user exert fine-grained control over the parameters that govern the system. In particular, the user has direct and obvious control over the maximum amount of time the compiler can spend, as well as the ability to choose the number of routines to be examined. (The tool uses profiling to identify the most-executed procedures.) The GUI provides an output screen so that the user can monitor the progress of the compilation.

Item
Building a Control-flow Graph from Scheduled Assembly Code (2002-02-01)
Cooper, Keith D.; Harvey, Timothy J.; Waterman, Todd
A variety of applications have arisen where it is worthwhile to apply code optimizations directly to the machine code (or assembly code) produced by a compiler. These include link-time whole-program analysis and optimization, code compression, binary-to-binary translation, and bit-transition reduction (for power). Many, if not most, optimizations assume the presence of a control-flow graph (cfg). Compiled, scheduled code has properties that can make cfg construction more complex than it is inside a typical compiler. In this paper, we examine the problems of scheduled code on architectures that have multiple delay slots. In particular, if branch delay slots contain other branches, the classic algorithms for building a cfg produce incorrect results. We explain the problem using two simple examples. We then present an algorithm for building correct cfgs from scheduled assembly code that includes branches in branch-delay slots. The algorithm works by building an approximate cfg and then refining it to reflect the actions of delayed branches. If all branches have explicit targets, the complexity of the refining step is linear with respect to the number of branches in the code. Analysis of the kind presented in this paper is a necessary first step for any system that analyzes or translates compiled, assembly-level code. We have implemented this algorithm in our power-consumption experiments based on the TMS320C6200 architecture from Texas Instruments. The development of our algorithm was motivated by the output of TI's compiler.
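The failure mode this abstract points to is easy to see with a toy model. The sketch below is not the paper's approximate-then-refine algorithm; it is a small Python state-space walk over (pc, in-flight branches) pairs under simplified single-issue semantics, invented here to surface the control-flow edges that branches inside delay slots create and that a classic leader-based construction misses.

    from collections import deque

    def branch_edges(code, entry=0):
        # `code` is a list; a branch is ("br", target, n_delay_slots) and
        # anything else falls through.  A machine state pairs a pc with a
        # tuple of (cycles_until_taken, target) branches still in flight.
        edges, start = set(), (entry, ())
        seen, work = {start}, deque([start])
        while work:
            pc, pending = work.popleft()
            if pc >= len(code):
                continue
            # Executing one instruction ages every in-flight branch.
            pending = tuple((c - 1, t) for c, t in pending)
            inst = code[pc]
            if isinstance(inst, tuple) and inst[0] == "br":
                pending += ((inst[2], inst[1]),)
            fired = [t for c, t in pending if c == 0]
            live = tuple((c, t) for c, t in pending if c > 0)
            nxt = fired[-1] if fired else pc + 1  # at most one fires per step here
            edges.add((pc, nxt))
            if (nxt, live) not in seen:
                seen.add((nxt, live))
                work.append((nxt, live))
        return edges

    # A branch whose 2 delay slots hide a second branch: control reaches
    # instruction 4, executes it, and only then does the second branch
    # fire, yielding the edge (4, 6) that a classic leader scan never sees.
    code = [("br", 4, 2), ("br", 6, 2), "nop", "nop", "nop", "nop", "nop"]
    print(sorted(branch_edges(code)))
    # [(0, 1), (1, 2), (2, 4), (4, 6), (6, 7)]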
Item
Building Adaptive Compilers (2005-01-29)
Almagor, L.; Cooper, Keith D.; Grosul, Alexander; Harvey, Timothy J.; Reeves, Steven W.; Subramanian, Devika; Torczon, Linda; Waterman, Todd
Traditional compilers treat all programs equally; that is, they apply the same set of techniques to every program that they compile. Compilers that adapt their behavior to fit specific input programs can produce better results. This paper describes our experience building and using adaptive compilers. It presents experimental evidence from two problems for which adaptive behavior can lead to better results: choosing compilation orders and choosing block sizes. It presents data from experimental characterizations of the search spaces in which these adaptive systems operate and describes search algorithms that successfully operate in these spaces. Building these systems has taught us a number of lessons about the construction of modular and reconfigurable compilers. The paper describes some of the problems that we encountered and the solutions that we adopted. It also outlines a number of fertile areas for future research in adaptive compilation.

Item
Compilation Order Matters: Exploring the Structure of the Space of Compilation Sequences Using Randomized Search Algorithms (2004-06-18)
Almagor, L.; Cooper, Keith D.; Grosul, Alexander; Harvey, Timothy J.; Reeves, Steven W.; Subramanian, Devika; Torczon, Linda; Waterman, Todd
Most modern compilers operate by applying a fixed sequence of code optimizations, called a compilation sequence, to all programs. Compiler writers determine a small set of good, general-purpose compilation sequences by extensive hand-tuning over particular benchmarks. The compilation sequence makes a significant difference in the quality of the generated code; in particular, we know that a single universal compilation sequence does not produce the best results over all programs. Three questions arise in customizing compilation sequences: (1) What is the incremental benefit of using a customized sequence instead of a universal sequence? (2) What is the average computational cost of constructing a customized sequence? (3) When does the benefit exceed the cost? We present one of the first empirically derived cost-benefit tradeoff curves for custom compilation sequences. These curves are for two randomized sampling algorithms: descent with randomized restarts and genetic algorithms. They demonstrate the dominance of these two methods over simple random sampling in sequence spaces where the probability of finding a good sequence is very low. Further, these curves allow compilers to decide whether custom sequence generation is worthwhile, by explicitly relating the computational effort required to obtain a program-specific sequence to the incremental improvement in the quality of the code generated by that sequence.
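The first of the two sampling algorithms, descent with randomized restarts, is straightforward to sketch. The Python below is illustrative only: PASSES and evaluate() are hypothetical stand-ins for a real pass pool and for the compile-execute-analyze measurement loop that the ACME and Building Adaptive Compilers abstracts describe.

    import random

    # Hypothetical stand-ins: PASSES is a pool of optimization pass names
    # and evaluate() abbreviates an entire compile-execute-measure cycle
    # (lower cost is better).  Neither comes from the papers.
    PASSES = ["dead", "gvn", "licm", "copyprop", "peel", "coalesce"]

    def evaluate(seq):
        return random.Random(hash(tuple(seq))).random()  # toy cost model

    def descent_with_restarts(length=10, restarts=20, steps=100):
        best_seq, best_cost = None, float("inf")
        for _ in range(restarts):
            # Restart from a fresh random sequence ...
            seq = [random.choice(PASSES) for _ in range(length)]
            cost = evaluate(seq)
            for _ in range(steps):
                # ... then descend: try a one-pass change, keep improvements.
                trial = seq[:]
                trial[random.randrange(length)] = random.choice(PASSES)
                t_cost = evaluate(trial)
                if t_cost < cost:
                    seq, cost = trial, t_cost
            if cost < best_cost:
                best_seq, best_cost = seq, cost
        return best_seq, best_cost

    print(descent_with_restarts())

The restarts are what matter in spaces where good sequences are rare: a single descent stalls in a local minimum, while repeated random re-seeding gives the search many independent chances to land in a good basin.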
Item
Iterative Data-flow Analysis, Revisited (2004-03-26)
Cooper, Keith D.; Harvey, Timothy J.; Kennedy, Ken
The iterative algorithm is widely used to solve instances of data-flow analysis problems. The algorithm is attractive because it is easy to implement and robust in its behavior. The theory behind the algorithm shows that, for a broad class of problems, it terminates and produces correct results. The theory also establishes a set of conditions under which the algorithm runs in at most d(G) + 3 passes over the graph: a round-robin algorithm, running a "rapid" framework, on a reducible graph. Fortunately, these restrictions encompass many practical analyses used in code optimization. In practice, compilers encounter situations that lie outside this carefully described region. Compilers encounter irreducible graphs, probably more often than the early studies suggest. They use variations of the algorithm other than the round-robin form. They run on problems that are not rapid. This paper explores both the theory and practice of iterative data-flow analysis. It explains the role of reducibility in the classic Kam-Ullman time bound. It presents experimental data to show that different versions of the iterative algorithm have distinctly different behavior. It gives practical advice that can improve the performance of iterative solvers on both reducible and irreducible graphs.

Item
Multiplication by Integer Constants (2003-10-21)
Briggs, Preston; Harvey, Timothy J.
Some modern machines have no integer multiply instruction and must rely on expensive software methods to compute integer products. In other cases, the multiply instruction is significantly slower than simple integer addition. When faced with computing n*c, where n is some unknown integer value and c is a known integer constant, we can avoid the need for a general-purpose multiply by rewriting the expression in terms of shifts, adds, and subtracts, typically all one-cycle instructions. Bernstein gives a detailed discussion of the problem and presents a solution, including Ada code for its implementation. Unfortunately, the code is flawed, at least in part due to typesetting errors. It is also quite difficult to understand. This document represents an attempt to explain the elements of Bernstein's approach. At the same time, we will develop a complete, working, and hopefully understandable implementation of his approach.

Item
Reducing the Impact of Spill Code (1998-07-24)
Harvey, Timothy J.
All graph-coloring register allocators rely on heuristics to arrive at a "good" answer to the NP-complete problem of register allocation, resulting in suboptimal code due to spill code. We look at a post-pass to the allocator that removes unnecessary spill code by finding places where the availability of an unused register allows us to "promote" a spill to a register. We explain and correct an error in Briggs' spill-code insertion algorithm that sometimes inserts an unnecessary number of spill instructions. This fix has an insignificant impact on the runtime of the compiler and never causes a degradation in the runtime of the code produced. We suggest minimizing the impact of the spill code with a small separate memory dedicated to spills and under the exclusive control of the compiler. We show an algorithm and experimental results which suggest that this hardware construct would significantly decrease the runtime of the code.

Item
Reducing the impact of spill code (1998)
Harvey, Timothy J.; Cooper, Keith D.
All graph-coloring register allocators rely on heuristics to arrive at a "good" answer to the NP-complete problem of allocation, resulting in suboptimal code. We look at a post-pass to the allocator which removes unnecessary spill code by finding places where the availability of an unused register allows us to "promote" a spill to a register. We explain and correct an error in Briggs' code that sometimes inserts an excessive and unnecessary number of spill instructions. The fix to this bug has an insignificant impact on the runtime of the compiler and never causes a degradation in the runtime of the code produced. We suggest minimizing the impact of the spill code through the use of a small separate memory dedicated to spills and under the exclusive control of the compiler. We show an algorithm and experimental results which suggest that this hardware construct would significantly decrease the runtime of the code.
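For the "Iterative Data-flow Analysis, Revisited" item above, a round-robin solver is only a few lines. This Python sketch solves live-variable analysis, a classic backward problem; the block encoding and names are assumptions, and the pass counter echoes the d(G) + 3 discussion in the abstract.

    def live_variables(blocks, succ, order):
        # blocks maps a block name to (use, defs) sets; order is the visit
        # order for each round-robin sweep (reverse-graph order suits a
        # backward problem like liveness).
        live_in = {b: set() for b in blocks}
        live_out = {b: set() for b in blocks}
        passes, changed = 0, True
        while changed:
            changed, passes = False, passes + 1
            for b in order:  # one round-robin sweep over every block
                use, defs = blocks[b]
                out = set().union(*(live_in[s] for s in succ.get(b, [])))
                new_in = use | (out - defs)
                if out != live_out[b] or new_in != live_in[b]:
                    live_out[b], live_in[b] = out, new_in
                    changed = True
        return live_in, live_out, passes

    # b0 defines x; b1 uses x and loops on itself, so x stays live there.
    blocks = {"b0": (set(), {"x"}), "b1": ({"x"}, set())}
    succ = {"b0": ["b1"], "b1": ["b1"]}
    print(live_variables(blocks, succ, ["b1", "b0"]))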
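For "Multiplication by Integer Constants", the flavor of the rewriting is easy to demonstrate. The sketch below is not Bernstein's cost-driven algorithm, just the classic binary method with a shortcut for constants of the form 2^k - 1; the names and the textual output format are invented for illustration.

    def mul_sequence(c, var="n"):
        # Rewrite var * c (c a known positive constant) as shifts, adds,
        # and subtracts, using the plain binary method plus the classic
        # 2**k - 1 shortcut (one shift and one subtract).
        ops = []
        def build(c):
            if c == 1:
                return var
            if c % 2 == 0:
                t = build(c // 2)
                ops.append(f"t{len(ops)} = {t} << 1")
            elif (c + 1) & c == 0:  # c == 2**k - 1
                k = c.bit_length()
                ops.append(f"t{len(ops)} = {var} << {k}")
                ops.append(f"t{len(ops)} = t{len(ops) - 1} - {var}")
            else:
                t = build(c - 1)
                ops.append(f"t{len(ops)} = {t} + {var}")
            return f"t{len(ops) - 1}"
        build(c)
        return ops

    print(mul_sequence(10))  # ['t0 = n << 1', 't1 = t0 << 1',
                             #  't2 = t1 + n', 't3 = t2 << 1']
    print(mul_sequence(7))   # ['t0 = n << 3', 't1 = t0 - n']

Bernstein's method goes further by searching over factorings to minimize instruction count, which is where the subtle bugs the abstract mentions crept in.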
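Finally, for the two "Reducing the impact of spill code" items, the promotion idea can be caricatured in a few lines of Python. Everything below is a simplification for illustration, not the thesis' algorithm: the instruction encoding is invented, and the caller is assumed to supply registers already proven (by a prior liveness pass) to be unused across the whole block.

    def promote_spills(block, free_regs):
        # block is a list of (op, arg) pairs; spill traffic is written as
        # ("spill_load", slot) / ("spill_store", slot).  free_regs are
        # registers known to be unused across the entire block.
        counts = {}
        for op, arg in block:
            if op in ("spill_load", "spill_store"):
                counts[arg] = counts.get(arg, 0) + 1
        # Give the busiest spill slots to the free registers: each memory
        # access for a promoted slot becomes a cheap register copy.
        ranking = sorted(counts, key=counts.get, reverse=True)
        assignment = dict(zip(ranking, free_regs))
        rewritten = []
        for op, arg in block:
            if op == "spill_load" and arg in assignment:
                rewritten.append(("copy_from_reg", assignment[arg]))
            elif op == "spill_store" and arg in assignment:
                rewritten.append(("copy_to_reg", assignment[arg]))
            else:
                rewritten.append((op, arg))
        return rewritten

    block = [("spill_load", "s0"), ("add", "r1"), ("spill_store", "s0")]
    print(promote_spills(block, ["r7"]))
    # [('copy_from_reg', 'r7'), ('add', 'r1'), ('copy_to_reg', 'r7')]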