Browsing by Author "Qasem, Apan"
Automatic Tuning of Scientific Applications (2007)
Qasem, Apan; Cooper, Keith D.
Over the last several decades we have witnessed tremendous change in the landscape of computer architecture. New architectures have emerged at a rapid pace with computing capabilities that have often exceeded our expectations. However, the rapid rate of architectural innovation has also been a source of major concern for the high-performance computing community. Each new architecture, or even a new model of a given architecture, has brought with it new features that have added to the complexity of the target platform. As a result, it has become increasingly difficult to exploit the full potential of modern architectures for complex scientific applications. The gap between the theoretical peak and the actual achievable performance has increased with every step of architectural innovation. As multi-core platforms become more pervasive, this performance gap is likely to increase. To deal with the changing nature of computer architecture and its ever-increasing complexity, application developers laboriously retarget code by hand, which often costs many person-months even for a single application. To address this problem, we developed a software-based strategy that can automatically tune applications to different architectures to deliver portable high performance. This dissertation describes our automatic tuning strategy. Our strategy combines architecture-aware cost models with heuristic search to find the most suitable optimization parameters for the target platform. The key contribution of this work is a novel strategy for pruning the search space of transformation parameters. By focusing on architecture-dependent model parameters instead of the transformation parameters themselves, we show that we can dramatically reduce the size of the search space and yet still achieve most of the benefits of the best tuning possible with exhaustive search.
We present an evaluation of our strategy on a set of scientific applications and kernels on several different platforms. The experimental results presented in this dissertation suggest that our approach can produce significant performance improvement on a range of architectures at a cost that is not overly demanding.

Evaluating a Model for Cache Conflict Miss Prediction (2005-04-10)
Kennedy, Ken; Qasem, Apan
Cache conflict misses can cause severe degradation in application performance. Previous research has shown that for many scientific applications the majority of cache misses are due to conflicts in cache. Although conflicts in cache are a major concern for application performance, it is often difficult to eliminate them completely. Eliminating conflict misses requires detailed knowledge of the cache replacement policy and the allocation of data in memory. This information is usually not available to the compiler. As such, the compiler has to resort to applying heuristics to try to minimize the occurrence of conflict misses. In this paper, we present a probabilistic method of estimating cache conflict misses for set-associative caches. We present a set of experiments evaluating the model and discuss the implications of the experimental results.

Improving Performance with Integrated Program Transformations (2004-09-09)
Jin, Guohua; Mellor-Crummey, John; Qasem, Apan
Achieving a high fraction of peak performance on today’s computer systems is difficult for complex scientific applications. To do so, an application’s characteristics must be tailored to exploit the characteristics of its target architecture. Today, commercial compilers do not adequately tailor programs automatically; thus, application scientists must settle for lackluster performance or manually transform their codes into a form that is complex and unmaintainable.
In this paper, we describe a prototype source-to-source transformation tool that enables application scientists to achieve high performance for scientific codes without changing their natural coding style. Our tool supports a rich, integrated collection of optimizing transformations and provides users with precise control over how these optimizations should be applied. In preliminary experiments with the Runge-Kutta advection core from the NCOMMAS code for mesoscale weather modeling and Livermore Loop 18, we have used our tool to double single-processor performance.