Browsing by Author "Medina, David"
OCCA: A Unified Approach to Multi-Threading Languages
(2014-10-23) Medina, David; Warburton, Timothy; Riviere, Beatrice; Symes, William W
With the current trend of using co-processors for accelerating computations, we are presented with architectures and corresponding programming languages. The inability to predict lasting languages and architectures has led to the development of distinct languages and standards. This thesis details my work on occa, a unified threading language presented as a portable solution to hardware-accelerated coding that combines aspects of OpenMP, OpenCL, and CUDA. With the similarities between OpenMP, OpenCL and CUDA, I present a macro-based approach to a unified kernel language that currently encompasses OpenMP, OpenCL and CUDA. Along with kernel generation, occa includes an API (application programming interface) which serves as a wrapper on the three multi-threading languages. The back-end of occa dynamically compiles and loads function objects for a flexible run-time environment to use different hardware architectures. Computational results using a spectrum of methods, namely finite difference, spectral element and discontinuous Galerkin methods, utilizing occa are shown to deliver portable high performance on different architectures and platforms. The finite difference method chapter reverse engineers optimized code written in CUDA and used in industry, discusses distinct features available in CUDA and compares occa implementations using different optimization techniques. The spectral element method and discontinuous Galerkin methods are derived from two projects I worked on during my studies: gNek, a distributed high-order spectral element method (SEM) implementation for the incompressible Navier-Stokes equations, and RiDG, equipped with discontinuous Galerkin (DG) to simulate acoustic wave equations under different assumptions in the material anisotropies.
The parallel algorithms used to achieve high parallelization for GPU acceleration are discussed for both gNek and RiDG, together with performance results.

OKL: A Unified Language for Parallel Architectures
(2015-06) Medina, David
Rapid evolution of computer processor architectures has spawned multiple programming languages and standards. This thesis strives to address the challenges caused by fast and cyclical changes in programming models. The novel contribution of this thesis is the introduction of an abstract unified framework which addresses portability and performance for programming manycore devices. To test this concept, I developed a specific implementation of this framework called occa. OCCA provides evidence that it is possible to achieve high performance across multiple platforms. The programming model investigated in this thesis abstracts a hierarchical representation of modern manycore devices. The model at its lowest level adopts native programming languages for these manycore devices, including serial code, OpenMP, OpenCL, NVIDIA's CUDA, and Intel's COI. At its highest level, the ultimate goal is a high-level language that is agnostic about the underlying architecture. I developed a multiply layered approach to bridge the gap between expert "close to the metal" low-level programming and novice-level programming. Each layer requires varying degrees of programmer intervention to access low-level features in device architectures. I begin by introducing an approach for encapsulating programming language features, delivering a single intermediate representation (occa IR). Built above the occa IR are two kernel languages extending the prominent programming languages C and Fortran, the occa kernel language (okl) and the occa Fortran language (ofl). Additionally, I contribute two automated approaches for facilitating data movement and automating translations from serial code to okl kernels.
To validate occa as a unified framework implementation, I compare performance results across a variety of applications and benchmarks. A spectrum of applications have been ported to utilize occa, showing no performance loss compared to their native programming language counterparts. In addition, a majority of the discussed applications show comparable results with a single occa kernel.

OKL: A Unified Language for Parallel Architectures
(2015-11-25) Medina, David; Warburton, Timothy; Riviere, Beatrice; Sorensen, Danny C; Cooper, Keith D
Rapid evolution of computer processor architectures has spawned multiple programming languages and standards. This thesis strives to address the challenges caused by fast and cyclical changes in programming models. The novel contribution of this thesis is the introduction of an abstract unified framework which addresses portability and performance for programming many-core devices. To test this concept, I developed a specific implementation of this framework called OCCA. OCCA provides evidence that it is possible to achieve high performance across multiple platforms. The programming model investigated in this thesis abstracts a hierarchical representation of modern many-core devices. The model at its lowest level adopts native programming languages for these many-core devices, including serial code, OpenMP, OpenCL, NVIDIA's CUDA, and Intel's COI. At its highest level, the ultimate goal is a high-level language that is agnostic about the underlying architecture. I developed a multiply layered approach to bridge the gap between expert "close to the metal" low-level programming and novice-level programming. Each layer requires varying degrees of programmer intervention to access low-level features in device architectures. I begin by introducing an approach for encapsulating programming language features, delivering a single intermediate representation (OCCA IR).
Built above the OCCA IR are two kernel languages extending the prominent programming languages C and Fortran, the OCCA kernel language (OKL) and the OCCA Fortran language (OFL). Additionally, I contribute two automated approaches for facilitating data movement and automating translations from serial code to OKL kernels. To validate OCCA as a unified framework implementation, I compare performance results across a variety of applications and benchmarks. A spectrum of applications have been ported to utilize OCCA, showing no performance loss compared to their native programming language counterparts. In addition, a majority of the discussed applications show comparable results with a single OCCA kernel.