Browsing by Author "Sbirlea, Alina"
Item: High-level execution models for multicore architectures (2015-11-30)
Sbirlea, Alina; Sarkar, Vivek; Cooper, Keith D.; Simar, Ray

Hardware design is evolving toward manycore processors that will be used in large clusters to achieve exascale computing, and at the rack level to achieve petascale computing. Harnessing the full power of these architectures, however, is a challenge that software must tackle to fully realize extreme-scale computing, and it is prompting the exploration of new approaches to programming and execution systems. In this thesis, we argue that a two-level execution model is a relevant answer to the problem of extreme-scale computing. We propose an execution model that decomposes the specification of an application into two parts: a high-level definition of what the application does, coupled with a low-level implementation of how it is done. In our model, applications are designed as a set of sequential computational units whose connections are defined in a high-level, easy-to-write dataflow graph. The dataflow graph we propose, DFGR, specifies what the computation is rather than its implementation details, and is designed for ease of programmability. Second, we propose the use of work-stealing runtimes for coarse-grained dataflow parallelism and doacross runtimes for fine-grained dataflow parallelism, both of which can be expressed uniformly in DFGR. We justify this approach by demonstrating the performance of DFGR on two different runtime systems: Habanero C with coarse-grained task synchronization, and the newly proposed OpenMP 4.2 specification with doacross synchronization. Finally, we introduce a novel primitive for combining SPMD parallelism and task parallelism: the elastic task. Elastic tasks allow for internal SPMD parallelism within a computational unit.
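The two-level decomposition described in the abstract can be illustrated with a minimal sketch: a graph object records each sequential step and the steps it consumes, while the step bodies are supplied separately. The class and method names below are hypothetical illustrations, not actual DFGR syntax, and the sequential `run` loop stands in for a real work-stealing or doacross runtime.

```python
# Sketch of a two-level dataflow model (hypothetical API, not DFGR syntax).
# High level: declare steps and their data dependences.
# Low level: each step body is an ordinary sequential function.

class DataflowGraph:
    def __init__(self):
        self.steps = {}  # step name -> (function, list of input names)

    def step(self, name, inputs=()):
        """Register a sequential computational unit and the names it consumes."""
        def register(fn):
            self.steps[name] = (fn, list(inputs))
            return fn
        return register

    def run(self, sources):
        """Execute steps in dependence order; a sequential stand-in for a
        work-stealing or doacross runtime scheduling the same graph."""
        results = dict(sources)
        pending = dict(self.steps)
        while pending:
            for name, (fn, inputs) in list(pending.items()):
                if all(i in results for i in inputs):
                    results[name] = fn(*(results[i] for i in inputs))
                    del pending[name]
        return results

g = DataflowGraph()

@g.step("square", inputs=("x",))
def square(x):
    return x * x

@g.step("double", inputs=("square",))
def double(s):
    return 2 * s

out = g.run({"x": 3})
print(out["double"])  # 18
```

Because the graph only states what each step consumes, the same specification could be handed to either a coarse-grained task runtime or a fine-grained doacross runtime, which is the uniformity the abstract claims for DFGR.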
We provide a new scheduling algorithm for elastic tasks, prove strong theoretical guarantees for it in both work-sharing and work-stealing environments, and demonstrate that it also offers performance benefits in practice due to locality and runtime adaptability.

Item: Mapping a Dataflow Programming Model onto Heterogeneous Architectures (2012-09-05)
Sbirlea, Alina; Sarkar, Vivek; Cooper, Keith D.; Mellor-Crummey, John; Budimlic, Zoran

This thesis describes and evaluates how extending Intel's Concurrent Collections (CnC) programming model can address the problem of hybrid programming with high performance and low energy consumption, while retaining the ease of use of dataflow programming. The CnC model is a declarative, dynamic, lightweight, task-based parallel programming model that is implicitly deterministic because it enforces the single-assignment rule; these properties ensure that problems are modelled in an intuitive way. CnC offers a separation of concerns by allowing algorithms to be expressed as a two-stage process: first, decomposing a problem into components and specifying how the components interact with each other, and second, providing an implementation for each component. By facilitating the separation between a domain expert, who can provide an accurate problem specification at a high level, and a tuning expert, who can tune the individual components for better performance, we ensure that tuning and future development, such as replacing a subcomponent with a more efficient algorithm, become straightforward. A recent trend in mainstream desktop systems is the use of graphics processing units (GPUs) to obtain order-of-magnitude performance improvements relative to general-purpose CPUs. In addition, the use of FPGAs has seen a significant increase for applications that can take advantage of such dedicated hardware.
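The single-assignment rule mentioned above is what makes CnC implicitly deterministic: once an item is written under a tag, no step may overwrite it, so the program's output cannot depend on scheduling order. A minimal sketch of such an item collection (the class and method names are illustrative, not the actual CnC API):

```python
class ItemCollection:
    """Single-assignment key/value store: each tag may be written exactly once,
    which keeps a dataflow program's output independent of step ordering."""
    def __init__(self):
        self._items = {}

    def put(self, tag, value):
        if tag in self._items:
            raise ValueError(f"single-assignment violation for tag {tag!r}")
        self._items[tag] = value

    def get(self, tag):
        return self._items[tag]

items = ItemCollection()
items.put(("row", 0), [1, 2, 3])
try:
    items.put(("row", 0), [4, 5, 6])  # a second write to the same tag is rejected
except ValueError as e:
    print("rejected:", e)
```

Rejecting the second `put` is the whole point: any step that later calls `get(("row", 0))` sees the same value regardless of when it runs.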
Computing is evolving from using manycore CPUs to "co-processing" on the CPU, GPU, and FPGA; however, hybrid programming models that support the interaction between multiple heterogeneous components are not widely accessible to mainstream programmers and domain experts who have a real need for such resources. We propose a C-based implementation of the CnC model for enabling parallelism across heterogeneous processor components in a flexible way, with high resource utilization and high programmability. We use the task-parallel HabaneroC language (HC) as the platform for implementing CnC-HabaneroC (CnC-HC); HC is also used to implement the computation steps in CnC-HC and to interact with GPU or FPGA steps, and it offers the desired flexibility and extensibility of interacting with any other C-based language. First, we extend the CnC model with tag functions and ranges to enable automatic code generation of high-level operations for inter-task communication. This improves programmability and also makes the code more analysable, opening the door for future optimizations. Second, we introduce a way to specify steps that are data-parallel, and thus fit to execute on the GPU, and the notion of task affinity, a tuning annotation in the specification language. Affinity is used by the runtime during scheduling and can be fine-tuned based on application needs to achieve better (faster, lower-power, etc.) results. Third, we introduce and develop a novel, data-driven runtime for the CnC model, using HabaneroC as the base language. In addition, we create an implementation of the previous runtime approach and conduct a study comparing their performance. Next, we expand the HabaneroC dynamic work-stealing runtime to allow cross-device stealing based on task affinity. Cross-device dynamic work-stealing is used to achieve load balancing across heterogeneous platforms for improved performance.
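Affinity-guided cross-device stealing can be sketched as follows. This is a deliberately simplified model, not the CnC-HC runtime: the device names, the `Worker` class, and the victim-selection policy are all hypothetical. The idea it illustrates is that an idle device first looks for queued tasks whose affinity annotation matches it, and only falls back to arbitrary tasks to stay busy.

```python
from collections import deque

# Toy sketch of affinity-guided work-stealing across heterogeneous workers
# (hypothetical names and policy, not the actual CnC-HC runtime).

class Worker:
    def __init__(self, device):
        self.device = device
        self.tasks = deque()  # entries are (task_name, affinity)

    def steal_from(self, victims):
        """Prefer stealing tasks whose affinity matches this worker's device."""
        for v in victims:
            for i, (name, affinity) in enumerate(v.tasks):
                if affinity == self.device:
                    del v.tasks[i]
                    return name
        # Fall back to any available task so the device does not idle.
        for v in victims:
            if v.tasks:
                return v.tasks.popleft()[0]
        return None

cpu = Worker("cpu")
gpu = Worker("gpu")
cpu.tasks.extend([("scan", "cpu"), ("matmul", "gpu"), ("reduce", "cpu")])

stolen = gpu.steal_from([cpu])
print(stolen)  # the GPU-affine task "matmul" is stolen first
```

A real runtime would of course weigh data-transfer cost and queue depth as well, but the two-phase preference (affinity match first, any task second) captures how an affinity annotation can steer load balancing across devices.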
Finally, we implement and use a series of benchmarks to test the model in different scenarios, and show that our proposed approach can yield significant performance benefits and low power usage under hybrid execution.