Browsing by Author "Vrvilo, Nick"
Enhanced Data and Task Abstractions for Extreme-scale Runtime Systems (2017-08-10)
Vrvilo, Nick; Sarkar, Vivek

Recently, we’ve seen a variety of emerging programming models targeting the next generation of HPC hardware, known as extreme-scale computing systems. Extreme-scale runtime systems need to address not only the problems presented by supporting new hardware, but also issues of scalability, whether in small-scale embedded environments or large-scale supercomputing clusters. While a runtime may present all of the necessary functionality to write software for an extreme-scale system, the runtime APIs are rarely a productive interface for application programmers. In this thesis, we present a set of abstractions, which are designed to be implemented on top of an extreme-scale runtime, that will increase programmability and productivity for software developers. These abstractions include support for blocking calls in a fine-grained task-based runtime, a data structure representation for relocatable data chunks, and a hierarchical model for design and analysis of macro-dataflow applications. We discuss and demonstrate the tradeoffs among implementation choices for these abstractions, since the specific hardware and software details of an application deployment may dictate the ideal method of implementing a given abstraction.

Implementing Asynchronous Checkpoint/Restart for the Concurrent Collections Model (2014-08-12)
Vrvilo, Nick; Sarkar, Vivek; Mellor-Crummey, John; Chaudhuri, Swarat

It has been claimed that what simplifies parallelism can also simplify resilience. Based on that assertion, we present the Concurrent Collections programming model (CnC) as an ideal target for a simple yet powerful resilience system for parallel computations. Specifically, we claim that the same attributes that simplify reasoning about parallel applications written in CnC will similarly simplify the implementation of a checkpoint/restart system within the CnC runtime.

We define these properties of CnC in the context of a model built in K. To demonstrate how these simplifying properties of CnC help to simplify resilience, we have implemented a simple checkpoint/restart system within Rice’s Habanero C implementation of the CnC runtime. We show how the CnC runtime can fully encapsulate the checkpointing and restarting processes, allowing application programmers to gain all the benefits of resilience without any added effort beyond implementing the application in CnC, while avoiding the synchronization overheads present in traditional techniques.