Browsing by Author "Sbirlea, Dragos Dumitru"

Now showing 1 - 2 of 2
    Integrating stream parallelism and task parallelism in a dataflow programming model
    (2012) Sbirlea, Dragos Dumitru; Sarkar, Vivek
As multicore computing becomes the norm, exploiting parallelism in applications becomes a requirement for all software. Many applications exhibit different kinds of parallelism, but most parallel programming languages are biased towards a specific paradigm, two common ones being task parallelism and stream parallelism. This creates a dilemma for programmers who would prefer to use the same language to exploit different paradigms for different applications. Our thesis is that an integration of the stream-parallel and task-parallel paradigms can be achieved in a single language with high programmability and high resource efficiency when a general dataflow programming model is used as the foundation. The dataflow model used in this thesis is Intel's Concurrent Collections (CnC). While CnC is general enough to express both task-parallel and stream-parallel paradigms, all current implementations of CnC use task-based runtime systems that do not deliver the resource efficiency expected from stream-parallel programs. For streaming programs, a task-based runtime system wastes computing cycles and makes memory management more difficult than it needs to be. We propose Streaming Concurrent Collections (SCnC), a streaming system that can execute a subset of the applications supported by Concurrent Collections, a general macro-dataflow coordination language. Integrating the streaming and task models allows application developers to benefit from the efficiency of stream parallelism as well as the generality of task parallelism, all in the context of an easy-to-use and general dataflow programming model. To achieve this integration, we formally define streaming access patterns that, if respected, allow CnC task-based applications to be executed using the streaming model. We specify conditions under which an application can run safely on the streaming runtime, meaning with identical results and without deadlocks. We propose a static analysis that verifies whether an application respects these patterns, and we describe algorithmic transformations that bring a larger set of CnC applications into a form that can be run using the streaming runtime. To take advantage of dynamic parallelism opportunities inside streaming applications, which have traditionally been treated as having fixed parallelism, we propose a simple tuning annotation. Our dynamic parallelism construct, the dynamic splitter, allows fission of stateful filters with little guidance from the programmer and is based on the idea of places across which computations are distributed. Finally, performance results show that transitioning from the task-parallel runtime to the streaming runtime leads to a throughput increase of up to 40×. In summary, this thesis shows that the stream-parallel and task-parallel paradigms can be integrated in a single language when a dataflow model is used as the foundation, and that this integration can be achieved with high programmability and high resource efficiency. Integration of these models allows application developers to benefit from the efficiency of stream parallelism as well as the generality of task parallelism, all in the context of an easy-to-use dataflow programming model.
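The sketch below is a minimal, hypothetical illustration (plain Java, not the CnC or SCnC API) of the kind of streaming access pattern the abstract describes: each step consumes the item for one tag exactly once and produces exactly one item for the next stage, so a small bounded buffer can stand in for an unbounded item collection. All names and parameters here are illustrative assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch only: a two-stage pipeline whose steps follow a streaming access
// pattern (single producer, single consumer, each item read exactly once).
// Because an item can never be read again after it is consumed, its buffer
// slot is recycled immediately, which is why bounded memory suffices.
public class StreamingPipelineSketch {
    static final int BUFFER = 8;   // bounded channel standing in for an item collection
    static final int N = 1_000;    // number of tags/items streamed through the pipeline

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> stage1to2 = new ArrayBlockingQueue<>(BUFFER);

        // Producer step: for each tag i, put exactly one item.
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < N; i++) stage1to2.put(i * i);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        // Consumer step: for each tag i, take exactly the one item it depends on.
        Thread consumer = new Thread(() -> {
            long sum = 0;
            try {
                for (int i = 0; i < N; i++) sum += stage1to2.take();
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            System.out.println("sum = " + sum);
        });

        producer.start(); consumer.start();
        producer.join(); consumer.join();
    }
}
```

A task-based runtime would instead create one task per item and keep every produced item live until garbage collection, which is the resource-efficiency gap the streaming runtime is meant to close.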
    Memory and Communication Optimizations for Macro-dataflow Programs
    (2015-06-23) Sbirlea, Dragos Dumitru; Sarkar, Vivek; Cooper, Keith D; Varman, Peter J
It is now widely recognized that increased levels of parallelism are a necessary condition for improved application performance on multicore computers. However, the memory-per-core ratio is already low and, as the number of cores increases, it is expected to decrease further, making the per-core memory efficiency of parallel programs an even more important concern in future systems. Furthermore, the memory requirements of parallel applications can be significantly larger than those of their sequential counterparts, and their memory utilization depends critically on the schedule used when running them. This thesis proposes techniques that enable awareness and control of the tradeoff between a program’s memory usage and its resulting performance. It does so by taking advantage of the computation structure that is made explicit in macro-dataflow programs, which is one of the benefits of macro-dataflow as a programming model for modern multicore applications. To address this challenge, we first introduce folding, a memory management technique that enables programmers to map multiple data values to the same memory slot. This reduces the memory requirement of the program while still preserving its macro-dataflow execution semantics. We then propose an approach that allows dynamic macro-dataflow programs running on shared-memory multicore systems to obey a user-specified memory bound. Using the inspector/executor model, we tailor the set of allowable schedules to either guarantee that the program can be executed within the given memory bound, or report an error during the inspector phase, without running the computation, if no feasible schedule can be found. We show that our technique can gracefully span the spectrum (with decreasing memory bounds) from fully parallel to fully serial execution, with several intermediate points between the two. Comparison with OpenMP shows that our approach can execute in 53% of the memory required by OpenMP while running at 90% (or more) of OpenMP’s performance. Finally, we turn our attention to distributed systems, where memory size is often not the limiting factor, but communication and load balancing are. For these systems, we show that data and task distributions can be selected automatically even for applications expressed as dynamic task graphs, freeing the programmer from the cumbersome selection process. We show that optimal selection can be achieved for certain classes of distributions and cost functions that capture the trade-off between communication and load balance.
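As a hedged illustration of the folding idea described in the abstract above, the following Java sketch (names and structure are assumptions, not the thesis implementation) maps the value tagged t to physical slot t % K. This is only safe under schedules where every reader of tag t has run before tag t + K is written, which is exactly the kind of constraint a memory-aware scheduler must enforce.

```java
// Sketch only: a "folded" item collection backed by K physical slots.
// An unbounded tag space reuses the same K memory locations.
public class FoldedCollectionSketch {
    private final double[] slots;  // K physical slots back all tags
    private final int k;

    FoldedCollectionSketch(int k) {
        this.k = k;
        this.slots = new double[k];
    }

    void put(int tag, double value) { slots[tag % k] = value; }
    double get(int tag)             { return slots[tag % k]; }

    public static void main(String[] args) {
        // A computation that reads only tags t-1 and t-2 needs K = 3 live
        // slots, no matter how many tags it produces overall.
        FoldedCollectionSketch items = new FoldedCollectionSketch(3);
        items.put(0, 1.0);
        items.put(1, 2.0);
        for (int t = 2; t < 100; t++) {
            double v = 0.5 * (items.get(t - 1) + items.get(t - 2));
            items.put(t, v);
        }
        System.out.println("item[99] = " + items.get(99));
    }
}
```

In an unfolded collection the same loop would retain all 100 items; folding keeps the live footprint at 3 values while leaving the dataflow reads and writes unchanged.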