Browsing by Author "Milakovic, Srdan"
Now showing 1 - 2 of 2
Item: Compiler and Runtime Optimization of Computational Kernels for Irregular Applications (2023-08-17)
Milakovic, Srdan; Mellor-Crummey, John; Budimlić, Zoran; Varman, Peter J; Mamouras, Konstantinos

Many computationally intensive workloads do not fit on individual compute nodes due to their size. As a consequence, such workloads are usually executed across multiple heterogeneous compute nodes of a cluster or supercomputer. However, due to the complexity of the hardware, developing efficient and scalable code for modern compute nodes is difficult. Another challenge with sophisticated applications is that data structures, communication, and control patterns are often irregular and unknown before program execution. This lack of regularity makes static analysis especially difficult, and very often impossible. To overcome these issues, programmers use high-level, implicitly parallel programming models or domain-specific libraries that consist of composable building blocks. This dissertation explores compiler and runtime optimizations for automatic granularity selection in the context of two programming paradigms: Concurrent Collections (CnC), a declarative, dynamic-single-assignment, data-race-free programming model, and GraphBLAS, a domain-specific Application Programming Interface (API). Writing fine-grained CnC programs is easy and intuitive for domain experts because programmers do not have to worry about parallelism, and fine-grained programs expose maximum parallelism. However, fine-grained programs can significantly increase the runtime overhead of CnC program execution: relative to the small amount of computation a fine-grained task performs, the number of data accesses and dependencies between computation tasks is large. This runtime overhead can be reduced by coarsening the data accesses and task dependencies. However, coarsening is usually tedious, and it is not easy even for domain experts.
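The granularity trade-off described above can be illustrated with a minimal sketch. This is not the CnC API; it is a hypothetical model in which every task carries a fixed scheduling cost, so fusing fine-grained tasks into coarser ones preserves the total work while amortizing the overhead:

```python
# Illustrative sketch (not CnC): each scheduled task pays a fixed overhead,
# so coarsening amortizes that overhead without changing the work done.

def run_tasks(tasks, overhead_per_task=1):
    """Return (total work, total scheduling overhead) for a task list."""
    work = sum(payload for payload in tasks)
    return work, overhead_per_task * len(tasks)

def coarsen(tasks, grain):
    """Fuse consecutive fine-grained tasks into chunks of size `grain`."""
    return [sum(tasks[i:i + grain]) for i in range(0, len(tasks), grain)]

fine = [1] * 1024                              # 1024 tasks, 1 unit of work each
work_f, ovh_f = run_tasks(fine)                # overhead: 1024 units
work_c, ovh_c = run_tasks(coarsen(fine, 64))   # 16 fused tasks, overhead: 16

assert work_f == work_c == 1024                # same work, far less overhead
```

The abstract's point is that in real CnC programs this coarsening of accesses and dependencies is what must be automated, since doing it by hand is tedious even for experts.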
For some applications, coarse-grained code can be generated by a compiler. However, not all fine-grained applications can be converted to coarse-grained ones, because not all information is statically known. In this dissertation, we introduce the concept of micro-runtimes. A micro-runtime is a Hierarchical CnC construct that enables fusion of multiple steps into a higher-level step during program execution. Another way for users to develop applications that efficiently exploit modern hardware is through domain-specific APIs that define composable building blocks. One such API specification is GraphBLAS, which allows users to specify graph algorithms using (sparse) linear algebra building blocks. Even though GraphBLAS libraries usually consist of highly hand-optimized building blocks, they provide limited or no support for inter-kernel optimization. In this dissertation, we investigate several approaches to inter-kernel optimization, including both runtime and compile-time optimizations. Our optimizations reduce the number of arithmetic operations, the number of memory accesses, and the memory required for temporary objects.

Item: Point-to-Point and Barrier Synchronization in Distributed SPMD Systems (2019-11-08)
Milakovic, Srdan; Mellor-Crummey, John M; Sarkar, Vivek; Budimlić, Zoran

Distributed-memory programming models are very often the only way to scale up large scientific applications. To ensure correctness and optimal performance in distributed applications, it is necessary to use general, high-level, yet efficient synchronization constructs. Implementing distributed applications using one-sided communication libraries is becoming more popular, as opposed to the two-sided communication used in the MPI model. However, in most cases, those libraries support only high-level collective barrier synchronization and low-level point-to-point synchronization.
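The gap between the two synchronization styles mentioned above can be sketched in a few lines. This is a shared-memory stand-in (Python threads, not OpenSHMEM): a collective barrier forces every rank to wait for all others, while point-to-point flags let a rank wait only for the one neighbor whose data it actually reads:

```python
import threading

# Sketch (threads standing in for SPMD ranks): contrast a collective barrier
# with point-to-point per-rank "ready" flags.

N = 4
barrier = threading.Barrier(N)                  # collective: all wait for all
ready = [threading.Event() for _ in range(N)]   # point-to-point: one flag per rank
data = [0] * N
result = [0] * N

def rank(i):
    data[i] = i * i          # produce this rank's data
    ready[i].set()           # point-to-point: announce only my data is ready
    barrier.wait()           # collective: wait for everyone (stronger than needed)
    left = (i - 1) % N
    ready[left].wait()       # point-to-point: wait only for the neighbor I read
    result[i] = data[left]

threads = [threading.Thread(target=rank, args=(i,)) for i in range(N)]
for t in threads: t.start()
for t in threads: t.join()

assert result == [9, 0, 1, 4]   # each rank read its left neighbor's value
```

Here the barrier is redundant given the flags; it is included only to show that a rank reading one neighbor pays for synchronizing with all N ranks when a barrier is the only high-level construct available.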
The phaser synchronization construct is very attractive because it unifies collective and point-to-point synchronization in a simple, easy-to-use, high-level construct. In this thesis, we propose several novel algorithms for phaser synchronization on distributed-memory systems with one-sided communication. We also present several improvements to the distributed barrier algorithms in the OpenSHMEM reference implementation. We establish a very high level of confidence in the algorithms' correctness by verifying them with the SPIN model checker. We evaluated our phaser algorithms using several benchmark applications on large supercomputers, and we show that using phasers can reduce synchronization time by up to 47% and improve total execution time by up to 26%. This thesis shows that high-level, efficient, and intuitive synchronization is possible on distributed systems with one-sided communication.
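To make the "unifies collective and point-to-point" claim concrete, here is a minimal shared-memory phaser sketch. It is illustrative only, not the thesis's distributed algorithm: registered parties may signal a phase without blocking (point-to-point, producer-style), wait for a phase to complete (consumer-style), or do both (barrier-style):

```python
import threading

class Phaser:
    """Minimal phaser sketch: split-phase signal/wait over numbered phases."""

    def __init__(self, parties):
        self.parties = parties        # signallers registered per phase
        self.phase = 0                # completed phases so far
        self.arrived = 0
        self.cond = threading.Condition()

    def signal(self):
        """Announce arrival at the current phase without blocking."""
        with self.cond:
            self.arrived += 1
            if self.arrived == self.parties:   # last arrival advances the phase
                self.arrived = 0
                self.phase += 1
                self.cond.notify_all()

    def wait(self, phase):
        """Block until the given phase has completed."""
        with self.cond:
            while self.phase <= phase:
                self.cond.wait()

    def signal_and_wait(self):
        """Barrier semantics: signal the current phase, then wait for it."""
        with self.cond:
            phase = self.phase
            self.arrived += 1
            if self.arrived == self.parties:
                self.arrived = 0
                self.phase += 1
                self.cond.notify_all()
            while self.phase <= phase:
                self.cond.wait()

# Point-to-point use: only the producer is registered to signal.
ph = Phaser(parties=1)
out = []

def producer():
    out.append("data")
    ph.signal()              # announce without blocking

def consumer():
    ph.wait(phase=0)         # wait only for phase 0 to complete
    out.append(out[0].upper())

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start(); t1.join(); t2.join()
assert out == ["data", "DATA"]
```

With `parties` set to the full group and every thread calling `signal_and_wait`, the same object behaves as a barrier, which is the unification the abstract refers to; the thesis's contribution is realizing this efficiently over one-sided communication, which this thread-based sketch does not attempt.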