
Browsing by Author "Lu, Honghui"

Now showing 1 - 3 of 3
  • Message passing versus distributed shared memory on networks of workstations
    (1995) Lu, Honghui; Zwaenepoel, Willy
    We compared the message-passing library Parallel Virtual Machine (PVM) with the distributed shared memory system TreadMarks on networks of workstations. We presented the performance of nine applications: Water and Barnes-Hut from the SPLASH benchmarks; 3-D FFT, Integer Sort, and Embarrassingly Parallel from the NAS benchmarks; ILINK, a widely used genetic analysis program; and SOR, TSP, and QuickSort. TreadMarks performed nearly identically to PVM on computation-bound programs, such as the Water simulation of 1728 molecules. For most of the other applications, including ILINK, TreadMarks performed within 75% of PVM with 8 processes. Two reasons for TreadMarks's lower performance were the separation of synchronization and data transfer, and the additional messages needed to request updates for data in the invalidate-based shared-memory protocol. TreadMarks also suffered from extra data communication due to false sharing. Moreover, PVM benefited from the ability to aggregate scattered data in a single message.
  • OpenMP on Networks of Workstations
    (2001-01-30) Lu, Honghui
    The OpenMP Application Programming Interface (API) is an emerging standard for parallel programming on shared memory architectures. Networks of workstations (NOWs) are attractive parallel programming platforms because of their good price/performance ratio, as well as their flexibility and their potential to scale. This work is the first to extend the support for OpenMP to networks of workstations. The design is based on integrating the compiler and the run-time system. In the combined system, the run-time library remains the basic vehicle for implementing shared memory, while the compiler performs optimization rather than implementation. The integrated approach can effectively optimize irregular applications, for which an exact compile-time analysis is not possible. One optimization aggregates messages for irregular applications based on the data access information provided by the compiler. Another optimization improves the scalability of the system by computation replication and multicast. The integrated system also simplifies the run-time implementation. In the implementation of OpenMP on networks of shared-memory multiprocessors, the compile-time information greatly reduces the number of changes required to the run-time system in order to exploit the intra-node hardware shared memory. In another implementation that allows OpenMP programs to change the number of computing nodes during the execution, the OpenMP semantics provides convenient points for efficient adaptation.
  • OpenMP on networks of workstations
    (2001) Lu, Honghui; Zwaenepoel, Willy
    The OpenMP Application Programming Interface (API) is an emerging standard for parallel programming on shared memory architectures. Networks of workstations (NOWs) are attractive parallel programming platforms because of their good price/performance ratio, as well as their flexibility and their potential to scale. This work is the first to extend the support for OpenMP to networks of workstations. The design is based on integrating the compiler and the run-time system. In the combined system, the run-time library remains the basic vehicle for implementing shared memory, while the compiler performs optimization rather than implementation. The integrated approach can effectively optimize irregular applications, for which an exact compile-time analysis is not possible. One optimization aggregates messages for irregular applications based on the data access information provided by the compiler. Another optimization improves the scalability of the system by computation replication and multicast. The integrated system also simplifies the run-time implementation. In the implementation of OpenMP on networks of shared-memory multiprocessors, the compile-time information greatly reduces the number of changes required to the run-time system in order to exploit the intra-node hardware shared memory. In another implementation that allows OpenMP programs to change the number of computing nodes during the execution, the OpenMP semantics provides convenient points for efficient adaptation.