R-3 Repository :: Browsing by Author "Mukherjee, Rajat"

Simulation of shared memory parallel systems
(1990) Mukherjee, Rajat; Bennett, John K.
This thesis describes a method for simulating parallel programs written for shared memory multiprocessors. We have extended execution-driven simulation to support the simulation of shared memory. We have developed a shared memory profiler that, at compile time, inserts simulation support code into the program's assembly code so that data address references can be extracted at run time. From each data address, we determine the nature of the reference, simulate the access, and account for it. Programs to be simulated are written using Presto, an object-oriented parallel programming environment for shared memory multiprocessors based on C++. To validate the accuracy of our simulation methods, we have developed and evaluated an architecture model for the BBN Butterfly shared memory multiprocessor. The results of these tests are presented and discussed. We also describe extensions that would allow shared memory systems with caches to be simulated using execution-driven simulation techniques.
The interaction of architecture and operating system in the designing of a scalable shared memory multiprocessor
(1995) Mukherjee, Rajat; Bennett, John K.
This dissertation describes the implementation and evaluation of operating system design techniques that can be used to achieve scalability and improve performance in large-scale shared memory multiprocessors with non-uniform memory hierarchies. We describe the implementation of SALSA, an operating system that incorporates these techniques and executes on a commercially available processor. The contributions of this dissertation include the implementation of a technique that masks memory latency and increases processor utilization via rapid context switching, and a detailed study of the effects of cache organization and caching policy on latency hiding. The dissertation presents the relative performance of several alternatives for context caching on a register window architecture and shows that write-back, set-associative caches provide the best latency-hiding performance, especially with constructive cache interference. We demonstrate significant improvements in program performance (120%) with latency hiding when cache miss latency is high, even with low cache miss rates (1-2%). We show that direct-mapped caches are unsuitable when operating system code is highly sensitive to cache misses, as in the case of context-switching trap code. We also show that increased processor utilization can significantly increase contention on the underlying network. In architectures with non-uniform memory access behavior, the operating system must exploit thread and data placement to improve performance. The organization of the SALSA kernel exploits the underlying memory architecture. We describe a programming model that takes into account the clustering in the system and provides primitives for hierarchical data placement and hierarchical thread scheduling. We show that proper data placement can double performance in a three-level memory hierarchy such as Willow. SALSA also provides user control over memory allocation for fine-tuning a program's memory requirements, which was shown to improve program performance by up to 20%. Although the techniques described in this dissertation have been evaluated on a hierarchical bus-based architecture similar to Willow, they are applicable to any large-scale multiprocessor characterized by non-uniform memory access behavior and large memory access latency.