R-3 Repository :: Browsing by Author "Ranganathan, Parthasarathy"

Now showing 1 - 11 of 11

An evaluation of memory consistency models for shared-memory systems with ILP processors
(1997) Ranganathan, Parthasarathy; Adve, Sarita V.
The memory consistency model of a shared-memory multiprocessor determines the extent to which memory operations may be overlapped or reordered for better performance. Studies on previous-generation shared-memory multiprocessors have shown that relaxed memory consistency models like release consistency (RC) can significantly outperform the conceptually simpler model of sequential consistency (SC). Current and next-generation multiprocessors use commodity microprocessors that aggressively exploit instruction-level parallelism (ILP) using methods such as multiple issue, dynamic scheduling, and non-blocking reads. For such processors, researchers have conjectured that two techniques, hardware-controlled non-binding prefetching and speculative reads, have the potential to equalize the hardware performance of memory consistency models. These techniques have recently begun to appear in commercial microprocessors, and re-open the question of whether the performance benefits of release consistency justify its added programming complexity. This thesis performs the first detailed quantitative comparison of several implementations of sequential consistency and release consistency optimized for aggressive ILP processors. Our results indicate that although hardware prefetching and speculative reads dramatically improve the performance of sequential consistency, the simplest RC version continues to significantly outperform the most optimized SC version. Additionally, the performance of SC is highly sensitive to the cache write policy and the aggressiveness of the cache-coherence protocol, while the performance of RC is generally stable across all implementations. Overall our results show that RC hardware has significant performance benefits over SC hardware, and at the same time, requires less system complexity with ILP processors. 
Memory write latencies that hardware prefetching and speculative loads are unsuccessful in hiding are the main reason for the performance difference between SC and RC.
An Evaluation of Memory Consistency Models for Shared-Memory Systems with ILP Processors
(1996-10-20) Pai, Vijay S.; Ranganathan, Parthasarathy; Adve, Sarita V.; Harton, Tracy; CITI (http://citi.rice.edu/)
General-purpose architectures for media processing and database workloads
(2000) Ranganathan, Parthasarathy; Adve, Sarita V.
Workloads on general-purpose computing systems have changed dramatically over the past few years, with greater emphasis on emerging compute-intensive applications such as media processing and databases. However, until recently, most high performance computing studies have primarily focused on scientific and engineering workloads, potentially leading to designs not suitable for these emerging workloads. This dissertation addresses this limitation. Our key contributions include (i) the first detailed quantitative simulation-based studies of the performance of media processing and database workloads on systems using state-of-the-art processors, and (ii) cost-effective architectural solutions targeted at achieving the higher performance requirements of future systems running these workloads. The first part of the dissertation focuses on media processing workloads. We study the effectiveness of state-of-the-art features (techniques to extract instruction-level parallelism, media instruction-set extensions, software prefetching, and large caches). Our results identify two key trends: (i) media workloads on current general-purpose systems are primarily compute-bound and (ii) current trends towards devoting a large fraction of on-chip transistors (up to 80%) for caches can often be ineffective for media workloads. In response to these trends, we propose and evaluate a new cache organization, called reconfigurable caches. Reconfigurable caches allow the on-chip cache transistors to be dynamically divided into partitions that can be used for other activities (e.g., instruction memoization, application-controlled memory, and prefetching buffers), including optimizations that address the compute bottleneck. Our design of the reconfigurable cache requires relatively few modifications to existing cache structures and has small impact on cache access times. 
The second part of the dissertation evaluates the performance of database workloads like online transaction processing and decision support systems on shared-memory multiprocessor servers with state-of-the-art processors. Our main results show that the key performance-limiting characteristics of online transaction processing workloads are (i) large instruction footprints (leading to instruction cache misses) and (ii) frequent data communication (leading to cache-to-cache misses). We show that both these inefficiencies can be addressed with simple cost-effective optimizations. Additionally, our analysis of optimized memory consistency models with state-of-the-art processors suggests that the choice of the hardware consistency model of the system may not be a dominant factor for database workloads.
The Impact of Exploiting Instruction-Level Parallelism on Shared-Memory Multiprocessors
(1999-02-20) Pai, Vijay S.; Ranganathan, Parthasarathy; Abdel-Shafi, Hazim; Adve, Sarita V.; CITI (http://citi.rice.edu/)
Current microprocessors incorporate techniques to aggressively exploit instruction-level parallelism (ILP). This paper evaluates the impact of such processors on the performance of shared-memory multiprocessors, both without and with the latency-hiding optimization of software prefetching. Our results show that, while ILP techniques substantially reduce CPU time in multiprocessors, they are less effective in removing memory stall time. Consequently, despite the inherent latency tolerance features of ILP processors, we find memory system performance to be a larger bottleneck and parallel efficiencies to be generally poorer in ILP-based multiprocessors than in previous-generation multiprocessors. The main reasons for these deficiencies are insufficient opportunities in the applications to overlap multiple load misses and increased contention for resources in the system. We also find that software prefetching does not change the memory-bound nature of most of our applications on our ILP multiprocessor, mainly due to a large number of late prefetches and resource contention. Our results suggest the need for additional latency hiding or reducing techniques for ILP systems, such as software clustering of load misses and producer-initiated communication.
The Impact of Instruction-Level Parallelism on Multiprocessor Performance and Simulation Methodology
(1997-02-20) Pai, Vijay S.; Ranganathan, Parthasarathy; Adve, Sarita V.; CITI (http://citi.rice.edu/)
The Interaction of Software Prefetching with ILP Processors in Shared-Memory Systems
(1997-06-20) Ranganathan, Parthasarathy; Pai, Vijay S.; Abdel-Shafi, Hazim; Adve, Sarita V.; CITI (http://citi.rice.edu/)
Recent Advances in Memory Consistency Models for Hardware Shared Memory Systems
(1999-03-20) Adve, Sarita V.; Pai, Vijay S.; Ranganathan, Parthasarathy; CITI (http://citi.rice.edu/)
RSIM Reference Manual: Version 1.0
(1997-08-20) Pai, Vijay S.; Ranganathan, Parthasarathy; Adve, Sarita V.; CITI (http://citi.rice.edu/)
Simulation has emerged as an important method for evaluating new ideas in both uniprocessor and multiprocessor architecture. Compared to building real hardware, simulation provides at least two advantages. First, it provides the flexibility to modify various architectural parameters and components and to analyze the benefits of such modification. Second, simulation allows for detailed statistics collection, providing a better understanding of the tradeoffs involved and facilitating further performance tuning. This document describes RSIM - the Rice Simulator for ILP Multiprocessors (Version 1.0). RSIM is an execution-driven simulator primarily designed to study shared-memory multiprocessor architectures built from state-of-the-art processors. Compared to other current publicly available shared-memory simulators, the key advantage of RSIM is that it supports a processor model that aggressively exploits instruction-level parallelism (ILP) and is more representative of current and near-future processors. Currently available shared-memory simulators assume a much simpler processor model, and can exhibit significant inaccuracies when used to study the behavior of shared-memory multiprocessors built from state-of-the-art ILP processors. A cost of the increased accuracy and detail of RSIM is that it is slower than simulators that do not model the processor. We have used RSIM at Rice for our research in computer architecture, as well as for undergraduate and graduate architecture courses covering both uniprocessor and multiprocessor architectures.
RSIM: An Execution-Driven Simulator for ILP-Based Shared-Memory Multiprocessors and Uniprocessors
(1997-10-20) Pai, Vijay S.; Ranganathan, Parthasarathy; Adve, Sarita V.; CITI (http://citi.rice.edu/)
Rsim: Simulating Shared-Memory Multiprocessors with ILP Processors
(2002-02-20) Hughes, Christopher J.; Pai, Vijay S.; Ranganathan, Parthasarathy; Adve, Sarita V.; CITI (http://citi.rice.edu/)
Rsim is a publicly available architecture simulator for shared-memory systems built from processors that aggressively exploit instruction-level parallelism. Modeling ILP features in a multiprocessor is particularly important for applications that exhibit parallelism among read misses.
Using Speculative Retirement and Larger Instruction Windows to Narrow the Performance Gap between Memory Consistency Models
(1997-06-20) Ranganathan, Parthasarathy; Pai, Vijay S.; Adve, Sarita V.; CITI (http://citi.rice.edu/)
