Browsing by Author "Cox, Alan"

Now showing 1 - 8 of 8

A Simple and Effective Caching Scheme for Dynamic Content
(2000-11-28) Cox, Alan; Rajamani, Karthick
As web sites increasingly deliver dynamic content, the process of content generation at request time is becoming a severe limitation to web site throughput. Recent studies have shown that much of the dynamic content is, however, better characterized as pseudo-dynamic, i.e., a dynamic composition of stored or static data. Consequently, caching the generated web pages may increase the web server's throughput if there is some temporal locality in the request stream. In this paper, we perform a quantitative analysis of the benefits of caching for dynamic content using the e-commerce benchmark, TPC-W, as the workload. We implement caching through a simple and efficient Apache extension module, DCache, that can be easily incorporated into the current infrastructure for dynamic content delivery. Our DCache module uses conventional expiration times and our own request-initiated invalidation scheme as the methods for keeping the cache consistent. It also supports site-specific optimization by providing a mechanism to incorporate the priorities of specific web pages into the caching scheme. Our experiments show that we can obtain over 3 times the non-caching throughput with our caching approach.
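
The abstract names two consistency mechanisms, conventional expiration times and request-initiated invalidation. A minimal sketch of how such a page cache might look; the class name, method names, and the per-page TTL hook below are illustrative assumptions, not DCache's actual interface:

```python
import time

class PageCache:
    """Toy cache for generated pages: expiration times plus
    request-initiated invalidation (hypothetical interface)."""

    def __init__(self, default_ttl=60.0):
        self.default_ttl = default_ttl
        self.store = {}  # url -> (page, expires_at)

    def get(self, url):
        entry = self.store.get(url)
        if entry is None:
            return None
        page, expires_at = entry
        if time.time() >= expires_at:       # conventional expiration
            del self.store[url]
            return None
        return page

    def put(self, url, page, ttl=None):
        # A per-page ttl is one way to encode site-specific page priorities.
        ttl = self.default_ttl if ttl is None else ttl
        self.store[url] = (page, time.time() + ttl)

    def invalidate(self, urls):
        # Request-initiated invalidation: a write request (say, a purchase
        # that changes a best-seller list) names the pages it makes stale.
        for url in urls:
            self.store.pop(url, None)

def handle(cache, url, generate):
    """Serve from cache when possible; otherwise generate and cache."""
    page = cache.get(url)
    if page is None:
        page = generate(url)    # the expensive dynamic-content generation
        cache.put(url, page)
    return page
```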

An Integrated Compile-Time/Run-Time Software Distributed Shared Memory System
(1997-11-17) Cox, Alan; Dwarkadas, Sandhya; Zwaenepoel, Willy
High Performance Fortran (HPF), as well as its predecessor Fortran D, has attracted considerable attention as a promising language for writing portable parallel programs for a wide variety of distributed-memory architectures. Programmers express data parallelism using Fortran90 array operations and use data layout directives to direct the partitioning of the data and computation among the processors of a parallel machine. For HPF to gain acceptance as a vehicle for parallel scientific programming, it must achieve high performance on problems for which it is well suited. To achieve high performance with an HPF program on a distributed-memory parallel machine, an HPF compiler must do a superb job of translating Fortran90 data-parallel array constructs into an efficient sequence of operations that minimize the overhead associated with data movement and also maximize data locality. This dissertation presents and analyzes a set of advanced optimizations designed to improve the execution performance of HPF programs on distributed-memory architectures. Presented is a methodology for performing deep analysis of Fortran90 programs, eliminating the reliance upon pattern matching to drive the optimizations as is done in many Fortran90 compilers. The optimizations address the overhead of data movement, both interprocessor and intraprocessor movement, that results from the translation of Fortran90 array constructs. Additional optimizations address the issues of scalarizing array assignment statements, loop fusion, and data locality. The combination of these optimizations results in a compiler that is capable of optimizing dense matrix stencil computations more completely than all previous efforts in this area. This work is distinguished by advanced compile-time analysis and optimizations performed at the whole-array level as opposed to analysis and optimization performed at the loop or array-element levels.
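
Two of the optimizations listed, scalarization of array assignment statements and loop fusion, are easy to illustrate. The sketch below is a conceptual rendering in Python of the source-level transformation, not the compiler's actual output:

```python
# Two Fortran90-style array assignments,
#   A(1:n) = B(1:n) + C(1:n)
#   D(1:n) = A(1:n) * 2
# are first scalarized into element-wise loops:

def scalarized(A, B, C, D, n):
    for i in range(n):          # loop for A = B + C
        A[i] = B[i] + C[i]
    for i in range(n):          # loop for D = A * 2
        D[i] = A[i] * 2

# Loop fusion then merges the two loops, so each A[i] is consumed
# immediately after it is produced (better locality) and the loop
# overhead is paid once instead of twice:

def fused(A, B, C, D, n):
    for i in range(n):
        A[i] = B[i] + C[i]
        D[i] = A[i] * 2
```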

Bottleneck Characterization of Dynamic Web Site Benchmarks
(2002-02) Amza, Cristiana; Cecchet, Emmanuel; Chanda, Anupam; Cox, Alan; Elnikety, Sameh; Gil, Romer; Marguerite, Julie; Rajamani, Karthick; Zwaenepoel, Willy
The absence of benchmarks for Web sites with dynamic content has been a major impediment to research in this area. We describe three benchmarks for evaluating the performance of Web sites with dynamic content. The benchmarks model three common types of dynamic-content Web sites with widely varying application characteristics: an online bookstore, an auction site, and a bulletin board. For each benchmark we describe the design of the database, the interactions provided by the Web server, and the workloads used in analyzing the performance of the system. We have implemented these three benchmarks with commonly used open-source software. In particular, we used the Apache Web server, the PHP scripting language, and the MySQL relational database. Our implementation is therefore representative of the many dynamic content Web sites built using these tools. Our implementations are available freely from our Web site for other researchers to use. We present a performance evaluation of our implementations of these three benchmarks on contemporary commodity hardware. Our performance evaluation focused on finding and explaining the bottleneck resources in each benchmark. For the online bookstore, the CPU on the database was the bottleneck, while for the auction site and the bulletin board the CPU on the front-end Web server was the bottleneck. In none of the benchmarks was the network between the front-end and the back-end a bottleneck. With amounts of memory common by today's standards, neither the main memory nor the disk proved to be a limiting factor in terms of performance for any of the benchmarks.

A Flexible and Efficient Application Programming Interface (API) for a Customizable Proxy Cache
(2003-03-20) Pai, Vivek S.; Cox, Alan; Pai, Vijay S.; Zwaenepoel, Willy
This paper describes the design, implementation, and performance of a simple yet powerful Application Programming Interface (API) for providing extended services in a proxy cache. This API facilitates the development of customized content adaptation, content management, and specialized administration features. We have developed several modules that exploit this API to perform various tasks within the proxy, including a module to support the Internet Content Adaptation Protocol (ICAP) without any changes to the proxy core. The API design parallels those of high-performance servers, enabling its implementation to have minimal overhead on a high-performance cache. At the same time, it provides the infrastructure required to process HTTP requests and responses at a high level, shielding developers from low-level HTTP and socket details and enabling modules that perform interesting tasks without significant amounts of code. We have implemented this API in the portable and high-performance iMimic DataReactor™ proxy cache. We show that implementing the API imposes negligible performance overhead and that realistic content-adaptation services achieve high performance levels without substantially hindering a background benchmark load running at a high throughput level.
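
The abstract does not give the API's actual signatures, but it does describe their shape: the core hands modules whole HTTP requests and responses, hiding socket-level detail. A hypothetical sketch of that callback style, with all names invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    """Minimal stand-in for an HTTP request or response."""
    headers: dict = field(default_factory=dict)
    body: bytes = b""

class ProxyModule:
    """Hypothetical extension-module base class: the proxy core invokes
    these hooks with parsed messages, so modules never touch sockets."""

    def on_request(self, request: Message) -> Message:
        """Inspect or rewrite a client request before the cache lookup."""
        return request

    def on_response(self, request: Message, response: Message) -> Message:
        """Adapt a response before it is returned to the client."""
        return response

class HeaderStripper(ProxyModule):
    """Toy content-adaptation module: removes one header, standing in
    for richer transformations such as ad stripping or ICAP forwarding."""

    def on_response(self, request: Message, response: Message) -> Message:
        response.headers.pop("X-Internal-Debug", None)
        return response
```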

Mitosis: A High Performance, Scalable Virtual Memory System
(2001-05-08) Cox, Alan; Navarro, Juan
Many modern applications use virtual memory APIs introduced in the 1980s in unforeseen ways, stressing the underlying data structures and exposing the old designs to a variety of performance and scalability problems. The two-decade-old data structures show their age when, for instance, a Web server maps thousands of files or a garbage collector plays memory protection tricks. Observing how today's applications use the VM facilities, we came up with a set of requirements that any VM implementation should follow in order to efficiently support modern workloads. Current VM systems completely neglect one of these requirements, and only partially fulfill a second one. In this paper we propose a design that meets all of the requirements, and present preliminary performance results. We also describe the future second stage of this project: the use of persistent data structures, that is, structures that are shared in a copy-on-write way. Current VM systems use copy-on-write techniques on physical memory to reduce the overhead of forking, but the semantics of fork suggest a more aggressive approach: use copy-on-write to share the data structures as well. Persistence presents a number of advantages and solves, in a uniform way, additional problems that current systems have solved only partially and in an ad hoc manner. We describe how we plan to extend our implementation to include persistence.
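
The "more aggressive approach" in the last paragraph is essentially a persistent, path-copying data structure: after a fork, parent and child share one tree of mappings and copy only the nodes they modify. A minimal sketch of that idea, using a toy binary tree keyed by address rather than the actual Mitosis structures:

```python
class Node:
    """Immutable node in a persistent map from addresses to mappings."""
    __slots__ = ("key", "value", "left", "right")

    def __init__(self, key, value, left=None, right=None):
        self.key, self.value = key, value
        self.left, self.right = left, right

def insert(node, key, value):
    """Path-copying insert: only the root-to-leaf path is copied; every
    other subtree is shared between the old and new versions."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        return Node(node.key, node.value, insert(node.left, key, value), node.right)
    if key > node.key:
        return Node(node.key, node.value, node.left, insert(node.right, key, value))
    return Node(key, value, node.left, node.right)

# Parent builds its address-space map.
parent = None
for addr in (0x2000, 0x1000, 0x3000):
    parent = insert(parent, addr, "mapped")

# "fork": the child starts from the parent's root at zero copying cost;
# its first change copies one path and leaves the rest shared.
child = insert(parent, 0x4000, "mapped")
assert child.left is parent.left   # left subtree physically shared
```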

Optimized Runtime Systems for MapReduce Applications in Multi-core Clusters
(2014-05-27) Zhang, Yunming; Sarkar, Vivek; Cox, Alan; Mellor-Crummey, John
This research proposes a novel runtime system, Habanero Hadoop, to tackle the inefficient utilization of multi-core machines' memory in the existing Hadoop MapReduce runtime system. Insufficient memory for each map task leads to the inability to tackle large-scale problems such as genome sequencing and data clustering. The Habanero Hadoop system integrates a shared memory model into the fully distributed memory model of the Hadoop MapReduce system. The improvements eliminate duplication of in-memory data structures used in the map phase, making more memory available to each map task. Previous work on optimizing multi-core performance for MapReduce runtimes focused on maximizing CPU utilization rather than memory efficiency. My work provides multiple approaches that significantly improve the memory efficiency of the Hadoop MapReduce runtime. The optimized Habanero Hadoop runtime can increase the throughput and maximum input size for certain widely used data analytics applications, such as Kmeans and Hash Join, by 2x.
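
The memory saving comes from keeping one copy of large read-only map-side state per node instead of one per map task. A hypothetical sketch of the effect, using Python threads as stand-ins for map tasks that share one address space; in stock Hadoop each task would be a separate JVM with its own copy:

```python
from concurrent.futures import ThreadPoolExecutor

# One shared, read-only copy of the big map-side structure, e.g. a
# KMeans centroid table, instead of a private copy per map task.
CENTROIDS = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]

def nearest(point):
    """Map function: label a point with the index of its closest centroid."""
    return min(range(len(CENTROIDS)),
               key=lambda i: (point[0] - CENTROIDS[i][0]) ** 2 +
                             (point[1] - CENTROIDS[i][1]) ** 2)

def map_task(split):
    # Every task reads the same CENTROIDS object, so the table's memory
    # cost is paid once per node rather than once per task.
    return [(nearest(p), p) for p in split]

splits = [[(0.1, 0.2), (5.1, 4.9)], [(9.8, 0.1), (0.0, 0.1)]]
with ThreadPoolExecutor(max_workers=2) as pool:
    labeled = [pair for part in pool.map(map_task, splits) for pair in part]
```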

Scaling and Availability for Dynamic Content Web Sites
(2002-06-02) Amza, Cristiana; Cox, Alan; Zwaenepoel, Willy
We investigate the techniques necessary for building highly available, low-cost, scalable servers suitable for supporting dynamic content web sites. We focus on replication techniques for scaling and availability of a dynamic content site using a cluster of commodity computers running Web servers and database engines. Our techniques allow scaling without undue development, maintenance, and installation costs, avoiding modifications to both the Web server and the database engine. Our results on an eight-node database cluster show good scaling for the e-commerce TPC-W benchmark, provided that suitable load balancing and replication strategies are in place. Key among these strategies is replication with relaxed consistency, in which the server allows controlled internal data inconsistencies to improve performance while hiding these inconsistencies from the user. The actual choice of load balancing strategy is less important. Locality-based load balancing policies based on data caching, found very profitable in static content servers, have almost no impact.
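
One way to read "controlled internal data inconsistencies hidden from the user" is session-style version tracking: replicas may lag internally, but a client's reads are only answered once the chosen replica has caught up to that client's last write. The sketch below is a guess at the flavor of such a scheme, not the paper's actual protocol:

```python
import random

class RelaxedDispatcher:
    """Toy dispatcher for a replicated database with relaxed consistency
    (hypothetical design, for illustration only)."""

    def __init__(self, n_replicas=3):
        self.replicas = [{"version": 0, "data": {}} for _ in range(n_replicas)]
        self.log = []   # globally ordered writes: (version, key, value)

    def _catch_up(self, replica):
        for version, key, value in self.log[replica["version"]:]:
            replica["data"][key] = value
            replica["version"] = version

    def write(self, key, value):
        version = len(self.log) + 1
        self.log.append((version, key, value))
        self._catch_up(self.replicas[0])   # one replica applies eagerly;
        return version                     # the rest are allowed to lag

    def read(self, key, seen_version=0):
        # Internally inconsistent replicas are fine; the user never sees
        # it because a lagging replica is caught up before answering.
        replica = random.choice(self.replicas)
        if replica["version"] < seen_version:
            self._catch_up(replica)
        return replica["data"].get(key)

d = RelaxedDispatcher()
v = d.write("stock:book42", 7)
print(d.read("stock:book42", seen_version=v))   # 7, despite replica lag
```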

Scaling e-Commerce Sites
(2002-02-19) Amza, Cristiana; Cox, Alan; Zwaenepoel, Willy
We investigate how an e-commerce site can be scaled up from a single machine running a Web server and a database to a cluster of Web server machines and database engine machines. In order to reduce development, maintenance, and installation costs, we avoid modifications to both the Web server and the database engine, and we replicate the database on all database machines. All load balancing and scheduling decisions are implemented in a separate dispatcher. We find that such an architecture scales well for the common e-commerce workload of the TPC-W benchmark, provided that suitable load balancing and scheduling strategies are in place. Key among these strategies is asynchronous scheduling, in which a write is reported complete to the user as soon as a single instance of it finishes at one of the database engines. The actual choice of load balancing strategy is less important. In particular, locality-based load balancing policies, found very profitable for static Web workloads, offer little advantage.
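
Asynchronous scheduling as described here means the dispatcher acknowledges a write once one database engine has applied it, while the remaining engines apply the same write later, in order. A small thread-based sketch of that behavior, with invented names rather than the paper's dispatcher:

```python
import queue
import threading

class Engine:
    """One database engine: applies writes from its own queue in order."""

    def __init__(self):
        self.data = {}
        self.q = queue.Queue()
        threading.Thread(target=self._apply_loop, daemon=True).start()

    def _apply_loop(self):
        while True:
            key, value, done = self.q.get()
            self.data[key] = value
            if done is not None:
                done.set()      # signal the dispatcher, if it is waiting

def async_write(engines, key, value):
    """Acknowledge the write as soon as ONE engine applies it; the others
    receive the same write in their queues and apply it lazily."""
    first_done = threading.Event()
    engines[0].q.put((key, value, first_done))
    for e in engines[1:]:
        e.q.put((key, value, None))
    first_done.wait()           # returns after a single completion

engines = [Engine() for _ in range(3)]
async_write(engines, "cart:42", ["book"])
```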