Browsing by Author "Amza, Cristiana"

Bottleneck Characterization of Dynamic Web Site Benchmarks (2002-02)
Amza, Cristiana; Cecchet, Emmanuel; Chanda, Anupam; Cox, Alan; Elnikety, Sameh; Gil, Romer; Marguerite, Julie; Rajamani, Karthick; Zwaenepoel, Willy
The absence of benchmarks for Web sites with dynamic content has been a major impediment to research in this area. We describe three benchmarks for evaluating the performance of Web sites with dynamic content. The benchmarks model three common types of dynamic-content Web sites with widely varying application characteristics: an online bookstore, an auction site, and a bulletin board. For each benchmark we describe the design of the database, the interactions provided by the Web server, and the workloads used in analyzing the performance of the system. We have implemented these three benchmarks with commonly used open-source software. In particular, we used the Apache Web server, the PHP scripting language, and the MySQL relational database. Our implementation is therefore representative of the many dynamic content Web sites built using these tools. Our implementations are available freely from our Web site for other researchers to use. We present a performance evaluation of our implementations of these three benchmarks on contemporary commodity hardware. Our performance evaluation focused on finding and explaining the bottleneck resources in each benchmark. For the online bookstore, the CPU on the database was the bottleneck, while for the auction site and the bulletin board the CPU on the front-end Web server was the bottleneck. In none of the benchmarks was the network between the front-end and the back-end a bottleneck.
With amounts of memory common by today's standards, neither the main memory nor the disk proved to be a limiting factor in terms of performance for any of the benchmarks.

Conflict-aware replication for dynamic content Web sites (2003)
Amza, Cristiana; Zwaenepoel, Willy
Conflict-aware replication is a novel lazy replication technique for scaling the back-end database of a dynamic content web server using a cluster of commodity computers. This technique provides both throughput scaling and 1-copy serializability. It has generally been believed that this combination is hard to achieve through replication because of the growth of the number of conflicts. Conflict-aware replication interposes a (possibly replicated) scheduler between the database and application server tiers. The conflict-aware scheduler directs incoming queries in such a way that the overall execution is serializable and the number of conflicts is reduced. The technique requires that the incoming transactions specify the tables that they access at the beginning of the transaction. Using this information, conflict-aware replication provides both scaling and 1-copy serializability, while it avoids making any changes to the application server or database. We have implemented a prototype of the conflict-aware scheduler in a cluster-based dynamic content site. We have also implemented various other scheduler algorithms in this prototype for comparison purposes, including conflict-aware and oblivious, with 1-copy serializability and with different looser consistency models. We have evaluated this method using the industry standard TPC-W e-commerce benchmark, an auction site benchmark, modeled after eBay.com, and a bulletin board benchmark, modeled after slashdot.org. For these applications, we have found that pre-specifying what tables are accessed involves very little work on behalf of the programmer and could easily be automated.
For clusters with a small number of database machines (up to 8) we have measured an implementation of the algorithms. We use simulation to extend our measurement results to larger clusters, faster database engines, and lower conflict rates. This dissertation shows that conflict-awareness brings considerable benefits in terms of both overall throughput scaling and latency reduction compared to both eager and conflict-oblivious lazy replication for a large range of cluster configurations and conflict rates. Furthermore, for all our applications, except those with very high conflict rates, the performance of conflict-aware replication equals or approaches that of looser consistency models. The dissertation also shows that the cost of conflict-aware replication is minimal in terms of data availability and fault tolerance.

Scaling and Availability for Dynamic Content Web Sites (2002-06-02)
Amza, Cristiana; Cox, Alan; Zwaenepoel, Willy
We investigate the techniques necessary for building highly-available, low-cost, scalable servers, suitable for supporting dynamic content web sites. We focus on replication techniques for scaling and availability of a dynamic content site using a cluster of commodity computers running Web servers and database engines. Our techniques allow scaling without undue development, maintenance, and installation costs, avoiding modifications to both the Web server and the database engine. Our results on an eight node database cluster show good scaling for the e-commerce TPC-W benchmark provided that suitable load balancing and replication strategies are in place. Key among these strategies is replication with relaxed consistency, in which the server allows controlled internal data inconsistencies to improve performance while hiding these inconsistencies from the user. The actual choice of load balancing strategy is less important.
Locality-based load balancing policies based on data caching, found very profitable in static content servers, have almost no impact.

Scaling e-Commerce Sites (2002-02-19)
Amza, Cristiana; Cox, Alan; Zwaenepoel, Willy
We investigate how an e-commerce site can be scaled up from a single machine running a Web server and a database to a cluster of Web server machines and database engine machines. In order to reduce development, maintenance, and installation costs, we avoid modifications to both the Web server and the database engine, and we replicate the database on all database machines. All load balancing and scheduling decisions are implemented in a separate dispatcher. We find that such an architecture scales well for the common e-commerce workload of the TPC-W benchmark, provided that suitable load balancing and scheduling strategies are in place. Key among these strategies is asynchronous scheduling, in which writes complete and are returned to the user as soon as a single instance of the write completes at one of the database engines. The actual choice of load balancing strategy is less important. In particular, locality-based load balancing policies, found very profitable for static Web workloads, offer little advantage.

Software distributed shared memory protocols that adapt between single writer and multiple writer (1997)
Amza, Cristiana; Zwaenepoel, Willy
We present two software distributed shared memory protocols that dynamically adapt between a single writer (SW) and a multiple writer (MW) protocol based on the application's sharing patterns. The first protocol adapts based on write-write false sharing, the second based on a combination of write-write false sharing and write granularity. The adaptation is automatic. No user or compiler information is needed. We measured the performance of these protocols on a test suite of eight applications, covering a broad spectrum in terms of write-write false sharing and write granularity.
The adaptive protocols match or exceed the performance of the best of MW and SW in seven out of the eight applications. Speedup improvements over SW range from a factor of 1.02 to 2.7, and over MW from 1.02 to 1.6. In addition, memory usage is reduced considerably compared to MW, in some cases making the memory overhead all but negligible.