Analysis of Hadoop’s Performance under Failures
Abstract
Failures are common in today’s data center environments and can significantly degrade the performance of important jobs running on large-scale computing frameworks. In this paper we analyze Hadoop’s behavior under compute node and process failures. Surprisingly, we find that even a single failure can substantially increase job running times. We uncover several important design decisions underlying this distressing behavior: the inefficiency of Hadoop’s statistical speculative execution algorithm, the lack of failure-information sharing, and the overloading of TCP failure semantics. We hope that our study will add new dimensions to the pursuit of robust large-scale computing framework designs.
Citation
Dinu, Florin and Ng, T. S. Eugene. "Analysis of Hadoop’s Performance under Failures." (2011) https://hdl.handle.net/1911/96398.