Improving the Efficiency of Map-Reduce Task Engine
Abstract
Map-Reduce is a popular distributed programming framework for parallelizing computation on huge datasets across a large number of compute nodes. This year completes a decade since Google introduced it in 2004; Hadoop, a popular open-source implementation of Map-Reduce, followed from Yahoo in 2005. Over these years many researchers have worked on various problems related to Map-Reduce and similar distributed programming models, and Hadoop itself has been the subject of various research projects. Prior work in this field has focused on making Map-Reduce more efficient for iterative processing, or on pipelining it across different jobs, which has improved the performance of iterative applications. However, little attention has been given to the task engine that carries out the Map-Reduce computation itself. Our analysis of applications running on Hadoop shows that more than 50% of the time is spent in the framework on tasks such as sorting, serialization, and deserialization. We address this problem by introducing an extension to the Map-Reduce programming model. This extension allows us to use more efficient data structures, such as hash tables, and to lower the cost of serializing and deserializing the key-value pairs. With these efforts we have lowered the overheads of the framework, and the performance of important applications such as PageRank and Join has improved by 1.5 to 2.5 times.
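The abstract does not give the extension's interface, but the core idea of replacing the framework's sort-based grouping and per-record serialization with hash-table aggregation can be sketched using the standard Hadoop in-mapper combining pattern. The Java sketch below is illustrative only: the class name HashAggregatingWordCountMapper, the word-count example, and the in-memory counts table are assumptions, not the programming-model extension proposed in the thesis.

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative sketch only: values are aggregated in an in-memory hash table
// instead of emitting one serialized key-value pair per input token, so the
// framework has fewer intermediate records to serialize, sort, and deserialize.
public class HashAggregatingWordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Hypothetical in-mapper hash table; the thesis's actual extension is not
    // described in the abstract.
    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    protected void map(LongWritable offset, Text line, Context context) {
        for (String token : line.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // Emit each partially aggregated pair once, when the task finishes.
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            context.write(new Text(entry.getKey()), new IntWritable(entry.getValue()));
        }
    }
}

In this pattern the hash table absorbs repeated keys before any data reaches the shuffle, which is one way to cut the sorting and (de)serialization costs that the abstract identifies as the dominant framework overhead.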
Citation
Chadha, Mehul. "Improving the Efficiency of Map-Reduce Task Engine." (2014) Master’s Thesis, Rice University. https://hdl.handle.net/1911/87731.