Improving the Efficiency of Map-Reduce Task Engine

dc.contributor.advisor: Cox, Alan L.
dc.contributor.committeeMember: Rixner, Scott
dc.contributor.committeeMember: Sarkar, Vivek
dc.creator: Chadha, Mehul
dc.date.accessioned: 2016-01-06T21:00:20Z
dc.date.available: 2016-01-06T21:00:20Z
dc.date.created: 2014-12
dc.date.issued: 2014-10-03
dc.date.submitted: December 2014
dc.date.updated: 2016-01-06T21:00:20Z
dc.description.abstract: Map-Reduce is a popular distributed programming framework for parallelizing computation on huge datasets over a large number of compute nodes. This year completes a decade since it was invented by Google in 2004. Hadoop, a popular open source implementation of Map-Reduce, was introduced by Yahoo in 2005. Over these years, many researchers have worked on various problems related to Map-Reduce and similar distributed programming models, and Hadoop itself has been the subject of various research projects. Prior work in this field has focused on making Map-Reduce more efficient for iterative processing or on pipelining it across different jobs, which has improved performance for iterative applications. However, little attention has been given to the task engine, which carries out the Map-Reduce computation itself. Our analysis of applications running on Hadoop shows that more than 50% of the time is spent in the framework on tasks such as sorting, serialization, and deserialization. We address this problem by introducing an extension to the Map-Reduce programming model. This extension allows us to use more efficient data structures, such as hash tables. It also allows us to lower the cost of serializing and deserializing the key-value pairs. With these efforts we have been able to lower the overheads of the framework, and the performance of certain important applications such as Pagerank and Join has improved by 1.5 to 2.5 times.
dc.format.mimetype: application/pdf
dc.identifier.citation: Chadha, Mehul. "Improving the Efficiency of Map-Reduce Task Engine." (2014) Master’s Thesis, Rice University. https://hdl.handle.net/1911/87731.
dc.identifier.uri: https://hdl.handle.net/1911/87731
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Hadoop
dc.subject: Map-Reduce
dc.subject: Pagerank
dc.subject: Join
dc.subject: Barrier free
dc.title: Improving the Efficiency of Map-Reduce Task Engine
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Masters
thesis.degree.name: Master of Science
Files
Original bundle
Name:
CHADHA-DOCUMENT-2014.pdf
Size:
2.71 MB
Format:
Adobe Portable Document Format
License bundle
Name:
PROQUEST_LICENSE.txt
Size:
5.83 KB
Format:
Plain Text
Name:
LICENSE.txt
Size:
2.6 KB
Format:
Plain Text