Improving the Efficiency of Map-Reduce Task Engine

dc.contributor.advisor | Cox, Alan L. | en_US
dc.contributor.committeeMember | Rixner, Scott | en_US
dc.contributor.committeeMember | Sarkar, Vivek | en_US
dc.creator | Chadha, Mehul | en_US
dc.date.accessioned | 2016-01-06T21:00:20Z | en_US
dc.date.available | 2016-01-06T21:00:20Z | en_US
dc.date.created | 2014-12 | en_US
dc.date.issued | 2014-10-03 | en_US
dc.date.submitted | December 2014 | en_US
dc.date.updated | 2016-01-06T21:00:20Z | en_US
dc.description.abstract | Map-Reduce is a popular distributed programming framework for parallelizing computation on huge datasets over a large number of compute nodes. This year completes a decade since it was invented by Google in 2004. Hadoop, a popular open source implementation of Map-Reduce, was introduced by Yahoo in 2005. Over these years many researchers have worked on various problems related to Map-Reduce and similar distributed programming models. Hadoop itself has been the subject of various research projects. Prior work in this field has focused on making Map-Reduce more efficient for iterative processing, or on pipelining it across different jobs. This has improved performance for iterative applications. However, little attention has been given to the task engine, which carries out the Map-Reduce computation itself. Our analysis of applications running on Hadoop shows that more than 50% of the time is spent in the framework on tasks such as sorting, serialization, and deserialization. We address this problem by introducing an extension to the Map-Reduce programming model. This extension allows us to use more efficient data structures such as hash tables. It also allows us to lower the cost of serialization and deserialization of the key-value pairs. With these efforts we have been able to lower the overheads of the framework, and the performance of certain important applications such as Pagerank and Join has improved by 1.5 to 2.5 times. | en_US
dc.format.mimetype | application/pdf | en_US
dc.identifier.citation | Chadha, Mehul. "Improving the Efficiency of Map-Reduce Task Engine." (2014) Master’s Thesis, Rice University. https://hdl.handle.net/1911/87731. | en_US
dc.identifier.uri | https://hdl.handle.net/1911/87731 | en_US
dc.language.iso | eng | en_US
dc.rights | Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. | en_US
dc.subject | Hadoop | en_US
dc.subject | Map-Reduce | en_US
dc.subject | Pagerank | en_US
dc.subject | Join | en_US
dc.subject | Barrier free | en_US
dc.title | Improving the Efficiency of Map-Reduce Task Engine | en_US
dc.type | Thesis | en_US
dc.type.material | Text | en_US
thesis.degree.department | Computer Science | en_US
thesis.degree.discipline | Engineering | en_US
thesis.degree.grantor | Rice University | en_US
thesis.degree.level | Masters | en_US
thesis.degree.name | Master of Science | en_US
Files
Original bundle
CHADHA-DOCUMENT-2014.pdf | 2.71 MB | Adobe Portable Document Format
License bundle
PROQUEST_LICENSE.txt | 5.83 KB | Plain Text
LICENSE.txt | 2.6 KB | Plain Text