Optimized Runtime Systems for MapReduce Applications in Multi-core Clusters

Zhang, Yunming

Files

ZHANG-DOCUMENT-2014.pdf (4.9 MB)

Date

2014-05-27

Authors

Zhang, Yunming

Abstract

This research proposes a novel runtime system, Habanero Hadoop, to address the inefficient use of memory on multi-core machines in the existing Hadoop MapReduce runtime system. When each map task has too little memory, large-scale problems such as genome sequencing and data clustering cannot be handled. The Habanero Hadoop system integrates a shared memory model into the fully distributed memory model of the Hadoop MapReduce system. The improvements eliminate duplication of in-memory data structures used in the map phase, making more memory available to each map task. Previous work on optimizing multi-core performance for MapReduce runtimes focused on maximizing CPU utilization rather than memory efficiency. My work provides multiple approaches that significantly improve the memory efficiency of the Hadoop MapReduce runtime. The optimized Habanero Hadoop runtime can increase the throughput and maximum input size by 2x for certain widely used data analytics applications, such as Kmeans and Hash Join.
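The idea of removing duplicated in-memory data structures from the map phase can be illustrated with a small hash-join sketch. It is only an illustration written in plain Java under stated assumptions: the class `SharedJoinSketch` and its methods `buildTable`, `probe`, and `joinShared` are hypothetical names, not part of Hadoop or Habanero Hadoop, and the thesis itself should be consulted for the actual design. The sketch assumes that map tasks on one machine run as threads in a single process, so they can all probe one read-only lookup table instead of each holding a private copy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; not the Habanero Hadoop API.
public class SharedJoinSketch {

    // Build the lookup table once. It is treated as read-only afterwards,
    // so concurrent readers need no locking.
    static Map<Integer, String> buildTable(int[] keys, String[] values) {
        Map<Integer, String> table = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            table.put(keys[i], values[i]);
        }
        return table;
    }

    // One simulated "map task": probe the shared table for each key in its split.
    static List<String> probe(Map<Integer, String> sharedTable, int[] split) {
        List<String> out = new ArrayList<>();
        for (int key : split) {
            String match = sharedTable.get(key);
            if (match != null) {
                out.add(key + ":" + match);
            }
        }
        return out;
    }

    // Run every split as a thread in the same process. All threads reference
    // the same table object, so memory holds one copy rather than one per task.
    static List<String> joinShared(Map<Integer, String> table, int[][] splits, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int[] split : splits) {
                futures.add(pool.submit(() -> probe(table, split)));
            }
            // Collect results in split order so the output is deterministic.
            List<String> result = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                result.addAll(f.get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

With a fully separate process per map task, each task would have to build or load its own copy of the table, so the memory spent on the table grows with the number of tasks per machine. Sharing one copy leaves that memory free for a larger table or larger input splits, which is the kind of gain the abstract describes.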

Advisor

Sarkar, Vivek

Degree

Master of Science

Type

Thesis

Keywords

MapReduce, Memory

Citation

Zhang, Yunming. "Optimized Runtime Systems for MapReduce Applications in Multi-core Clusters." (2014) Master’s Thesis, Rice University. https://hdl.handle.net/1911/87782.

Rights

Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.

Citable link to this page

https://hdl.handle.net/1911/87782

Collections

Rice University Theses and Dissertations
