Optimized Runtime Systems for MapReduce Applications in Multi-core Clusters

Sarkar, Vivek

2016-01-07; 2016-01-07; 2014-12; 2014-05-27; December 2

Zhang, Yunming. "Optimized Runtime Systems for MapReduce Applications in Multi-core Clusters." (2014) Master’s Thesis, Rice University. https://hdl.handle.net/1911/87782.

This research proposes a novel runtime system, Habanero Hadoop, to address the inefficient use of multi-core machines' memory in the existing Hadoop MapReduce runtime system. When each map task has too little memory, large-scale problems such as genome sequencing and data clustering cannot be handled. The Habanero Hadoop system integrates a shared memory model into the fully distributed memory model of the Hadoop MapReduce system. The improvements eliminate duplication of in-memory data structures used in the map phase, making more memory available to each map task. Previous work on optimizing multi-core performance for MapReduce runtimes focused on maximizing CPU utilization rather than memory efficiency. This work provides multiple approaches that significantly improve the memory efficiency of the Hadoop MapReduce runtime. The optimized Habanero Hadoop runtime can increase the throughput and maximum input size by 2x for certain widely used data analytics applications such as Kmeans and Hash Join.

application/pdf

eng

Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.

MapReduce; Memory

Thesis

2016-01-07