Chaudhari, Shivangi. Accelerating Hadoop Map-Reduce for small/intermediate data sizes using the Comet coordination framework. Retrieved from https://doi.org/doi:10.7282/T3M045NS
MapReduce has emerged as a popular programming paradigm for data-intensive computing in clustered environments. As a framework for solving embarrassingly parallel problems, MapReduce has been used extensively on large clusters. Such frameworks scale to petabytes of data largely through the use of a distributed file system, for example the Google File System used by the proprietary Google MapReduce.
In the "Map", the master node takes the input, divides it into smaller sub-problems, and distributes those to worker nodes. The worker node processes that smaller problem, and passes the answer back to its master node. In the "Reduce", the master node then takes the answers of the sub-problems and combines them to get the final output after reduces. The advantage of MapReduce is that, it allows for distributed processing of the map and reduction operations, assuming each operation is independent of the other, all can be executed in parallel.
We found that file reads and writes to the distributed file system carry a noticeable overhead, especially for smaller data sizes on the order of a few tens of gigabytes. Our solution is a MapReduce framework built over the Comet framework that uses TCP sockets for communication and coordination, and that keeps operations on data in memory whenever possible. The objectives of this thesis are to:
(1) understand the behavior and limitations of MapReduce for small-to-moderate datasets;
(2) develop a coordination and interaction framework that complements Hadoop MapReduce and addresses these shortcomings; and
(3) demonstrate and evaluate the framework using a real-world application.
In this thesis we use Comet and its services to build a MapReduce infrastructure that addresses the above requirements; specifically, it enables pull-based scheduling of Map tasks as well as stream-based coordination and data exchange. The framework is based on the master-worker model. Comet is a decentralized (peer-to-peer) computational infrastructure that supports applications with high computational requirements.
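To make the pull-based scheduling idea concrete, here is a minimal, hypothetical sketch in plain Java. A shared queue stands in for Comet's coordination space (the actual Comet tuple-space operations are not shown): the master publishes Map tasks into it, and idle workers pull the next task themselves, so faster workers automatically take on more work.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class PullSchedulingSketch {
      // Hypothetical task descriptor; Comet's actual tuples are richer than this.
      record MapTask(int id, String inputSplit) {}

      public static void main(String[] args) throws InterruptedException {
        // A shared queue standing in for Comet's coordination space.
        BlockingQueue<MapTask> space = new LinkedBlockingQueue<>();

        // Master side: publish the Map tasks up front.
        for (int i = 0; i < 8; i++) {
          space.put(new MapTask(i, "split-" + i));
        }

        // Worker side: each worker pulls its next task when idle, so load
        // balances itself without the master pushing assignments.
        for (int w = 0; w < 3; w++) {
          final int workerId = w;
          new Thread(() -> {
            MapTask task;
            while ((task = space.poll()) != null) {
              System.out.printf("worker %d processing %s%n", workerId, task.inputSplit());
            }
          }).start();
        }
      }
    }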
Our system's interfaces are similar to those of the Hadoop MapReduce framework, so that applications built on Hadoop are easily portable to the Comet-based framework. The implementation is described in detail and evaluated, with results, on a real pharmaceutical problem. We found that our solution can accelerate computations on medium-sized data by delaying or avoiding distributed file reads and writes.
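To suggest what such Hadoop-like interfaces might look like, here is a hypothetical sketch; the names and signatures are illustrative only, not the thesis's actual API. The mapper and reducer contracts mirror Hadoop's, while the emitter would stream intermediate pairs over TCP rather than write them to a distributed file system.

    // Hypothetical sketch: illustrative names and signatures, not the
    // thesis's actual API. The contracts mirror Hadoop's Mapper/Reducer,
    // so porting an application needs little more than a change of base type.
    public final class CometStyleApi {
      public interface Emitter<K, V> {
        // In the thesis's design, intermediate pairs would be streamed
        // over TCP instead of being written to a distributed file system.
        void emit(K key, V value);
      }
      public interface Mapper<KI, VI, KO, VO> {
        void map(KI key, VI value, Emitter<KO, VO> out);
      }
      public interface Reducer<K, V, KO, VO> {
        void reduce(K key, Iterable<V> values, Emitter<KO, VO> out);
      }
    }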