Staff View
Efficient and high-performance data orchestration for large scale cloud workloads

Descriptive

TitleInfo
Title
Efficient and high-performance data orchestration for large scale cloud workloads
Name (type = personal)
NamePart (type = family)
Chen
NamePart (type = given)
Shouwei
DisplayForm
Shouwei Chen
Role
RoleTerm (authority = RULIB)
author
Name (type = personal)
NamePart (type = family)
Rodero
NamePart (type = given)
Ivan
DisplayForm
Ivan Rodero
Affiliation
Advisory Committee
Role
RoleTerm (authority = RULIB)
chair
Name (type = personal)
NamePart (type = family)
Parashar
NamePart (type = given)
Manish
DisplayForm
Manish Parashar
Affiliation
Advisory Committee
Role
RoleTerm (authority = RULIB)
internal member
Name (type = personal)
NamePart (type = family)
Marsic
NamePart (type = given)
Ivan
DisplayForm
Ivan Marsic
Affiliation
Advisory Committee
Role
RoleTerm (authority = RULIB)
internal member
Name (type = personal)
NamePart (type = family)
Wang
NamePart (type = given)
Wensheng
DisplayForm
Wensheng Wang
Affiliation
Advisory Committee
Role
RoleTerm (authority = RULIB)
outside member
Name (type = corporate)
NamePart
Rutgers University
Role
RoleTerm (authority = RULIB)
degree grantor
Name (type = corporate)
NamePart
School of Graduate Studies
Role
RoleTerm (authority = RULIB)
school
TypeOfResource
Text
Genre (authority = marcgt)
theses
Genre (authority = ExL-Esploro)
ETD doctoral
OriginInfo
DateCreated (qualifier = exact); (encoding = w3cdtf); (keyDate = yes)
2021
DateOther (type = degree); (qualifier = exact); (encoding = w3cdtf)
2021-05
Language
LanguageTerm (authority = ISO 639-3:2007); (type = text)
English
Abstract (type = abstract)
Computing frameworks running at extreme scale in cloud environments provide efficient, high-performance computing services to a wide range of domains. These cloud computing frameworks build scalable, reliable, and highly accessible data pipelines for many academic, scientific, and industrial services. While processing large volumes of data from diverse sources, data analytics workloads generate large amounts of intermediate data in the back end of these frameworks. This sheer volume of data makes it challenging for the frameworks to handle data with high performance and efficiency. Data orchestration based on memory and high-performance storage devices has therefore become a key concern in optimizing the performance of cloud computing frameworks.

The growing scale of data and the complexity of cloud environments make it challenging to run applications quickly and efficiently. Existing computing clusters can fetch data from diverse cloud infrastructure, including common storage, high-performance storage devices, and high-speed fabric interconnects. However, providing suitable data orchestration for existing computing frameworks remains challenging. First, computing frameworks access an underlying persistent data storage layer built on a variety of storage devices and memory, and rapid advances in storage hardware make it difficult for existing frameworks to use these advanced devices efficiently. Second, most existing computing frameworks rely on an intermediate data layer, and providing an efficient, high-performance storage layer of this kind at large scale (for example, for intermediate data and shuffle data) is still challenging. Imbalanced data and the storage of small data introduce further challenges that involve both new hardware devices and appropriate data orchestration designs. Consequently, these advances in hardware call for a new data orchestration paradigm for cloud computing frameworks.

This thesis addresses these challenges by proposing novel mechanisms and solutions for efficient, high-performance data orchestration in big data frameworks. It makes the following contributions:
(1) Studies representative workloads for big data processing frameworks across different storage technologies and design choices, and explores the I/O bottlenecks of in-memory big data frameworks on high-performance computing clusters with non-volatile memory.
(2) Designs and explores architectural foundations for running in-memory big data frameworks in a hybrid cloud environment with fast fabric interconnects between geo-distributed data centers.
(3) Proposes an abstraction for a disaggregated memory pool based on persistent memory and Remote Direct Memory Access (RDMA) that improves the computing resource efficiency and the performance of intermediate storage in big data frameworks.
(4) Provides a novel in-transit shuffle mechanism for big data frameworks that is lightweight and compatible with modern in-memory big data frameworks.

The proposed mechanisms and solutions have been implemented, deployed, and evaluated on high-performance clusters and in real computing environments, including academic clusters at Rutgers and large-scale production systems in the information technology industry.
Subject (authority = RUETD)
Topic
Electrical and Computer Engineering
Subject (authority = LCSH)
Topic
Cloud computing
Subject (authority = LCSH)
Topic
Electronic data processing
Subject (authority = LCSH)
Topic
Big data
RelatedItem (type = host)
TitleInfo
Title
Rutgers University Electronic Theses and Dissertations
Identifier (type = RULIB)
ETD
Identifier
ETD_11688
PhysicalDescription
Form (authority = gmd)
InternetMediaType
application/pdf
InternetMediaType
text/xml
Extent
1 online resource (xv, 130 pages) : illustrations
Note (type = degree)
Ph.D.
Note (type = bibliography)
Includes bibliographical references
RelatedItem (type = host)
TitleInfo
Title
School of Graduate Studies Electronic Theses and Dissertations
Identifier (type = local)
rucore10001600001
Location
PhysicalLocation (authority = marcorg); (displayLabel = Rutgers, The State University of New Jersey)
NjNbRU
Identifier (type = doi)
doi:10.7282/t3-ckq3-tw41

Rights

RightsDeclaration (ID = rulibRdec0006)
The author owns the copyright to this work.
RightsHolder (type = personal)
Name
FamilyName
Chen
GivenName
Shouwei
Role
Copyright Holder
RightsEvent
Type
Permission or license
DateTime (encoding = w3cdtf); (qualifier = exact); (point = start)
2021-03-31 20:31:30
AssociatedEntity
Name
Shouwei Chen
Role
Copyright holder
Affiliation
Rutgers University. School of Graduate Studies
AssociatedObject
Type
License
Name
Author Agreement License
Detail
I hereby grant to the Rutgers University Libraries and to my school the non-exclusive right to archive, reproduce and distribute my thesis or dissertation, in whole or in part, and/or my abstract, in whole or in part, in and from an electronic format, subject to the release date subsequently stipulated in this submittal form and approved by my school. I represent and stipulate that the thesis or dissertation and its abstract are my original work, that they do not infringe or violate any rights of others, and that I make these grants as the sole owner of the rights to my thesis or dissertation and its abstract. I represent that I have obtained written permissions, when necessary, from the owner(s) of each third party copyrighted matter to be included in my thesis or dissertation and will supply copies of such upon request by my school. I acknowledge that RU ETD and my school will not distribute my thesis or dissertation or its abstract if, in their reasonable judgment, they believe all such rights have not been secured. I acknowledge that I retain ownership rights to the copyright of my work. I also retain the right to use all or part of this thesis or dissertation in future works, such as articles or books.
Copyright
Status
Copyright protected
Availability
Status
Open
Reason
Permission or license

Technical

RULTechMD (ID = TECHNICAL1)
ContentModel
ETD
OperatingSystem (VERSION = 5.1)
windows xp
CreatingApplication
Version
1.5
DateCreated (point = end); (encoding = w3cdtf); (qualifier = exact)
2021-04-01T02:26:12
ApplicationName
pdfTeX-1.40.20
Rutgers University Libraries - Copyright ©2024