Staff View
A programming model and execution system for adaptive ensemble applications on high performance computing systems

Descriptive

TitleInfo
Title
A programming model and execution system for adaptive ensemble applications on high performance computing systems
Name (type = personal)
NamePart (type = family)
Balasubramanian
NamePart (type = given)
Vivekanandan
NamePart (type = date)
1991-
DisplayForm
Vivekanandan Balasubramanian
Role
RoleTerm (authority = RULIB)
author
Name (type = personal)
NamePart (type = family)
Jha
NamePart (type = given)
Shantenu
DisplayForm
Shantenu Jha
Affiliation
Advisory Committee
Role
RoleTerm (authority = RULIB)
chair
Name (type = personal)
NamePart (type = family)
Turilli
NamePart (type = given)
Matteo
DisplayForm
Matteo Turilli
Affiliation
Advisory Committee
Role
RoleTerm (authority = RULIB)
co-chair
Name (type = personal)
NamePart (type = family)
Javanmard
NamePart (type = given)
Mehdi
DisplayForm
Mehdi Javanmard
Affiliation
Advisory Committee
Role
RoleTerm (authority = RULIB)
internal member
Name (type = personal)
NamePart (type = family)
Zonouz
NamePart (type = given)
Saman
DisplayForm
Saman Zonouz
Affiliation
Advisory Committee
Role
RoleTerm (authority = RULIB)
internal member
Name (type = personal)
NamePart (type = family)
Ortiz
NamePart (type = given)
Jorge
DisplayForm
Jorge Ortiz
Affiliation
Advisory Committee
Role
RoleTerm (authority = RULIB)
internal member
Name (type = personal)
NamePart (type = family)
York
NamePart (type = given)
Darrin
DisplayForm
Darrin York
Affiliation
Advisory Committee
Role
RoleTerm (authority = RULIB)
outside member
Name (type = corporate)
NamePart
Rutgers University
Role
RoleTerm (authority = RULIB)
degree grantor
Name (type = corporate)
NamePart
School of Graduate Studies
Role
RoleTerm (authority = RULIB)
school
TypeOfResource
Text
Genre (authority = marcgt)
theses
OriginInfo
DateCreated (encoding = w3cdtf); (qualifier = exact)
2019
DateOther (encoding = w3cdtf); (qualifier = exact); (type = degree)
2019-10
CopyrightDate (encoding = w3cdtf); (qualifier = exact)
2019
Language
LanguageTerm (authority = ISO 639-3:2007); (type = text)
English
Abstract (type = abstract)
Traditionally, advances in high-performance scientific computing have focused on the scale, performance, and optimization of an application with a large, single task, and less on applications composed of multiple tasks. However, many scientific problems are expressed as applications that require the collective outcome of one or more ensembles of computational tasks in order to provide insight into the problem being studied. Depending on the scientific problem, a task of an ensemble can be any type of program: from a molecular simulation to a data analysis or a machine learning model. With different communication and coordination patterns, both within and across ensembles, the number and type of applications that can be formulated as ensembles is vast and spans many scientific domains, including biophysics, climate science, polar science and earth science.
The performance of ensemble applications can be improved further by using partial results to adapt the application at runtime. Partial results of ongoing executions can be analyzed with multiple methods to adapt the application to focus on relevant portions of the problem space or reduce the time to execution of the application. These benefits are confirmed by the increasing role played by adaptivity in ensemble applications developed to support several domain sciences, including biophysics and climate science.
Although HPC systems provide the computational power required for ensemble applications, their design and policies tend to privilege the execution of single, very large tasks. On the biggest and busiest systems in the world, queue waiting time for each task can reach days while lack of elastic coordination and communication infrastructure makes distributing the execution of ensemble applications difficult. Further, access, submission and execution methods vary across HPC systems, alongside their policies and performance. HPC systems are increasingly displaying performance dynamism and fluctuations due to aggressive thermal management and throttling. Together, these factors make using HPC systems for adaptive and non-adaptive ensemble applications challenging.
Existing solutions to express and execute ensemble applications on HPC systems range from complex scripts and domain-specific workflow systems to general-purpose workflow systems. Scripts and domain-specific workflow systems serve as point solutions, often limited in functionality, usability and performance to the scope of the specific application and the HPC system. General-purpose systems, on the other hand, require retrofitting the ensemble applications using the tools and interfaces provided by the system, which can be challenging, when feasible.
The goal of this research is to advance the state-of-the-art by simplifying the programmability of ensemble applications, abstracting the complexity of their scalable, efficient and robust execution on HPC systems, and, most importantly, enabling domain scientists to focus on the computational campaigns and algorithmic innovations that are of importance to their science domains. In this dissertation, we describe several science drivers that employ ensemble applications to address some of the most challenging scientific problems of our time. We address three main challenges of executing ensemble applications at scale on HPC resources: (i) we address application diversity and programmability by developing a generic programming model that treats an ‘ensemble’ as a first order concern and provides constructs specifically to express ensemble applications; (ii) we develop a software system, called Ensemble Toolkit, to provide scalable and robust execution of ensemble applications while abstracting the user from the architecture and policies of HPC systems, and resource and execution management; and (iii) we propose and evaluate scheduling strategies to manage the effect of workload heterogeneity and resource dynamism on the time to execute ensemble applications on HPC systems. We discuss several achievements and results obtained in various scientific domains as a consequence of the research and development described in this dissertation.
Subject (authority = LCSH)
Topic
High performance computing
Subject (authority = RUETD)
Topic
Electrical and Computer Engineering
RelatedItem (type = host)
TitleInfo
Title
Rutgers University Electronic Theses and Dissertations
Identifier (type = RULIB)
ETD
Identifier
ETD_10206
PhysicalDescription
Form (authority = gmd)
InternetMediaType
application/pdf
InternetMediaType
text/xml
Extent
1 online resource (xvi, 114 pages) : illustrations
Note (type = degree)
Ph.D.
Note (type = bibliography)
Includes bibliographical references
RelatedItem (type = host)
TitleInfo
Title
School of Graduate Studies Electronic Theses and Dissertations
Identifier (type = local)
rucore10001600001
Location
PhysicalLocation (authority = marcorg); (displayLabel = Rutgers, The State University of New Jersey)
NjNbRU
Identifier (type = doi)
doi:10.7282/t3-0yx8-xx61
Genre (authority = ExL-Esploro)
ETD doctoral

Rights

RightsDeclaration (ID = rulibRdec0006)
The author owns the copyright to this work.
RightsHolder (type = personal)
Name
FamilyName
Balasubramanian
GivenName
Vivekanandan
Role
Copyright Holder
RightsEvent
Type
Permission or license
DateTime (encoding = w3cdtf); (qualifier = exact); (point = start)
2019-09-01 14:39:30
AssociatedEntity
Name
Vivekanandan Balasubramanian
Role
Copyright holder
Affiliation
Rutgers University. School of Graduate Studies
AssociatedObject
Type
License
Name
Author Agreement License
Detail
I hereby grant to the Rutgers University Libraries and to my school the non-exclusive right to archive, reproduce and distribute my thesis or dissertation, in whole or in part, and/or my abstract, in whole or in part, in and from an electronic format, subject to the release date subsequently stipulated in this submittal form and approved by my school. I represent and stipulate that the thesis or dissertation and its abstract are my original work, that they do not infringe or violate any rights of others, and that I make these grants as the sole owner of the rights to my thesis or dissertation and its abstract. I represent that I have obtained written permissions, when necessary, from the owner(s) of each third party copyrighted matter to be included in my thesis or dissertation and will supply copies of such upon request by my school. I acknowledge that RU ETD and my school will not distribute my thesis or dissertation or its abstract if, in their reasonable judgment, they believe all such rights have not been secured. I acknowledge that I retain ownership rights to the copyright of my work. I also retain the right to use all or part of this thesis or dissertation in future works, such as articles or books.
Copyright
Status
Copyright protected
Availability
Status
Open
Reason
Permission or license

Technical

RULTechMD (ID = TECHNICAL1)
ContentModel
ETD
OperatingSystem (VERSION = 5.1)
windows xp
CreatingApplication
Version
1.5
ApplicationName
pdfTeX-1.40.16
DateCreated (point = end); (encoding = w3cdtf); (qualifier = exact)
2019-09-01T11:38:04