Description
TitleRedundancy allocation in distributed systems
Date Created2022
Other Date2022-01 (degree)
Extent145 pages : illustrations
DescriptionDistributed systems, e.g., distributed/parallel computing and distributed storage systems, have become necessary for handling ever-increasing data storage and computing requirements. Distributed systems can significantly increase the job execution speed by employing multiple computing servers to work in parallel. However, the random frustrations of the task execution times, because of many factors such as power management, software or hardware failures, maintenance, and resource sharing, often result in some workers executing their tasks much slower than the others. The existence of straggling workers leads to distributed system performance degradation. Redundancy, as simple task replication, and more recently, erasure coding, has emerged as a potentially powerful way to shorten the job execution time. It allows a job to be completed when only a subset of redundant tasks gets executed, thus avoiding stragglers.
This thesis is concerned with optimizing the redundancy allocation to reduce the job execution time in distributed systems. We address three systems: distributed storage systems, covert message passing on graphs, and distributed computing systems. Because of the different performance targets of each system, we analyze the optimal allocation based on various performance metrics.
In the first part of the thesis, we analyze the allocation of (redundant) file chunks throughout a distributed storage system, which affects crucial performance metrics such as the probability of file recovery and the service rate of the system under a data access model. We focus on quasi-uniform storage allocations where coded content is equally spread among a subset of nodes and provide a service rate analysis for two common data access models. We find that when the accessed file is small, the minimal spreading allocation is universally optimal; it results in the maximal service rate regardless of the system parameters. When the accessed file is large, the optimal allocation depends on the system parameters.
In the second part of the thesis, we introduce a gossip-like protocol for covert message passing in a wireless mobile network environment. This protocol can complement some existing methods which camouflage messages as noise. Redundancy affects two crucial performance metrics: the covertness probability and the communication delay. We characterize the dependence of these performance metrics on the redundancy level and other system parameters and conclude that a different level of redundancy is required to optimize each performance metric. We then determine the tradeoff between the covertness probability and the delay for different system parameters as a function of the introduced redundancy level.
In the third part of the thesis, we analyze the parallelism and diversity tradeoff in a distributed computing system. Distributed computing enables the parallel execution of tasks that make up a large computing job. As task replication and erasure coding, redundancy provides diversity that allows a job to be completed when only a subset of redundant tasks get executed. Thus, both redundancy and parallelism reduce the execution time but compete for the resources of the system. In situations of constrained resources (here, a fixed number of parallel servers), increasing redundancy reduces the level of parallelism. We characterize the diversity vs. parallelism tradeoff for three common models of task size-dependent execution times. We conclude that different models operate optimally at different redundancy levels, thus requiring very different code rates.
NotePh.D.
NoteIncludes bibliographical references
Genretheses
LanguageEnglish
CollectionSchool of Graduate Studies Electronic Theses and Dissertations
Organization NameRutgers, The State University of New Jersey
RightsThe author owns the copyright to this work.