Redundancy allocation in distributed systems

Peng, Pei

doi:doi:10.7282/t3-hsar-tj60

RUcore: Rutgers University Community Repository

Search
- All
- Text
- Images
- Audio
- Video
Advanced Search | Help

Search all content in all RUcore collections.
Services
Collections

Help Contact Us My Account

Home

Resource

Redundancy allocation in distributed systems

PDF

PDF format is widely accepted and good for printing.

Plug-in required

PDF-1(6.59 MB)

Citation & Export

View Usage Statistics

Staff View

Citation & Export
Hide

Simple citation

Peng, Pei. Redundancy allocation in distributed systems. Retrieved from https://doi.org/doi:10.7282/t3-hsar-tj60

Export

Click here for information about Citation Management Tools at Rutgers.

Statistics
Hide

Description

TitleRedundancy allocation in distributed systems

NamePeng, Pei (author); Soljanin, Emina (chair); Sarwate, Anand (member); Spasojevic, Predrag (member); Whiting, Philip (member); Rutgers University; School of Graduate Studies

Date Created2022

Other Date2022-01 (degree)

SubjectComputer engineering, Electrical engineering, Information science, Electronic data processing -- Distributed processing

Extent145 pages : illustrations

DescriptionDistributed systems, e.g., distributed/parallel computing and distributed storage systems, have become necessary for handling ever-increasing data storage and computing requirements. Distributed systems can significantly increase the job execution speed by employing multiple computing servers to work in parallel. However, the random frustrations of the task execution times, because of many factors such as power management, software or hardware failures, maintenance, and resource sharing, often result in some workers executing their tasks much slower than the others. The existence of straggling workers leads to distributed system performance degradation. Redundancy, as simple task replication, and more recently, erasure coding, has emerged as a potentially powerful way to shorten the job execution time. It allows a job to be completed when only a subset of redundant tasks gets executed, thus avoiding stragglers.

This thesis is concerned with optimizing the redundancy allocation to reduce the job execution time in distributed systems. We address three systems: distributed storage systems, covert message passing on graphs, and distributed computing systems. Because of the different performance targets of each system, we analyze the optimal allocation based on various performance metrics.

In the first part of the thesis, we analyze the allocation of (redundant) file chunks throughout a distributed storage system, which affects crucial performance metrics such as the probability of file recovery and the service rate of the system under a data access model. We focus on quasi-uniform storage allocations where coded content is equally spread among a subset of nodes and provide a service rate analysis for two common data access models. We find that when the accessed file is small, the minimal spreading allocation is universally optimal; it results in the maximal service rate regardless of the system parameters. When the accessed file is large, the optimal allocation depends on the system parameters.

In the second part of the thesis, we introduce a gossip-like protocol for covert message passing in a wireless mobile network environment. This protocol can complement some existing methods which camouflage messages as noise. Redundancy affects two crucial performance metrics: the covertness probability and the communication delay. We characterize the dependence of these performance metrics on the redundancy level and other system parameters and conclude that a different level of redundancy is required to optimize each performance metric. We then determine the tradeoff between the covertness probability and the delay for different system parameters as a function of the introduced redundancy level.

In the third part of the thesis, we analyze the parallelism and diversity tradeoff in a distributed computing system. Distributed computing enables the parallel execution of tasks that make up a large computing job. As task replication and erasure coding, redundancy provides diversity that allows a job to be completed when only a subset of redundant tasks get executed. Thus, both redundancy and parallelism reduce the execution time but compete for the resources of the system. In situations of constrained resources (here, a fixed number of parallel servers), increasing redundancy reduces the level of parallelism. We characterize the diversity vs. parallelism tradeoff for three common models of task size-dependent execution times. We conclude that different models operate optimally at different redundancy levels, thus requiring very different code rates.

NotePh.D.

NoteIncludes bibliographical references

Genretheses

Persistent URLhttps://doi.org/doi:10.7282/t3-hsar-tj60

LanguageEnglish

CollectionSchool of Graduate Studies Electronic Theses and Dissertations

Organization NameRutgers, The State University of New Jersey

RightsThe author owns the copyright to this work.

Version 8.5.5

Citation & ExportHide

Simple citation

Export

StatisticsHide

Description

Citation & Export
Hide

Statistics
Hide