Performance analysis of parallel programs remains challenging. Programmers must account for several factors to extract the best possible performance from a parallel program. First, the program must have adequate parallel computation that is evenly distributed to keep all processors busy during execution. Second, it must minimize secondary effects caused by interactions in hardware, which can degrade performance. Third, performance problems due to inadequate parallel computation and secondary effects can be magnified when the program executes at scale. Fourth, the program must incur minimal overhead from other sources such as runtime schedulers, lock contention, and heavyweight abstractions in the software stack. To diagnose performance issues in parallel programs, programmers rely on profilers. Although profiling is a well-researched area, existing profilers primarily report where a program spends its time; they fail to highlight the parts of the program that matter for improving its performance and scalability.
This dissertation makes a case for using logical parallelism to identify the parts of a task parallel program that matter in improving its performance. It makes the following contributions. First, it describes a scalable and efficient technique to compute the logical parallelism of a program. Logical parallelism bounds the speedup of a program in the limit, making it an ideal metric for assessing whether a program has adequate parallel computation to attain scalable speedup on any number of processors. Second, it introduces a novel performance model that enables this parallelism computation by encoding the series-parallel relations in the program and the fine-grain work performed in an execution. Third, it presents a technique, called what-if analyses, that uses the parallelism computation and the performance model to identify the parts of a program that matter in improving its parallelism. Finally, it describes a differential analysis technique that uses the performance model to identify the parts of the program that matter in addressing secondary effects.
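To make these ideas concrete, the following is a minimal sketch, not TaskProf's actual implementation, of how a series-parallel performance model supports both parallelism computation and a what-if style analysis. The node types (Work, Series, Parallel), the region labels, and the what_if helper are illustrative assumptions: logical parallelism is computed as total work divided by span (critical-path length), and a what-if query rescales the work attributed to one region and recomputes parallelism.

```python
# Illustrative sketch (assumed names, not TaskProf's API): a series-parallel
# performance model over which parallelism and what-if queries are computed.
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Work:
    cost: float          # fine-grain work measured for this region
    label: str = ""      # source region the work is attributed to

@dataclass
class Series:
    children: List = field(default_factory=list)

@dataclass
class Parallel:
    children: List = field(default_factory=list)

Node = Union[Work, Series, Parallel]

def work(n: Node) -> float:
    """Total work: the sum of all costs in the model."""
    if isinstance(n, Work):
        return n.cost
    return sum(work(c) for c in n.children)

def span(n: Node) -> float:
    """Critical-path length: series adds child spans, parallel takes the max."""
    if isinstance(n, Work):
        return n.cost
    agg = sum if isinstance(n, Series) else max
    return agg(span(c) for c in n.children)

def parallelism(n: Node) -> float:
    """Logical parallelism: the speedup attainable in the limit."""
    return work(n) / span(n)

def what_if(n: Node, label: str, factor: float) -> Node:
    """Rebuild the model with the labeled region's work scaled by `factor`,
    approximating the effect of optimizing that region."""
    if isinstance(n, Work):
        cost = n.cost * factor if n.label == label else n.cost
        return Work(cost, n.label)
    kids = [what_if(c, label, factor) for c in n.children]
    return Series(kids) if isinstance(n, Series) else Parallel(kids)

# A program with a serial setup phase followed by two parallel tasks.
prog = Series([Work(10, "init"), Parallel([Work(40, "taskA"), Work(40, "taskB")])])
print(parallelism(prog))                        # 90 / 50 = 1.8
print(parallelism(what_if(prog, "init", 0.1)))  # 81 / 41 ≈ 1.98
```

In this toy model, the what-if query shows that shrinking the serial "init" region raises the program's parallelism, identifying it as a region that matters; the actual analyses in the dissertation operate over profiles of real executions rather than hand-built trees.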
Using the techniques proposed in this dissertation, we have developed TaskProf, a profiler and adviser for task parallel programs. The performance insights gained from TaskProf have enabled us to design concrete optimizations and improve the performance of a range of applications. The techniques presented in this dissertation and our profiler TaskProf demonstrate that by designing rich abstractions that enable analyses to measure parallelism, expose secondary effects, and evaluate how performance changes when regions are optimized, one can identify the parts of a task parallel program that matter in improving its performance.