Performance analysis of parallel programs remains challenging. Programmers must account for several factors to extract the best possible performance from a parallel program. First, the program must have adequate parallel computation that is evenly distributed to keep all processors busy during execution. Second, it must minimize secondary effects caused by interactions in hardware, which can degrade performance. Third, performance problems due to inadequate parallel computation and secondary effects can be magnified when the program executes at scale. Fourth, the program must incur minimal overhead from other sources such as runtime schedulers, lock contention, and heavyweight abstractions in the software stack. To diagnose performance issues in parallel programs, programmers rely on profilers. Although profiling is a well-researched area, existing profilers primarily report where a program spends its time; they fail to highlight the parts of the program that matter for improving its performance and scalability.
This dissertation makes a case for using logical parallelism to identify the parts of a task parallel program that matter in improving its performance. It makes the following contributions. First, it describes a scalable and efficient technique to compute the logical parallelism of a program. Logical parallelism bounds the speedup of a program in the limit, making it an ideal metric for assessing whether a program has adequate parallel computation to attain scalable speedup on any number of processors. Second, it introduces a novel performance model that enables this parallelism computation by encoding the series-parallel relations in the program and the fine-grain work performed in an execution. Third, it presents a technique, called what-if analyses, that uses the parallelism computation and the performance model to identify the parts of a program that matter in improving its parallelism. Finally, it describes a differential analysis technique that uses the performance model to identify the parts of the program that matter in addressing secondary effects.
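To make these ideas concrete, the following is a minimal sketch, not TaskProf's actual implementation, of how a series-parallel performance model supports both parallelism computation and a what-if style analysis. The node types (Work, Series, Parallel), the region labels, and the what_if helper are illustrative assumptions: logical parallelism is computed as total work divided by span (critical-path length), and a what-if query rescales the work attributed to one region and recomputes parallelism.

```python
# Illustrative sketch (assumed names, not TaskProf's API): a series-parallel
# performance model over which parallelism and what-if queries are computed.
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Work:
    cost: float          # fine-grain work measured for this region
    label: str = ""      # source region the work is attributed to

@dataclass
class Series:
    children: List = field(default_factory=list)

@dataclass
class Parallel:
    children: List = field(default_factory=list)

Node = Union[Work, Series, Parallel]

def work(n: Node) -> float:
    """Total work: the sum of all costs in the model."""
    if isinstance(n, Work):
        return n.cost
    return sum(work(c) for c in n.children)

def span(n: Node) -> float:
    """Critical-path length: series adds child spans, parallel takes the max."""
    if isinstance(n, Work):
        return n.cost
    agg = sum if isinstance(n, Series) else max
    return agg(span(c) for c in n.children)

def parallelism(n: Node) -> float:
    """Logical parallelism: the speedup attainable in the limit."""
    return work(n) / span(n)

def what_if(n: Node, label: str, factor: float) -> Node:
    """Rebuild the model with the labeled region's work scaled by `factor`,
    approximating the effect of optimizing that region."""
    if isinstance(n, Work):
        cost = n.cost * factor if n.label == label else n.cost
        return Work(cost, n.label)
    kids = [what_if(c, label, factor) for c in n.children]
    return Series(kids) if isinstance(n, Series) else Parallel(kids)

# A program with a serial setup phase followed by two parallel tasks.
prog = Series([Work(10, "init"), Parallel([Work(40, "taskA"), Work(40, "taskB")])])
print(parallelism(prog))                        # 90 / 50 = 1.8
print(parallelism(what_if(prog, "init", 0.1)))  # 81 / 41 ≈ 1.98
```

In this toy model, the what-if query shows that shrinking the serial "init" region raises the program's parallelism, identifying it as a region that matters; the actual analyses in the dissertation operate over profiles of real executions rather than hand-built trees.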
Using the techniques proposed in this dissertation, we have developed TaskProf, a profiler and adviser for task parallel programs. The performance insights gained from TaskProf have enabled us to design concrete optimizations and improve the performance of a range of applications. The techniques presented in this dissertation and our profiler TaskProf demonstrate that by designing rich abstractions that enable analyses to measure parallelism, expose secondary effects, and evaluate how performance changes when regions are optimized, one can identify the parts of a task parallel program that matter in improving its performance.