Description
Title: Performance profilers and debugging tools for openMP applications
Date Created: 2021
Other Date: 2021-01 (degree)
Extent: 1 online resource (xvi, 222 pages)
Description: OpenMP is a popular application programming interface (API) used to write shared-memory parallel programs. It supports a wide range of parallel constructs to express different types of parallelism, including fork-join and task-based parallelism. Using OpenMP, developers can incrementally parallelize a program by adding parallelism to it until their performance goals are met.
In this dissertation, we address the problem of assisting developers in meeting the two primary goals of writing parallel programs in OpenMP: performance and correctness. First, writing OpenMP programs that achieve scalable performance is challenging. An OpenMP program that achieves reasonable speedup on a system with a low core count may not achieve scalable speedup when run on a system with a larger number of cores. Traditional profilers report program regions where significant serial work is performed. In a parallel program, optimizing such regions may not improve performance, since doing so may not increase the program's parallelism. To address this problem, we introduce OMP-Adviser, a parallelism-centric performance analysis tool for OpenMP programs with what-if analyses. We propose a novel OpenMP series-parallel graph (OSPG) that precisely captures the series-parallel relations between different fragments of the program's execution. The OSPG, along with fine-grained measurements, constitutes OMP-Adviser's performance model. OMP-Adviser identifies serialization bottlenecks by measuring inherent parallelism in the program and its OpenMP constructs. OMP-Adviser's what-if analysis helps developers identify the regions that have to be optimized first for the program to achieve scalable speedup. While a lack of inherent parallelism is a sufficient condition for a program not to have scalable speedup, too much parallelism can lead to excessive runtime and scheduling overheads, resulting in lower performance. We address this issue by extending OMP-Adviser to measure tasking overheads. By attributing this additional information to different OpenMP constructs in the program, OMP-Adviser can identify tasking cut-offs that achieve the right balance between a program's parallelism and its scheduling overheads for a given input.
Further, we design a differential analysis technique that enables OMP-Adviser to identify program regions experiencing scalability bottlenecks caused by secondary effects of execution.
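The idea of measuring inherent parallelism over a series-parallel structure can be illustrated with a minimal Python sketch. It is not the OSPG or OMP-Adviser's implementation: it assumes a generic series-parallel tree with the textbook work/span model, and the class names (`Leaf`, `Series`, `Parallel`), the `what_if` helper, and the cost numbers are all hypothetical. The what-if query shown here (speed up one fragment and recompute) is only one plausible reading of such an analysis.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    name: str
    work: float  # measured cost of one serial fragment (hypothetical units)


@dataclass
class Series:
    children: List["Node"]  # children execute one after another


@dataclass
class Parallel:
    children: List["Node"]  # children may execute concurrently


Node = Union[Leaf, Series, Parallel]


def work(n: Node) -> float:
    """Total cost of all fragments (time on one core)."""
    if isinstance(n, Leaf):
        return n.work
    return sum(work(c) for c in n.children)


def span(n: Node) -> float:
    """Cost of the critical path (time on unlimited cores)."""
    if isinstance(n, Leaf):
        return n.work
    if isinstance(n, Series):
        return sum(span(c) for c in n.children)
    return max(span(c) for c in n.children)


def parallelism(n: Node) -> float:
    """Inherent parallelism: an upper bound on achievable speedup."""
    return work(n) / span(n)


def what_if(n: Node, target: str, factor: float) -> Node:
    """Return a copy of the tree with fragment `target` made `factor` times faster."""
    if isinstance(n, Leaf):
        return Leaf(n.name, n.work / factor if n.name == target else n.work)
    return type(n)([what_if(c, target, factor) for c in n.children])


# Hypothetical program: serial init, two parallel fragments, serial reduction.
program = Series([
    Leaf("init", 20),
    Parallel([Leaf("a", 40), Leaf("b", 40)]),
    Leaf("reduce", 20),
])

print(parallelism(program))                     # 1.5
print(parallelism(what_if(program, "a", 4)))    # 1.125: critical path unchanged
print(parallelism(what_if(program, "init", 4))) # about 1.62: critical path shrinks
```

In this toy example, making fragment `a` four times faster leaves the span at 80 because fragment `b` still sits on the critical path, whereas speeding up the serial `init` fragment shortens the span. That is the kind of distinction a parallelism-centric profiler is meant to surface and a profiler that only ranks regions by time spent would miss.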
Second, writing correct parallel programs is challenging due to the possibility of bugs such as deadlocks, livelocks, and data races, which do not arise in serial programs. A data race occurs when two parallel fragments of the program access the same memory location and at least one of the accesses is a write. Data races are common in OpenMP applications. Due to the exponential number of possible instruction and thread interleavings in parallel applications, identifying and reproducing data races using manual or automated testing techniques is impractical. Hence, specialized data race detection tools have been proposed to identify data races. We propose OMP-Racer, a novel apparent data race detector for OpenMP that supports a large class of OpenMP applications. To detect data races, OMP-Racer constructs the program's OSPG to encode the logical series-parallel relations between different serial fragments of the program. By capturing the program's logical series-parallel relations, OMP-Racer can detect data races in possible thread interleavings of the program for a given input, thus alleviating the need for schedule exploration. Compared to other OpenMP data race detectors, OMP-Racer can correctly identify races in programs that use a larger subset of OpenMP, including task dependencies and locks, while achieving similar overheads.
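The data race definition above, combined with a logical series-parallel relation, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not OMP-Racer's algorithm: it uses a generic series-parallel tree in which two fragments are logically parallel when their lowest common ancestor is a parallel node, it treats locks as a simple common-lock check, and it does not model task dependencies. The names `SPNode`, `Access`, `logically_parallel`, and `find_races` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, List, Optional, Tuple


@dataclass(eq=False)
class SPNode:
    kind: str                          # "S" series, "P" parallel, "W" work fragment
    parent: Optional["SPNode"] = None


@dataclass(frozen=True)
class Access:
    fragment: SPNode                   # fragment that performed the access
    address: int                       # memory location touched
    is_write: bool
    locks: FrozenSet[str] = frozenset()  # locks held during the access


def ancestors(n: Optional[SPNode]) -> List[SPNode]:
    """Path from a node up to the root, starting with the node itself."""
    chain = []
    while n is not None:
        chain.append(n)
        n = n.parent
    return chain


def logically_parallel(a: SPNode, b: SPNode) -> bool:
    """True if the lowest common ancestor of a and b is a parallel node."""
    seen = {id(n) for n in ancestors(a)}
    for n in ancestors(b):
        if id(n) in seen:
            return n.kind == "P"
    return False


def find_races(accesses: List[Access]) -> List[Tuple[Access, Access]]:
    """Report pairs that conflict, share no lock, and may run in parallel."""
    races = []
    for x, y in combinations(accesses, 2):
        if (x.address == y.address
                and (x.is_write or y.is_write)
                and not (x.locks & y.locks)
                and logically_parallel(x.fragment, y.fragment)):
            races.append((x, y))
    return races


# Hypothetical execution: init, then two parallel fragments t1 and t2.
root = SPNode("S")
init = SPNode("W", root)
par = SPNode("P", root)
t1 = SPNode("W", par)
t2 = SPNode("W", par)

log = [
    Access(init, 0x10, True),                   # in series with t1 and t2
    Access(t1, 0x10, True),                     # conflicts with t2's read
    Access(t2, 0x10, False),
    Access(t1, 0x20, True, frozenset({"L"})),   # protected by a common lock
    Access(t2, 0x20, True, frozenset({"L"})),
]
races = find_races(log)
print(len(races))  # 1
```

Because the check depends on the logical relation between fragments rather than on the order in which threads happened to run, the one conflicting pair is reported regardless of which fragment executed first in the observed run. The pairwise comparison here is quadratic in the number of accesses and is only meant to make the definition concrete.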
Our results from testing the OMP-Adviser and OMP-Racer prototypes with 40 OpenMP applications and benchmarks indicate their respective effectiveness in performance analysis and data race detection. Furthermore, they demonstrate the usefulness of our proposed OSPG data structure in enabling the creation of different types of analysis tools for OpenMP applications.
Note: Ph.D.
Note: Includes bibliographical references
Genre: theses, External ETD doctoral
Language: English
Collection: School of Graduate Studies Electronic Theses and Dissertations
Organization Name: Rutgers, The State University of New Jersey
Rights: The author owns the copyright to this work.