Description
TitleFast methods to detect and debug numerical errors with shadow execution
Date Created2022
Other Date2022-10 (degree)
Extent154 pages : illustrations
DescriptionThe floating-point (FP) representation uses a finite number of bits to approximate real numbers in computer systems. Due to rounding errors, arithmetic using the FP representation may diverge from arithmetic with real numbers. For primitive FP operations, the rounding error is small and bounded. However, with the sequence of FP operations, rounding errors can accumulate. Such rounding errors can be magnified by certain operations. The magnified error can affect the program’s control flow and the output compared to execution with infinite bits of precision. It is challenging for users to identify such bugs because the program does not crash but generates the incorrect output. Without any oracle in high precision, it is hard to differentiate between correct and incorrect output. Detecting such bugs in long-running programs becomes even more challenging.This dissertation proposes a fast yet precise mechanism to detect and debug numerical errors in long-running programs. This dissertation makes the following contributions: First, we propose a selective shadow execution framework to detect and debug numerical errors. Our idea is to use shadow execution with high-precision computation for comprehensive numerical error detection. On every FP computation, an equivalent high precision computation is performed. If there is a significant difference between FP computation and high precision computation, the error is reported to the user. We use additional information about instructions to generate a directed acyclic graph (DAG) of them showing the error propagation. The DAG helps the user identify the error’s root-cause. Our prototype FPSanitizer for floating-point is an order of magnitude faster than prior work.
Second, we propose a novel technique to run shadow execution in parallel to further reduce performance overheads. In our approach, the user specifies parts of the program that need to be debugged. Our compiler creates shadow execution tasks that mirror these specified regions in the original program but perform equivalent high precision computation. To execute the shadow tasks in parallel, we break the dependency between them by providing the appropriate memory state and input. Moreover, to correctly detect the numerical errors in the original program, shadow tasks must follow the same control flow as the original program. Our key insight is to use FP values computed by the original program to start the shadow tasks from an arbitrary point in time. To ensure they follow the same control flow as the original program, our compiler updates every branch instruction in the shadow task to use the branch outcomes of the original program. As a result, the original program and shadow tasks execute in a decoupled fashion and communicate via a non-blocking queue. Our prototype PFPSANITIZER is significantly faster than the FPSANITIZER. On average, PFPSANITIZER provides a speedup of 30.6× speedup over FPSanitizer with 64 cores.
Finally, we propose an alternative lightweight oracle to reduce the overheads of shadow execution. Executing the shadow execution on multiple cores in parallel reduces the performance overheads. However, the user must identify regions of the program to enable shadow execution in parallel. Often, users may not know where the numerical bugs are present. This thesis proposes a fast shadow execution framework, EFTSANITIZER, that uses error-free transformations (EFTs) to detect and debug numerical bugs comprehensively. EFTs are a set of algorithms that provide a mechanism to capture the rounding error using primitive FP instructions. For certain FP computations rounding error can be represented as an FP value. Based on this observation, EFTs transform FP computation to derive the rounding error. In our approach, we maintain the error with each memory location and compute the error for a sequence of FP operations using error composition with EFTs. EFTSANITIZER provides a trace of instructions that help users isolate and debug the root cause of the errors. In addition, EFTSANITIZER is an order of magnitude faster (14.72×) than FPSANITIZER. In our experimental studies, EFTSANITIZER detected all errors as detected by FPSANITIZER. However, the error reported by EFTSANITIZER may not be as precise as reported by FPSANITIZER due to loss of precision with error accumulation.
NotePh.D.
NoteIncludes bibliographical references
Genretheses
LanguageEnglish
CollectionSchool of Graduate Studies Electronic Theses and Dissertations
Organization NameRutgers, The State University of New Jersey
RightsThe author owns the copyright to this work.