Xia, Yi. Extended bootlier procedure for detection of outliers in univariate samples and linear regression analysis. Retrieved from https://doi.org/doi:10.7282/T39Z96H5
Description: Determining whether a dataset has one or more outliers is a fundamental and challenging problem in statistical analysis. This dissertation introduces a statistical framework that addresses two well-known problems in outlier analysis. The first problem (Problem 1) is to detect outliers in independent and identically distributed univariate samples, which is the basic setting of the outlier problem. The second problem (Problem 2) is to detect outliers and influential observations in linear regression analysis, which is a major topic in linear regression model diagnostics and represents a more complete setting. The proposed framework is motivated by a graphical outlier detection method proposed recently for Problem 1. When outliers are present in a sample, some bootstrap samples contain outliers while others do not. Based on this observation, that method finds that a bootstrap sample statistic (termed “mean – trimmed mean”) is sensitive to outliers, and in particular that its histogram is multimodal in the presence of outliers. Consequently, outliers are detected by plotting and visually checking the histogram. Because that method captures the essence of what researchers commonly call outliers, the proposed framework develops it into a complete inference procedure by constructing a formal statistical test based on a quantitative index that measures the degree of the outlier effect. The proposed framework is first developed to address Problem 1. A procedure with a formal test is detailed, and large sample theory is developed to support it. Then the procedure is extended to linear regression to address Problem 2. The measures for outliers and influential observations, including several types of residuals and a square-root version of Cook’s distance, are discussed, and large sample theory is developed for this non-independent case.
For both problems, simulation studies are conducted and real data examples are explored to show the wide-ranging applicability of the proposed framework. In particular, comparison with other commonly used methods in the simulation studies demonstrates the overall advantage of the proposed framework.
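The graphical idea described above, bootstrapping the "mean – trimmed mean" statistic and inspecting its histogram, can be illustrated with a minimal sketch. This is not the dissertation's formal test. The function names, the trimming amount `k`, the number of resamples `n_boot`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def trimmed_mean(x, k):
    """Mean after dropping the k smallest and k largest values."""
    xs = np.sort(x)
    return xs[k:len(xs) - k].mean()

def bootlier_statistics(sample, n_boot=2000, k=2, seed=0):
    """Bootstrap replicates of (mean - trimmed mean).

    k and n_boot are illustrative choices, not values from the source.
    """
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Resample with replacement; an outlier may appear 0, 1, 2, ... times.
        resample = rng.choice(sample, size=n, replace=True)
        stats[b] = resample.mean() - trimmed_mean(resample, k)
    return stats

# Hypothetical data: 30 standard normal draws, and a copy in which
# the last value is replaced by a single gross outlier.
rng = np.random.default_rng(1)
clean = rng.normal(0.0, 1.0, size=30)
contaminated = np.append(clean[:-1], 15.0)

stats_clean = bootlier_statistics(clean)
stats_contaminated = bootlier_statistics(contaminated)

# Histogram counts of the bootstrap statistic; several separated peaks
# in the contaminated case would suggest the presence of an outlier.
counts, edges = np.histogram(stats_contaminated, bins=30)
print(counts)
```

In the contaminated sample, resamples that include the outlier shift the mean but not the trimmed mean, so the statistic clusters around distinct values depending on how many copies of the outlier were drawn. Plotting `counts` against `edges` (for example with a plotting library) gives the histogram that the graphical method inspects visually.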