TY - JOUR
TI - New nonparametric approaches for multivariate and functional data analysis in outlier detection, construction of tolerance tubes, and clustering
DO - https://doi.org/doi:10.7282/T3HH6N76
PY - 2016
AB - Recent advances in computing and data acquisition technologies have made large, complex datasets ubiquitous, including high-dimensional and functional data. Most existing statistical approaches for multivariate or functional data rely on parametric assumptions such as normality. In practice, such assumptions are difficult to justify or verify. The goal of this dissertation is to develop general nonparametric statistical approaches for outlier detection, tolerance tube construction, and clustering for multivariate and functional data. i) In Chapter 3, we propose a general approach, named Antipodal Reflection Depth (ARD), that refines any existing depth function (henceforth the base depth) to form a class of new depth functions. ARD has the advantage over its base depth of capturing the relative magnitude of the deviation of all data points from the deepest one. This property is key to making ARD useful in many applications. Here, we focus primarily on its utility in outlier detection. ii) In Chapter 4, we introduce tolerance tubes, which can be viewed as generalizations of tolerance intervals/regions to functional settings. A tolerance tube ensures that a specified proportion of the functional dataset is contained within the tolerance limits with a given level of confidence. In addition to extending the commonly accepted definitions of $\beta$-content and $\beta$-expectation, we introduce modifications that incorporate an exempt level $\alpha$. These modifications relax the definitions by allowing an $\alpha$ portion of each functional observation to be exempt from the requirements, and are thus particularly useful for offsetting occasional allowable aberrations.
iii) In Chapter 5, we propose a new clustering approach named K-means on Pairwise Distance (KMPD), and show it to be effective in detecting clusters of different sizes. Moreover, KMPD is capable of grouping anomalous sample points into a single cluster, and is therefore an effective approach for outlier detection as well. All these approaches are completely nonparametric and data-driven, and are thus broadly applicable. Relevant theoretical properties are investigated and justified. These approaches are also illustrated and tested using data from both simulations and a real application to a medical study of continuous glucose monitoring.
KW - Statistics and Biostatistics
KW - Outliers (Statistics)
LA - eng
ER -