Turkoz, Mehmet. Distribution-free fault identification and anomaly detection in high-dimensional data. Retrieved from https://doi.org/doi:10.7282/T3XS5ZV3
Description

Quality engineering is an essential activity in production processes; its objective is to ensure product quality throughout the production stages. Many processes have several attributes that must be monitored continuously to detect changes in any of the variables. Monitoring a process with several quality characteristics is referred to as multivariate statistical process control (MSPC). Most quality control procedures assume that the process characteristics follow a normal distribution; this assumption is limiting, since the underlying distribution of a process may not be normal. In this dissertation, we present procedures to identify faulty variables and detect anomalies in MSPC with high-dimensional data when the underlying distribution of the process is unknown. We first propose a distribution-free adaptive step-down (DFASD) procedure, motivated by a well-known data description method called support vector data description (SVDD). SVDD uses the kernel concept to find support vectors that define a hypersphere boundary around the available data. In a high-dimensional process, identifying the variable or subset of variables that causes an out-of-control (OC) signal is a challenging problem in quality engineering. The DFASD procedure uses conditional statistics to identify faulty variables. At each step, it selects a variable that shows no significant evidence of a change, conditional on the variables selected in the previous steps. The procedure stops when no remaining variable can be classified into the unchanged set, and it declares the variables outside the unchanged set to be changed. We then present a new distribution-free fault identification procedure based on Bayesian inference, called Bayesian SVDD (BSVDD).
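The step-down logic of DFASD described above can be illustrated with a small sketch. It makes loud simplifying assumptions: instead of SVDD-based conditional statistics, it scores a variable by how much the squared RBF-kernel distance to the uniform-weight feature-space mean of the in-control data grows when that variable is added to the current unchanged set, and it compares that increase with a fixed, user-chosen `threshold` rather than a calibrated control limit. The functions `kernel_distance2` and `step_down`, the data, and all parameter values are hypothetical.

```python
import math


def rbf(x, y, gamma):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_distance2(z, data, idx, gamma):
    """Squared feature-space distance from z to the uniform-weight mean of the
    in-control data, using only the variables listed in idx.

    ASSUMPTION: every training point gets weight 1/n; real SVDD would use
    support-vector weights obtained from a quadratic program instead.
    """
    if not idx:
        return 0.0
    zp = [z[i] for i in idx]
    pts = [[x[i] for i in idx] for x in data]
    n = len(pts)
    k_zx = sum(rbf(zp, p, gamma) for p in pts) / n
    k_xx = sum(rbf(p, q, gamma) for p in pts for q in pts) / n ** 2
    return 1.0 - 2.0 * k_zx + k_xx  # k(z, z) = 1 for the RBF kernel


def step_down(z, data, threshold, gamma=0.5):
    """Hypothetical adaptive step-down: grow the 'unchanged' set one variable at a time."""
    unchanged = []
    remaining = list(range(len(z)))
    while remaining:
        base = kernel_distance2(z, data, unchanged, gamma)
        # Conditional statistic: how much the distance grows when variable j is
        # added to the variables already judged unchanged.
        stats = {j: kernel_distance2(z, data, unchanged + [j], gamma) - base
                 for j in remaining}
        j_best = min(stats, key=stats.get)
        if stats[j_best] > threshold:
            break  # no remaining variable can join the unchanged set
        unchanged.append(j_best)
        remaining.remove(j_best)
    return sorted(remaining)  # variables declared changed


# Small in-control grid around the origin (illustrative data only).
data = [[a, b, c] for a in (-0.2, 0.0, 0.2) for b in (-0.2, 0.2) for c in (-0.2, 0.2)]
print(step_down([0.0, 3.0, 0.1], data, threshold=0.3))  # prints [1]
```

The greedy loop mirrors the description: the unchanged set grows while some variable shows little conditional evidence of change, and whatever is left when the loop stops is reported as changed.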
While traditional SVDD treats its parameters as constants to be determined, the center of the hypersphere may instead be considered a random vector with inherent randomness, given a training dataset. We introduce a Bayesian approach to SVDD by assuming that the data, once mapped into a higher-dimensional feature space, follow a normal distribution. In the proposed model, the likelihood of a point decreases as its distance from the center of the hypersphere increases. SVDD is a special case of the proposed BSVDD model, which improves on SVDD by incorporating precise prior knowledge. By combining BSVDD with an adaptive step-down procedure, we derive a new BSVDD-based fault identification procedure for MSPC. This is the first study to identify faulty variables using a distribution-free approach based on Bayesian inference. We also present an anomaly detection procedure that is readily applicable to multimode processes. Traditional quality control procedures assume that normal observations come from a single distribution. However, because of the complexity of modern industrial processes, the observations may have multiple operating modes; in other words, normal observations may come from more than one distribution. In such cases, conventional quality control procedures may trigger false alarms when the process has merely switched to another operating mode. We propose a generalized support vector-based anomaly detection procedure, called n-class SVDD, that can detect anomalies in multimode processes. The proposed procedure constructs n hyperspheres while accounting for the relationships among the modes. In addition, we introduce a generalized Bayesian framework that considers not only the prior information from each mode but also the relationships among the modes. Finally, we present a new Bayesian procedure for anomaly detection in multi-class data.
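The idea of treating the hypersphere center as a random vector can be illustrated with a conjugate Gaussian model in feature space. This is a sketch under assumptions made only for this illustration, not the dissertation's BSVDD: feature vectors are taken as isotropic Gaussian around the center with variance `sigma2`, the center gets a Gaussian prior with variance `tau2` whose mean is the feature-space mean of a few hypothetical prior points, and the posterior-mean center is then a weighted combination of training and prior points that the kernel trick can handle. All function names and values are hypothetical.

```python
import math


def rbf(x, y, gamma=0.5):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def posterior_center_weights(n_data, n_prior, sigma2, tau2):
    """Weights of the posterior-mean center, assuming (hypothetically)
    phi(x_i) ~ N(c, sigma2 * I) for the n_data training points and a prior
    c ~ N(mu0, tau2 * I), with mu0 the feature-space mean of n_prior prior points.
    Returns weights for [training points..., prior points...]."""
    prec_prior = 1.0 / tau2
    prec_data = n_data / sigma2
    lam = prec_prior / (prec_prior + prec_data)  # share of weight on the prior mean
    return [(1.0 - lam) / n_data] * n_data + [lam / n_prior] * n_prior


def dist2_to_center(z, points, weights, gamma=0.5):
    """||phi(z) - sum_i w_i phi(u_i)||^2, expanded with the kernel trick."""
    k_zu = sum(w * rbf(z, u, gamma) for w, u in zip(weights, points))
    k_uu = sum(wi * wj * rbf(ui, uj, gamma)
               for wi, ui in zip(weights, points)
               for wj, uj in zip(weights, points))
    return rbf(z, z, gamma) - 2.0 * k_zu + k_uu


def log_likelihood(z, points, weights, sigma2, gamma=0.5):
    """Isotropic-Gaussian log-likelihood of phi(z) up to an additive constant
    (informal, since the RBF feature space is infinite-dimensional). It falls
    as the feature-space distance to the center grows."""
    return -dist2_to_center(z, points, weights, gamma) / (2.0 * sigma2)


train = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2]]  # illustrative in-control data
prior = [[0.1, 0.1]]                            # illustrative prior knowledge of the center
w = posterior_center_weights(len(train), len(prior), sigma2=1.0, tau2=1.0)
print(w)  # [0.25, 0.25, 0.25, 0.25]
print(log_likelihood([0.05, 0.05], train + prior, w, 1.0) >
      log_likelihood([3.0, 3.0], train + prior, w, 1.0))  # True
```

As `tau2` grows, `lam` tends to 0 and the center reduces to the equally weighted mean of the training points; a small `tau2` pulls the center toward the prior. No support vectors are selected here, so the sketch conveys only the intuition of a random, prior-informed center.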
Most existing anomaly detection procedures use only information from normal observations. However, information about anomalies is often available from engineering knowledge and the historical data of the process, and the performance of anomaly detection procedures can be improved when available anomaly data are used to construct the data description. We propose a multi-class Bayesian SVDD model that incorporates anomaly data when such data are available and an appropriate prior distribution for them can be obtained.
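One simple way to see how known anomalies can sharpen a data description is sketched below. It is an illustration under stated assumptions, not the multi-class Bayesian SVDD model itself: it uses uniform-weight feature-space means instead of SVDD centers, a fixed squared radius `radius2`, and a class prior probability `prior_anomaly` in place of the prior distribution the dissertation derives. The function names, data, and parameter values are all hypothetical.

```python
import math


def rbf(x, y, gamma=0.5):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def dist2_to_mean(z, points, gamma=0.5):
    """Squared feature-space distance from z to the uniform-weight mean of points."""
    n = len(points)
    k_zx = sum(rbf(z, p, gamma) for p in points) / n
    k_xx = sum(rbf(p, q, gamma) for p in points for q in points) / n ** 2
    return 1.0 - 2.0 * k_zx + k_xx  # k(z, z) = 1 for the RBF kernel


def is_anomaly(z, normal, anomalous, radius2=1.0, prior_anomaly=0.2, sigma2=0.1, gamma=0.5):
    """Hypothetical two-stage rule.

    1. Radius check: flag z if it lies outside a sphere of squared radius radius2
       around the normal data (catches anomalies unlike any seen before).
    2. If anomaly examples exist, compare class log-posteriors, assuming each class
       is an isotropic Gaussian in feature space with shared variance sigma2 and
       class prior probabilities (1 - prior_anomaly, prior_anomaly).
    """
    d_norm = dist2_to_mean(z, normal, gamma)
    if d_norm > radius2:
        return True
    if not anomalous:
        return False
    log_post_norm = -d_norm / (2.0 * sigma2) + math.log(1.0 - prior_anomaly)
    log_post_anom = (-dist2_to_mean(z, anomalous, gamma) / (2.0 * sigma2)
                     + math.log(prior_anomaly))
    return log_post_anom > log_post_norm


normal = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]     # illustrative in-control data
known_bad = [[1.0, 1.0], [1.1, 1.0], [1.0, 1.1]]  # illustrative historical anomalies
print(is_anomaly([0.8, 0.8], normal, []))          # False: inside the radius
print(is_anomaly([0.8, 0.8], normal, known_bad))   # True: close to known anomalies
print(is_anomaly([5.0, -5.0], normal, known_bad))  # True: far from the normal data
```

In the example, the radius check alone accepts [0.8, 0.8]; supplying known anomalies near [1, 1] causes the same point to be flagged, which mirrors the motivation above for using anomaly data when it is available.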