Anzanello, Michel Jose. Selecting the best process variables for classification of production batches into quality levels. Retrieved from https://doi.org/doi:10.7282/T3B56JXX
In chemical and industrial processes, hundreds of noisy and correlated process variables are collected for process monitoring. This volume of data makes it hard to identify the key variables. Typically, the goal has been to identify the important process variables and build a model from them that predicts the product variables.
The main contribution of this research is a methodology for identifying the process variables that yield the best classification of production batches into classes defined by a final specification, such as quality or profitability.
The proposed method, Variable-k-Pareto (VKP), applies multivariate regression to historical data to relate the process variables to the product variables. The regression parameters are used to build indices that rank the process variables by their importance for classification. A data mining technique is then applied to the process variables, classification performance is evaluated, and the variable ranked least important is eliminated. This classification/elimination procedure is repeated until a lower bound on the number of remaining variables is reached. Finally, classification performance is plotted against the percentage of retained variables, and a Pareto-optimal analysis of this graph identifies the best subset of variables for classification.
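The elimination loop described above can be sketched in Python as follows. This is a minimal, dependency-free illustration under several assumptions, not the thesis' implementation: the importance index is the absolute standardized slope of the product variable on each process variable (equivalently the absolute Pearson correlation), standing in for the multivariate-regression indices; the ranking is computed once; the data mining technique is k-Nearest Neighbor scored by leave-one-out accuracy; and the "best" subset is taken as the most accurate point on the Pareto frontier, with ties broken by fewer variables. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Point = Tuple[float, float, List[int]]  # (percent retained, accuracy, subset)


def standardize(X: Sequence[Sequence[float]]) -> Matrix:
    """Scale each column to zero mean and unit population variance."""
    n = len(X)
    scaled = []
    for col in zip(*X):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        scaled.append([(v - mean) / sd for v in col])
    return [list(row) for row in zip(*scaled)]


def importance_indices(Z: Matrix, y: Sequence[float]) -> List[float]:
    """ASSUMPTION: |standardized slope| of y on each column (= |Pearson r|),
    a simple stand-in for indices derived from a multivariate regression."""
    n = len(y)
    mean_y = sum(y) / n
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n) or 1.0
    zy = [(v - mean_y) / sd_y for v in y]
    return [abs(sum(row[j] * zy[i] for i, row in enumerate(Z)) / n)
            for j in range(len(Z[0]))]


def knn_loo_accuracy(Z: Matrix, labels: Sequence[int],
                     cols: Sequence[int], k: int = 3) -> float:
    """Leave-one-out accuracy of a k-NN majority vote using only `cols`."""
    correct = 0
    for i, zi in enumerate(Z):
        dists = sorted((sum((zi[c] - zj[c]) ** 2 for c in cols), j)
                       for j, zj in enumerate(Z) if j != i)
        votes = Counter(labels[j] for _, j in dists[:k])
        correct += votes.most_common(1)[0][0] == labels[i]
    return correct / len(Z)


def vkp_sketch(X, y, labels, min_vars: int = 1, k: int = 3):
    """Rank once, then repeatedly classify and drop the least important variable."""
    Z = standardize(X)
    scores = importance_indices(Z, y)
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])  # most important first
    p = len(ranked)
    curve: List[Point] = []
    while len(ranked) >= max(1, min_vars):
        acc = knn_loo_accuracy(Z, labels, ranked, k)
        curve.append((100.0 * len(ranked) / p, acc, list(ranked)))
        ranked = ranked[:-1]  # eliminate the least important variable
    # Pareto frontier: fewer retained variables and higher accuracy are better.
    frontier = [pt for pt in curve if not any(
        q[0] <= pt[0] and q[1] >= pt[1] and (q[0] < pt[0] or q[1] > pt[1])
        for q in curve)]
    # ASSUMPTION: choose the most accurate frontier point, then the smallest one.
    best = max(frontier, key=lambda pt: (pt[1], -pt[0]))
    return curve, frontier, best[2]


# Toy data: column 0 tracks the product variable y; columns 1 and 2 are noise.
X = [[1, 5, 3], [2, 1, 7], [3, 5, 7], [4, 1, 3],
     [11, 5, 3], [12, 1, 7], [13, 5, 7], [14, 1, 3]]
y = [1, 2, 3, 4, 11, 12, 13, 14]
labels = [0 if v < 10 else 1 for v in y]  # hypothetical cutoff at 10
curve, frontier, best = vkp_sketch(X, y, labels)
print(best)  # [0]
```

On this toy data only the first variable survives the Pareto step, because keeping it alone already gives perfect leave-one-out accuracy with a third of the variables.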
The method performs remarkably well on real and simulated data, retaining on average only 5 to 10 percent of the original variables while significantly increasing classification performance. We also test alternative classification techniques, such as Probabilistic Neural Networks and Support Vector Machines, but k-Nearest Neighbor performs better.
The VKP method is easily adapted to situations where several classification performance measures, e.g., sensitivity and specificity, are needed for batch classification. The cost of collecting each variable can also be incorporated as a selection criterion. The method also performs well when the product variable falls into more than two quality classes.
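When several criteria matter at once, one way to generalize the Pareto step is to compare candidate variable subsets on sensitivity, specificity, and collection cost together. The sketch below is an assumption about how this could look, not the thesis' formulation: `sensitivity_specificity` computes both rates from binary labels, and `pareto_front` keeps the candidates that no other candidate matches or beats on all three criteria. The subset names, costs, and rates in the usage lines are hypothetical.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[str, float, float, float]  # (name, cost, sensitivity, specificity)


def sensitivity_specificity(actual: Sequence[int], predicted: Sequence[int],
                            positive: int = 1) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def dominates(a: Candidate, b: Candidate) -> bool:
    """a is no worse than b on every criterion and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] >= b[2] and a[3] >= b[3]
    better = a[1] < b[1] or a[2] > b[2] or a[3] > b[3]
    return no_worse and better


def pareto_front(candidates: Sequence[Candidate]) -> List[Candidate]:
    """Keep candidates not dominated by any other (lower cost, higher rates)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical evaluation of one subset's batch predictions.
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0, 0, 0, 1],
                                     [1, 0, 1, 0, 0, 1, 0, 1])
print(sens, spec)  # 0.75 0.75

# Hypothetical candidate subsets: (name, collection cost, sensitivity, specificity).
candidates = [("A", 10.0, 0.90, 0.80), ("B", 4.0, 0.85, 0.80),
              ("C", 6.0, 0.80, 0.75), ("D", 4.0, 0.85, 0.70)]
print([c[0] for c in pareto_front(candidates)])  # ['A', 'B']
```

Here "C" and "D" are dropped because "B" is at least as cheap and at least as accurate on both rates; choosing between "A" and "B" is left to the analyst's trade-off between cost and sensitivity.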
Another contribution is an improvement of the VKP method for selecting variables in batches whose product variable lies near the cutoff limit. Such product variables are highly affected by measurement error, which reduces the precision of batch classification. The results show that batches with a product variable near the cutoff limit require a particular subset of process variables for classification, different from the subset needed for the remaining batches.
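To make the near-cutoff idea concrete, the sketch below assumes additive Gaussian measurement error with a known standard deviation `sigma`. Under that assumption, a batch whose product value lies a distance d from the cutoff is observed on the wrong side with probability Phi(-d/sigma); the sketch plugs in the measured value as an estimate of the true one. A hypothetical threshold `alpha` then splits the batches into a near-cutoff group and a remaining group, on which variable selection would be run separately. The thesis may define the near-cutoff region differently; `sigma`, `alpha`, and the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def misclassification_prob(value: float, cutoff: float, sigma: float) -> float:
    """P(measurement lands on the other side of `cutoff`), assuming additive
    Gaussian error N(0, sigma^2): Phi(-|value - cutoff| / sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = -abs(value - cutoff) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def split_near_cutoff(y: Sequence[float], cutoff: float, sigma: float,
                      alpha: float = 0.05) -> Tuple[List[int], List[int]]:
    """Indices of batches whose misclassification risk is >= alpha (near the
    cutoff) and of the remaining batches. `alpha` is a hypothetical threshold."""
    near, rest = [], []
    for i, v in enumerate(y):
        if misclassification_prob(v, cutoff, sigma) >= alpha:
            near.append(i)
        else:
            rest.append(i)
    return near, rest


# Hypothetical product values, cutoff 10.0, measurement error sd 0.5.
near, rest = split_near_cutoff([9.0, 9.8, 10.1, 12.0], cutoff=10.0, sigma=0.5)
print(near, rest)  # [1, 2] [0, 3]
# Variable selection would then be run separately on each index group.
```

A probability threshold rather than a fixed distance keeps the split tied to the measurement error: a noisier measurement system widens the near-cutoff band automatically.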