TY - JOUR
TI - Optimization models and algorithms for sample-preserved classification and clustering
DO - https://doi.org/doi:10.7282/T3M61KBN
PY - 2010
AB - This dissertation develops new optimization models and algorithms for sample-preserved classification and clustering. A sample-preserved method keeps some or all of the existing samples when training a classification or clustering rule, and continues to use them in the testing or prediction phase. Such a method can analyze time series data because it can use the similarity measures widely applied to time series. A proposed sample-preserved classification technique, called Support Feature Machine (SFM), finds an optimal combination of features that gives the best classification based on the nearest neighbor rule. It keeps all baseline samples of the selected features in the prediction phase. Variations of the SFM model are also presented. In addition, the bilinear program sample-preserved k-median (BPSPKM) clustering algorithm is introduced. Although the original k-median problem can be solved by a simple and efficient bilinear program algorithm, it lacks the sample-preserved property and works only with the 1-norm distance. The sample-preserved k-median (SPKM) clustering method is formulated as an integer programming problem, which is very hard to solve. A bilinear program algorithm is proposed to obtain locally optimal solutions of the SPKM clustering problem, along with a new sequential search algorithm that solves SPKM clustering more efficiently. Finally, a novel feature space sample-preserved k-median (FSSPKM) clustering algorithm is proposed, along with feature selection methods tailor-made for this clustering technique.
The experimental results show that the original k-median clustering fails to classify time series data because it lacks the sample-preserved property and does not use time series similarity measures. Sample-preserved medians avoid invalid values in some application domains and can be used to represent the samples in their clusters. The BPSPKM clustering algorithm with the Euclidean distance is recommended for clustering attribute (non-time series), univariate time series, and multivariate time series data sets. Furthermore, the proposed feature selection methods consider the distances between cluster centers and the cluster densities. The results show that the proposed algorithms outperform other feature selection techniques used in the original k-median methods.
KW - Industrial and Systems Engineering
KW - Classification--Mathematical models
KW - Algorithms
LA - eng
ER -