Zhang, Yayan. Data normalization and clustering for big and small data and an application to clinical trials. Retrieved from https://doi.org/doi:10.7282/T3X068WQ
Description: This thesis proposes new methodology for data normalization and cluster prediction to help unravel the structure of a data set. Such data may come from many different areas, for example clinical responses, genomic multivariate data such as microarrays, and educational test scores. In addition, and more specifically for clinical trials, this thesis proposes a new cohort-size adaptive design that adapts the cohort size as the trial proceeds, saving time and cost while maintaining accuracy in locating the target maximum tolerated dose. The new normalization method is called Fisher-Yates normalization; it is computationally superior to standard quantile normalization and improves the power of the subsequent statistical analysis. Once the data have been normalized, the observations are clustered by their pattern of response, and cluster prediction is used to validate the findings. We propose a new method for cluster prediction that is a natural way to predict for hierarchical clustering and that uses nonlinear boundaries between clusters. Together, the normalization and cluster prediction methods can help identify subgroups of patients who have a positive treatment effect.
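The abstract does not define Fisher-Yates normalization, so the following is only a rough sketch for orientation, not the thesis's method. It contrasts standard quantile normalization with a rank-based normal-score transform, assuming that the proposed method replaces each value by a normal score determined by its rank within a sample. The function names, the `(r - 0.5) / n` plotting position, and the tie handling are all assumptions made for this illustration.

```python
from statistics import NormalDist


def rank_positions(values):
    """Return 1-based ranks of values (ties broken by original order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def quantile_normalize(columns):
    """Standard quantile normalization: every column is mapped onto a shared
    reference distribution, the element-wise mean of the sorted columns."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)]
    return [[reference[r - 1] for r in rank_positions(col)] for col in columns]


def normal_score_normalize(columns):
    """Hypothetical stand-in for a rank-based normalization: rank r of n is
    mapped to the standard normal quantile at (r - 0.5) / n (an assumed
    approximation to normal scores, not taken from the thesis)."""
    std_normal = NormalDist()
    result = []
    for col in columns:
        n = len(col)
        result.append([std_normal.inv_cdf((r - 0.5) / n) for r in rank_positions(col)])
    return result


# Two samples (columns) measured on the same three features.
samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(samples))
print(normal_score_normalize(samples))
```

In this sketch the normal-score version processes each column independently against a fixed reference, whereas quantile normalization must first sort and average across all columns to build its reference.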