This dissertation studies two important problems that arise in the analysis of Big Data: high dimensionality and the massive size of pertinent samples. In Chapter 1, we develop three novel algorithms for clustering and classification of Big Data. The first is a two-way clustering approach that combines model-based and weighted K-means clustering methods. The two-way approach produces smaller subgroups of binary features of size p or less, so that the number of possible patterns is small enough to be handled efficiently by traditional clustering algorithms. The approach can also operate on weighted reduced data, allowing it to scale to massive sample sizes. In Chapter 2, we derive a weighted probabilistic distance-based clustering technique adjusted for cluster sizes. Chapter 3 introduces an ensemble method, Enriched Random Forest, for high-dimensional data (where n << p, with n the number of observations and p the number of features). This algorithm addresses situations where the dimension is very high but only a small fraction of the features is truly informative.
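The weighted reduced-data idea of Chapter 1 can be sketched as follows: identical binary patterns are collapsed into (unique pattern, count) pairs, and K-means is run on the unique patterns with the counts as observation weights. This is a minimal illustrative sketch, not the dissertation's code; the function name and initialization scheme are assumptions.

```python
# Minimal sketch of weighted K-means on reduced data. Each row of X is a
# unique pattern and w[i] its multiplicity in the original sample, so the
# algorithm touches only the unique rows rather than the full data set.
# Names and details are illustrative assumptions.
import numpy as np

def weighted_kmeans(X, w, k, n_iter=50, seed=0):
    """Cluster rows of X with observation weights w into k clusters."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update each center as the weighted mean of its members.
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = np.average(X[mask], axis=0, weights=w[mask])
    return labels, centers
```

Running this on the unique patterns with their counts as weights gives the same centroids as ordinary K-means on the full sample, at a fraction of the cost when many rows repeat.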
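One illustrative reading of Chapter 2's size-adjusted probabilistic distance clustering: a point's membership probability in cluster k is taken proportional to a cluster-size term divided by the distance to that cluster's center. This is a hedged sketch under that assumption, not the dissertation's exact formulation.

```python
# Sketch of probabilistic distance-based membership with a cluster-size
# adjustment: P(cluster k | x) is proportional to sizes[k] / dist(x, c_k).
# The precise adjustment in the dissertation may differ; this only conveys
# the general shape of such a rule.
import numpy as np

def membership_probs(X, centers, sizes):
    """Row-stochastic matrix of membership probabilities for each point."""
    # Pairwise distances from every point to every center (small epsilon
    # avoids division by zero when a point coincides with a center).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    raw = sizes[None, :] / d
    return raw / raw.sum(axis=1, keepdims=True)
```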
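The core idea behind an enriched forest in the n << p setting can be sketched as weighting the per-split feature draw: rather than sampling the mtry candidate features uniformly, sample them with probabilities proportional to a per-feature informativeness score. The score used here (an absolute two-sample t-statistic) and all names are illustrative assumptions, not the dissertation's specification.

```python
# Sketch of weighted feature subsampling for an enriched forest: features
# with stronger marginal class separation are more likely to be offered as
# split candidates, so trees are not dominated by the many noise features.
import numpy as np

def feature_weights(X, y):
    """Score each feature by the |t-statistic| between the two classes."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0) / len(a) + b.var(axis=0) / len(b)) + 1e-12
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / se

def sample_split_candidates(weights, mtry, rng):
    """Draw mtry distinct features with probability proportional to weight."""
    p = weights / weights.sum()
    return rng.choice(len(weights), size=mtry, replace=False, p=p)
```

In a full forest these candidates would replace the uniform draw inside each node's split search; everything else about the trees stays the same.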