Description
Title: Deep compression for random features
Date Created: 2021
Other Date: 2021-05 (degree)
Extent: 1 online resource (xvii, 237 pages)
Description: With the rapid development of modern technologies, "big data" problems involving massive datasets have become increasingly common in machine learning applications. Such large datasets call for efficient strategies for large-scale machine learning and data storage. In this thesis, we study data compression methods based on randomized sketching techniques, proceeding from linear to non-linear methods.
In the linear case, we consider the method of random projection (RP) and its compression via quantization. First, we provide a thorough statistical analysis of the cosine estimators given by quantized random projections (QRPs). By allowing the two data vectors to be quantized at different precisions, our analysis generalizes to a variety of practical scenarios. Under the Lloyd-Max (LM) quantization scheme, the simple inner-product estimator, the normalized estimator, and the maximum likelihood estimator (MLE) are studied and compared, and the correctness of each estimator is justified. Our theoretical and empirical results reveal the connection between the variance of debiased cosine estimators and similarity-search accuracy.
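As a rough illustration of this setting, the sketch below (not from the thesis) estimates the cosine similarity of two vectors from quantized Gaussian random projections; the uniform rounding quantizer is a hypothetical stand-in for the Lloyd-Max scheme analyzed in the thesis, and the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, rho = 1000, 512, 0.7            # data dimension, #projections, target cosine

# Build two unit vectors with cosine similarity exactly rho.
x = rng.standard_normal(d); x /= np.linalg.norm(x)
z = rng.standard_normal(d); z -= (z @ x) * x; z /= np.linalg.norm(z)
y = rho * x + np.sqrt(1 - rho**2) * z

# Random projection: k Gaussian measurements per vector.
W = rng.standard_normal((k, d))
u, v = W @ x, W @ y

def quantize(a, bits=4, lim=4.0):
    """Uniform rounding quantizer (illustrative stand-in for Lloyd-Max)."""
    step = 2 * lim / 2**bits
    return np.clip(np.round(a / step) * step, -lim, lim)

uq, vq = quantize(u), quantize(v)

# Simple inner-product and normalized cosine estimators from the QRPs.
est_ip = (uq @ vq) / k
est_norm = (uq @ vq) / (np.linalg.norm(uq) * np.linalg.norm(vq))
print(f"true cosine {rho}, inner-product {est_ip:.3f}, normalized {est_norm:.3f}")
```

Both estimates concentrate around the true cosine as k grows; the thesis quantifies their bias and variance under LM quantization.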
We then consider the privacy of compressed linear sketches under noise addition. Using an information-theoretic approach, we show how to find the optimal noise to add to QRPs so as to minimize the mutual information. We then propose differentially private (DP) methods for releasing RPs and QRPs with smaller noise magnitude than the classical Gaussian mechanism, and we show the improvement of QRP over RP in terms of differential privacy. The utility of Euclidean distance estimation using RPs and QRPs is also analyzed.
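For reference, the classical Gaussian-mechanism baseline that the thesis improves upon can be sketched as follows; the sensitivity value and sketch dimensions here are assumed placeholders for illustration, not values from the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

def gaussian_mechanism(sketch, eps, delta, sensitivity):
    """Classical Gaussian mechanism: add N(0, sigma^2) noise calibrated to
    the assumed l2-sensitivity of the released sketch."""
    sigma = np.sqrt(2 * np.log(1.25 / delta)) * sensitivity / eps
    return sketch + rng.normal(0.0, sigma, size=sketch.shape), sigma

# Release a k-dimensional random projection of a unit-norm record.
d, k = 100, 32
x = rng.standard_normal(d); x /= np.linalg.norm(x)
W = rng.standard_normal((k, d)) / np.sqrt(k)
private_sketch, sigma = gaussian_mechanism(W @ x, eps=1.0, delta=1e-5,
                                           sensitivity=1.0)
print(f"noise std per coordinate: {sigma:.3f}")
```

The thesis proposes DP release mechanisms for RPs and QRPs that require smaller noise magnitude than this baseline.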
Next, we consider a new application of QRPs in which the quantized linear sketches are used to train machine learning models. We study three models: nearest-neighbor classification, linear classification, and linear regression, and we derive generalization error bounds for each. Our analysis provides guidance on the proper choice of quantizer for different models in practice; in particular, we show that LM quantization is recommended for linear classification and regression.
In Part II of this dissertation, we consider approximate non-linear learning. First, we demonstrate how the method of random Fourier features (RFF) can be used for approximate kernel learning by applying it to the Kernel Multi-view Discriminant Analysis (KMvDA) problem. Theoretical perturbation bounds and numerical experiments are provided, demonstrating the efficacy of RFFs in practice.
We then design two distortion-optimal quantizers for RFF under the LM scheme, based on our analysis of the probability distribution of RFFs; the resulting method is called LM-RFF. A detailed statistical analysis is provided, and experiments illustrate the superior learning performance of LM-RFF over the stochastic rounding technique. LM-RFF can match the accuracy of full-precision RFF at a high compression ratio, significantly reducing the memory cost of large-scale systems. Lastly, we propose a strategy that addresses the potential limitations of LM-RFF by extracting non-linear sketches directly from QRPs (compressed linear measurements). This approach is highly efficient, since only one set of compressed QRPs needs to be stored in the database to support both linear and non-linear learning, and it provides a solution for a variety of practical scenarios. Kernel estimation and a uniform approximation error bound are provided. Empirical results confirm that it is a convenient alternative for RFF compression in practice.
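The standard (full-precision) RFF construction of Rahimi and Recht that underlies these compression schemes can be sketched as follows; the dimensions and kernel bandwidth are assumed for illustration, and the LM-RFF quantizers themselves are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 10, 4096                       # input dimension, number of random features

# Random Fourier features for the Gaussian kernel k(x, y) = exp(-||x - y||^2 / 2):
# z(v) = sqrt(2/m) * cos(W v + b), with rows of W ~ N(0, I) and b ~ U[0, 2*pi],
# so that the inner product z(x) . z(y) approximates k(x, y).
W = rng.standard_normal((m, d))
b = rng.uniform(0.0, 2 * np.pi, size=m)

def rff(v):
    return np.sqrt(2.0 / m) * np.cos(W @ v + b)

x, y = rng.standard_normal(d) / 3, rng.standard_normal(d) / 3
exact = np.exp(-np.linalg.norm(x - y) ** 2 / 2)
approx = rff(x) @ rff(y)
print(f"exact kernel {exact:.4f}, RFF approximation {approx:.4f}")
```

Each feature vector z(v) is what LM-RFF would subsequently quantize to a few bits per coordinate; the thesis shows this can be done with little loss in learning accuracy.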
Note: Ph.D.
Note: Includes bibliographical references
Genre: theses, ETD doctoral
Language: English
Collection: School of Graduate Studies Electronic Theses and Dissertations
Organization Name: Rutgers, The State University of New Jersey