Nowadays, the explosive growth of data provides an unprecedented opportunity to apply machine learning methods in various application domains. However, high-dimensional data representations pose the curse-of-dimensionality challenge to machine learning models. Sparse models offer a tool that alleviates this challenge by learning low-dimensional feature and model representations. Traditionally, a sparse model is learned by penalizing the $\ell_1$-norm of the model parameter during optimization. Recent research on sparse model learning studies more accurate ways of encoding the sparsity level as prior knowledge; representative work includes $k$-support-norm regularized minimization and $\ell_0$-constrained minimization.
If the training loss is convex, minimizing it with a $k$-support-norm regularizer on the model parameter remains a convex optimization problem. In Chapter 2 we introduce a fully corrective Frank-Wolfe type algorithm, called $k$-FCFW, for $k$-support-norm regularized sparse model learning.
We reformulate the regularized minimization as a constrained minimization task, and then apply the proposed algorithm to solve the reformulated problem. We compare the per-iteration complexity of the proposed $k$-FCFW algorithm with that of the proximal gradient algorithm, which is conventionally used to solve the original problem. One theoretical contribution is a linear convergence rate established for the proposed algorithm under standard assumptions.
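To illustrate the Frank-Wolfe machinery involved, the sketch below implements the linear minimization oracle over a $k$-support-norm ball (whose unit ball is the convex hull of $k$-sparse unit-$\ell_2$ vectors) inside a plain Frank-Wolfe loop with the classic $2/(t+2)$ step size. This is a simplified sketch, not the proposed algorithm itself: the fully corrective $k$-FCFW variant would re-optimize over all atoms collected so far instead of taking a fixed convex-combination step, and the least-squares setup below is an illustrative assumption.

```python
import numpy as np

def ksupp_lmo(grad, k, tau):
    """Linear minimization oracle over the k-support-norm ball of radius tau.

    The unit ball of the k-support norm is the convex hull of vectors with
    at most k nonzeros and unit l2 norm, so the oracle keeps the k largest-
    magnitude gradient entries, l2-normalizes them, and flips the sign.
    """
    s = np.zeros_like(grad)
    top = np.argsort(np.abs(grad))[-k:]      # indices of the top-k entries
    nrm = np.linalg.norm(grad[top])
    if nrm == 0.0:                           # zero gradient: any feasible point works
        return s
    s[top] = -grad[top] / nrm
    return tau * s

def frank_wolfe(grad_f, w0, k, tau, n_iter=200):
    """Plain Frank-Wolfe with the k-support LMO and step size 2/(t+2).

    Simplified sketch: the fully corrective variant would instead re-solve
    the subproblem over the convex hull of all atoms found so far.
    """
    w = w0.copy()
    for t in range(n_iter):
        s = ksupp_lmo(grad_f(w), k, tau)
        gamma = 2.0 / (t + 2.0)
        w = (1.0 - gamma) * w + gamma * s    # convex-combination update
    return w
```

Each iterate is a convex combination of $k$-sparse atoms, which is what makes the per-iteration cost of the oracle (a top-$k$ selection) cheap compared with a full proximal step.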
Alternatively, the $\ell_0$-norm of the model parameter can be used directly as a constraint in the learning objective.
However, in this case the problem is non-convex and NP-hard because of the $\ell_0$-norm constraint, even when the training loss is convex. To trade off between solution accuracy and efficiency, several primal-domain greedy algorithms have been proposed, and their properties, such as upper bounds on the model estimation error and support recovery guarantees, have been analyzed in the literature. In Chapter 3 we introduce the proposed dual-space algorithm for the sparsity-constrained $\ell_2$-norm regularized finite-sum loss minimization problem. The sparse duality theory established in this work sets up sufficient and necessary conditions under which the original non-convex problem can be equivalently solved in a concave dual formulation. The dual iterative hard thresholding (DIHT) algorithm and its stochastic variant are proved to recover the parameter support without the Restricted Isometry Property (RIP) condition.
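For intuition, here is a minimal primal analogue: plain iterative hard thresholding (IHT) for an $\ell_0$-constrained least-squares loss. DIHT itself runs hard-thresholding-style updates in the dual space; this primal sketch only illustrates the hard-thresholding projection that both share, and the least-squares loss and step-size choice are illustrative assumptions.

```python
import numpy as np

def hard_threshold(x, k):
    """Project x onto the set of k-sparse vectors: keep the k largest-magnitude entries."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def iht_least_squares(A, y, k, step=None, n_iter=500):
    """Primal IHT for min_w 0.5*||Aw - y||^2  s.t.  ||w||_0 <= k.

    Each iteration takes a gradient step and projects the iterate onto
    the (non-convex) set of k-sparse vectors via hard thresholding.
    """
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        w = hard_threshold(w - step * A.T @ (A @ w - y), k)
    return w
```

The projection step is what makes the iteration non-convex; the RIP-style conditions usually assumed for such primal schemes are exactly what the dual analysis in this chapter avoids.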
Distributed optimization algorithms are used to learn a globally optimal model when training samples reside on different machines. Communication efficiency is an important concern in the design of distributed model training algorithms. Chapter 4 elaborates the proposed Newton-type inexact pursuit algorithm for the $\ell_0$-constrained empirical risk minimization problem. The proposed algorithm alternates between inexactly solving a local sparse learning problem with existing single-machine algorithms, and communicating gradients and model parameters between the master machine and the worker machines.
Our analysis shows that the upper bound on the model estimation error contracts at a linear rate across communication rounds.
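The master/worker loop can be mimicked in a single process. The sketch below simulates the communication pattern only: workers send local gradients, the master aggregates them, updates, and broadcasts a $k$-sparse iterate. It substitutes a plain gradient step for the Newton-type inexact local solve, and the least-squares shards, step size, and function names are illustrative assumptions.

```python
import numpy as np

def hard_threshold(x, k):
    """Project x onto the set of k-sparse vectors: keep the k largest-magnitude entries."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def distributed_sparse_fit(parts, k, step=0.5, n_rounds=100):
    """Single-process simulation of the master/worker communication pattern.

    `parts` holds one (A_i, y_i) least-squares shard per worker. Each round,
    every worker computes the gradient of its local loss 0.5*||A_i w - y_i||^2,
    the master averages the gradients, takes a step, and broadcasts the
    hard-thresholded (k-sparse) iterate back to the workers.
    """
    w = np.zeros(parts[0][0].shape[1])
    for _ in range(n_rounds):
        grads = [A.T @ (A @ w - y) for A, y in parts]             # one message per worker
        w = hard_threshold(w - step * np.mean(grads, axis=0), k)  # master update + broadcast
    return w
```

Per round, only one gradient per worker and one sparse iterate travel across the network, which is the communication pattern whose cost the chapter's algorithm is designed to keep low.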
In the last part, we conclude this dissertation and discuss future directions for sparse model learning research.