Deep Neural Networks (DNNs) have achieved great success in many fields. However, many DNN models are both deep and large, causing high storage and energy consumption during both the training and inference phases. As the size of DNNs continues to grow, it is critical to improve computational and energy efficiency while maintaining model performance. Various methods have been proposed for compressing DNN models; they can be categorized into three levels: model level, structure level, and weight level. This thesis focuses on structure-enforcing compression algorithms and an embedding quantization method, which aim at: i) lower storage and computational complexity, ii) easier hardware implementation owing to structured memory access patterns, and iii) embedding binarization oriented toward natural language processing.

Chapter 1 introduces the motivation of this dissertation in detail. Chapter 2 reviews the background and related work on compressing deep neural networks. Chapters 3, 4, and 5 present the proposed compression methods for fully connected layers, convolutional layers, and embedding layers, respectively. Finally, Chapter 6 discusses possible future directions for this research.
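To give a concrete sense of the embedding binarization mentioned above, the sketch below shows a generic sign-plus-scale scheme (in the spirit of XNOR-Net-style binarization), where each float32 embedding row is reduced to 1 bit per weight plus one per-row scaling factor. This is only an illustrative sketch under those assumptions, not the method proposed in this thesis; the function names binarize_embeddings and dequantize are hypothetical.

import numpy as np

def binarize_embeddings(emb: np.ndarray):
    """Illustrative sketch, not the thesis's algorithm.
    emb: (vocab_size, dim) float matrix.
    Returns (signs, scales): signs holds +/-1 in int8 (1 bit of
    information per weight), scales restores per-row magnitude."""
    scales = np.abs(emb).mean(axis=1, keepdims=True)   # per-row L1 scale
    signs = np.where(emb >= 0, 1.0, -1.0).astype(np.int8)
    return signs, scales

def dequantize(signs: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct an approximate real-valued embedding table."""
    return signs.astype(np.float32) * scales

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(10000, 300)).astype(np.float32)
    signs, scales = binarize_embeddings(emb)
    approx = dequantize(signs, scales)
    # Roughly 32x fewer bits per weight; the reconstruction error
    # depends on how concentrated each row's magnitudes are.
    print("mean abs error:", np.abs(emb - approx).mean())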