Deep neural networks (DNNs) have achieved unprecedented success in applications such as autonomous vehicles and speech recognition. This success, however, comes at the cost of enormous model size: modern networks typically contain hundreds of layers and millions or even billions of parameters, which makes them very difficult to deploy on embedded devices with limited hardware resources and tight power budgets. It is therefore imperative to reduce the model size of DNNs to save both storage and computation without sacrificing performance. Many model compression methods exist to date, but most of them are hardware-unfriendly. This thesis tackles these challenges with hardware-friendly compression algorithms and domain-specific hardware architectures.

To overcome both the irregularity introduced by conventional model compression methods and the heavy computation incurred by regular dense matrices, we introduce PermDNN and PermCNN, which compress neural networks using permuted diagonal matrices and pair the compressed models with highly energy-efficient hardware accelerators.

Tensor-train decomposition (TTD) offers a much higher compression ratio than traditional methods, but naive TTD computation incurs substantial redundant work. We propose a compact-form TTD calculation that removes all of this redundancy, together with a dedicated hardware architecture, TIE, that accelerates the compact-form computation.

Given the wide availability of sparse neural network models, it is also important to improve the energy efficiency of sparse DNN hardware accelerators. State-of-the-art sparse CNN accelerators, however, suffer from redundant computation, high-cost intersection, and little data reuse. GoSPA addresses these issues by exploiting the static nature of the weight stream and by reordering computation. Our evaluations show that PermDNN, PermCNN, TIE, and GoSPA all achieve several times higher energy efficiency than state-of-the-art designs.
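To make the permuted diagonal structure concrete, here is a minimal NumPy sketch of the idea behind PermDNN/PermCNN: a single p x p block stores only p values plus one permutation offset, and the matrix-vector product can be computed without ever materializing the block. The cyclic-shift form of the permutation and the helper names are illustrative assumptions, not the exact formulation used in the thesis.

```python
import numpy as np

def perm_diag_block(values, k):
    """Build a p x p permuted diagonal block: row i holds values[i]
    at column (i + k) % p, and zeros elsewhere."""
    p = len(values)
    block = np.zeros((p, p))
    for i in range(p):
        block[i, (i + k) % p] = values[i]
    return block

def perm_diag_matvec(values, k, x):
    """Multiply a permuted diagonal block by x without materializing
    the block: y[i] = values[i] * x[(i + k) % p]."""
    p = len(values)
    return values * x[(np.arange(p) + k) % p]

p = 4
values = np.random.randn(p)   # the p stored nonzeros
k = 2                         # cyclic permutation offset
x = np.random.randn(p)

dense = perm_diag_block(values, k) @ x
compact = perm_diag_matvec(values, k, x)
assert np.allclose(dense, compact)
```

Because each p x p block needs only p weights and one offset, storage and multiply count both shrink by a factor of p while the access pattern stays perfectly regular, which is what makes the structure hardware-friendly.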
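The compression advantage of TTD can be seen from a simple parameter count. The sketch below builds random TT cores for a hypothetical 256 x 1024 fully connected layer (the mode sizes and ranks are arbitrary illustrative choices) and compares the number of stored parameters against the dense matrix; the reconstruction routine is a straightforward textbook TT-matrix contraction, not the compact-form computation scheme proposed in the thesis.

```python
import numpy as np

# Illustrative factorization of a 256 x 1024 layer as a TT-matrix.
m = [4, 8, 8]        # row modes:  4 * 8 * 8  = 256
n = [8, 8, 16]       # col modes:  8 * 8 * 16 = 1024
r = [1, 4, 4, 1]     # TT ranks (r_0 = r_d = 1)

cores = [np.random.randn(r[k], m[k], n[k], r[k + 1])
         for k in range(len(m))]

def tt_to_matrix(cores):
    """Contract TT-matrix cores G_k of shape (r_{k-1}, m_k, n_k, r_k)
    back into the full (prod m) x (prod n) matrix."""
    full = cores[0][0]                   # (m_1, n_1, r_1), since r_0 = 1
    for core in cores[1:]:
        # Attach the next core along the shared rank index, then merge
        # its row/column modes into the accumulated row/column indices.
        full = np.einsum('abr,rcds->acbds', full, core)
        M, mk, N, nk, rk = full.shape
        full = full.reshape(M * mk, N * nk, rk)
    return full[:, :, 0]                 # r_d = 1

W = tt_to_matrix(cores)
assert W.shape == (256, 1024)

dense_params = int(np.prod(m) * np.prod(n))             # 262144
tt_params = sum(r[k] * m[k] * n[k] * r[k + 1]
                for k in range(len(m)))                 # 1664
print(f"compression ratio: {dense_params / tt_params:.0f}x")  # ~158x
```

Even at these modest ranks the TT format stores roughly 158x fewer parameters than the dense layer, which is why exploiting it efficiently, without the redundant work of the naive contraction order, matters so much in hardware.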
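GoSPA's intersection unit and computation reordering are hardware structures; as a rough software analogy, the sketch below shows the two properties the design exploits. Because weights are static during inference, their nonzero positions can be extracted once offline, and at run time a multiplication is issued only when the streamed activation at a matching position is also nonzero. Function names are hypothetical.

```python
import numpy as np

def precompute_weight_nonzeros(w):
    """Weights are static during inference, so their nonzero
    positions can be extracted once, offline."""
    idx = np.flatnonzero(w)
    return idx, w[idx]

def sparse_dot(w_idx, w_val, a):
    """Multiply only where the (static) weight is nonzero and the
    streamed activation is also nonzero: an on-the-fly intersection."""
    acc = 0.0
    for i, wv in zip(w_idx, w_val):
        av = a[i]
        if av != 0.0:     # skip: no work when either operand is zero
            acc += wv * av
    return acc

w = np.array([0.0, 2.0, 0.0, -1.0, 0.0, 3.0])
a = np.array([1.0, 0.0, 5.0, 4.0, 0.0, 2.0])

w_idx, w_val = precompute_weight_nonzeros(w)
assert np.isclose(sparse_dot(w_idx, w_val, a), w @ a)
```

Precomputing the weight side of the intersection offline is what removes the high-cost online intersection that earlier sparse accelerators pay for on every input.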