Deep neural networks (DNNs) have achieved unprecedented success in applications such as autonomous vehicles and speech recognition. This success, however, comes at the cost of enormous model size: modern networks typically contain hundreds of layers and millions or even billions of parameters, which makes them very difficult to deploy on embedded devices with limited hardware resources and tight power budgets. It is therefore imperative to reduce the model size of DNNs to save both storage and computation without sacrificing performance. Many model compression methods exist to date, but most of them are hardware-unfriendly. This thesis tackles these challenges with hardware-friendly compression algorithms and domain-specific hardware architectures.

To overcome both the irregularity introduced by conventional model compression methods and the heavy computation incurred by regular dense matrices, we introduce PermDNN and PermCNN, which compress neural networks using permuted diagonal matrices and pair the compressed models with highly energy-efficient hardware accelerators.

Tensor-train decomposition (TTD) offers a much higher compression ratio than traditional methods, but naive TTD computation incurs substantial redundant work. We propose a compact-form TTD calculation that removes all of this redundancy, together with a dedicated hardware architecture, TIE, that accelerates the compact-form computation.

Given the wide availability of sparse neural network models, it is also important to improve the energy efficiency of sparse DNN hardware accelerators. State-of-the-art sparse CNN accelerators, however, suffer from redundant computation, high-cost intersection, and little data reuse. GoSPA addresses these issues by exploiting the static nature of the weight stream and by reordering computation. Our evaluations show that PermDNN, PermCNN, TIE, and GoSPA all achieve several times higher energy efficiency than state-of-the-art designs.
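To make the permuted diagonal structure concrete, here is a minimal NumPy sketch of the idea behind PermDNN/PermCNN: a single p x p block stores only p values plus one permutation offset, and the matrix-vector product can be computed without ever materializing the block. The cyclic-shift form of the permutation and the helper names are illustrative assumptions, not the exact formulation used in the thesis.

```python
import numpy as np

def perm_diag_block(values, k):
    """Build a p x p permuted diagonal block: row i holds values[i]
    at column (i + k) % p, and zeros elsewhere."""
    p = len(values)
    block = np.zeros((p, p))
    for i in range(p):
        block[i, (i + k) % p] = values[i]
    return block

def perm_diag_matvec(values, k, x):
    """Multiply a permuted diagonal block by x without materializing
    the block: y[i] = values[i] * x[(i + k) % p]."""
    p = len(values)
    return values * x[(np.arange(p) + k) % p]

p = 4
values = np.random.randn(p)   # the p stored nonzeros
k = 2                         # cyclic permutation offset
x = np.random.randn(p)

dense = perm_diag_block(values, k) @ x
compact = perm_diag_matvec(values, k, x)
assert np.allclose(dense, compact)
```

Because each p x p block needs only p weights and one offset, storage and multiply count both shrink by a factor of p while the access pattern stays perfectly regular, which is what makes the structure hardware-friendly.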
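The compression advantage of TTD can be seen from a simple parameter count. The sketch below builds random TT cores for a hypothetical 256 x 1024 fully connected layer (the mode sizes and ranks are arbitrary illustrative choices) and compares the number of stored parameters against the dense matrix; the reconstruction routine is a straightforward textbook TT-matrix contraction, not the compact-form computation scheme proposed in the thesis.

```python
import numpy as np

# Illustrative factorization of a 256 x 1024 layer as a TT-matrix.
m = [4, 8, 8]        # row modes:  4 * 8 * 8  = 256
n = [8, 8, 16]       # col modes:  8 * 8 * 16 = 1024
r = [1, 4, 4, 1]     # TT ranks (r_0 = r_d = 1)

cores = [np.random.randn(r[k], m[k], n[k], r[k + 1])
         for k in range(len(m))]

def tt_to_matrix(cores):
    """Contract TT-matrix cores G_k of shape (r_{k-1}, m_k, n_k, r_k)
    back into the full (prod m) x (prod n) matrix."""
    full = cores[0][0]                   # (m_1, n_1, r_1), since r_0 = 1
    for core in cores[1:]:
        # Attach the next core along the shared rank index, then merge
        # its row/column modes into the accumulated row/column indices.
        full = np.einsum('abr,rcds->acbds', full, core)
        M, mk, N, nk, rk = full.shape
        full = full.reshape(M * mk, N * nk, rk)
    return full[:, :, 0]                 # r_d = 1

W = tt_to_matrix(cores)
assert W.shape == (256, 1024)

dense_params = int(np.prod(m) * np.prod(n))             # 262144
tt_params = sum(r[k] * m[k] * n[k] * r[k + 1]
                for k in range(len(m)))                 # 1664
print(f"compression ratio: {dense_params / tt_params:.0f}x")  # ~158x
```

Even at these modest ranks the TT format stores roughly 158x fewer parameters than the dense layer, which is why exploiting it efficiently, without the redundant work of the naive contraction order, matters so much in hardware.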
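GoSPA's intersection unit and computation reordering are hardware structures; as a rough software analogy, the sketch below shows the two properties the design exploits. Because weights are static during inference, their nonzero positions can be extracted once offline, and at run time a multiplication is issued only when the streamed activation at a matching position is also nonzero. Function names are hypothetical.

```python
import numpy as np

def precompute_weight_nonzeros(w):
    """Weights are static during inference, so their nonzero
    positions can be extracted once, offline."""
    idx = np.flatnonzero(w)
    return idx, w[idx]

def sparse_dot(w_idx, w_val, a):
    """Multiply only where the (static) weight is nonzero and the
    streamed activation is also nonzero: an on-the-fly intersection."""
    acc = 0.0
    for i, wv in zip(w_idx, w_val):
        av = a[i]
        if av != 0.0:     # skip: no work when either operand is zero
            acc += wv * av
    return acc

w = np.array([0.0, 2.0, 0.0, -1.0, 0.0, 3.0])
a = np.array([1.0, 0.0, 5.0, 4.0, 0.0, 2.0])

w_idx, w_val = precompute_weight_nonzeros(w)
assert np.isclose(sparse_dot(w_idx, w_val, a), w @ a)
```

Precomputing the weight side of the intersection offline is what removes the high-cost online intersection that earlier sparse accelerators pay for on every input.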