Award Abstract # 1763747
SHF: Medium: Training Sparse Neural Networks with Co-Designed Hardware Accelerators: Enabling Model Optimization and Scientific Exploration

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: UNIVERSITY OF SOUTHERN CALIFORNIA
Initial Amendment Date: July 2, 2018
Latest Amendment Date: June 22, 2020
Award Number: 1763747
Award Instrument: Continuing Grant
Program Manager: Almadena Chtchelkanova
achtchel@nsf.gov
 (703)292-7498
CCF
 Division of Computing and Communication Foundations
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: July 1, 2018
End Date: June 30, 2023 (Estimated)
Total Intended Award Amount: $1,199,849.00
Total Awarded Amount to Date: $1,199,849.00
Funds Obligated to Date: FY 2018 = $758,600.00
FY 2020 = $441,249.00
History of Investigator:
  • Keith Chugg (Principal Investigator)
    chugg@usc.edu
  • Peter Beerel (Co-Principal Investigator)
  • Leana Golubchik (Co-Principal Investigator)
  • Panayiotis Georgiou (Former Co-Principal Investigator)
Recipient Sponsored Research Office: University of Southern California
3720 S FLOWER ST FL 3
LOS ANGELES
CA  US  90033
(213)740-7762
Sponsor Congressional District: 34
Primary Place of Performance: University of Southern California
3740 McClintock Avenue
Los Angeles
CA  US  90089-2565
Primary Place of Performance Congressional District: 37
Unique Entity Identifier (UEI): G88KLJR3KYT5
Parent UEI:
NSF Program(s): Special Projects - CCF,
Software & Hardware Foundation
Primary Program Source: 01001819DB NSF RESEARCH & RELATED ACTIVIT
01002021DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 075Z, 7924, 7942
Program Element Code(s): 287800, 779800
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Machine learning systems are critical drivers of new technologies such as near-perfect automatic speech recognition, autonomous vehicles, computer vision, and natural language understanding. The underlying inference engine for many of these systems is based on neural networks. Before a neural network can be used for these inference tasks, it must be trained using a data corpus of known input-output pairs. This training process is very computationally intensive, with current systems requiring weeks to months of time on graphics processing units (GPUs) or central processing units in the cloud. As more data becomes available, this problem of long training time is further exacerbated because larger, more effective network models become desirable. The theoretical understanding of neural networks is limited, so experimentation and empirical optimization remain the primary tools for understanding deep neural networks and innovating in the field. However, the ability to conduct larger scale experiments is becoming concentrated among a few large entities with the necessary financial and computational resources. Even for those with such resources, the painfully long experimental cycle for training neural networks means that large-scale searches and optimizations over the neural network model structure are not performed. The ultimate goal of this research project is to democratize and distribute the ability to conduct large scale neural network training and model optimizations at high speed, using hardware accelerators. Reducing the training time from weeks to hours will allow researchers to run many more experiments, gaining insight into the fundamental inner workings of deep learning systems. The hardware accelerators are also much more energy efficient than the existing GPU-based training paradigm, so advances made in this project can significantly reduce the energy consumption required for neural network training tasks.

This project comprises an interdisciplinary research plan that spans theory, hardware architecture and design, software control, and system integration. A new class of neural networks that have pre-defined sparsity is being explored. These sparse neural networks are co-designed with a very flexible, high-speed, energy-efficient hardware architecture that maximizes circuit speed for any model size in a given Field Programmable Gate Array (FPGA) chip. This algorithm-hardware co-design is a key research theme that differentiates this approach from previous research that enforces some sparsity during the training process in a manner incompatible with parallel hardware acceleration. In particular, the proposed architecture operates on each network layer simultaneously, executing forward- and back-propagation in parallel, fully pipelined across layers. With high-precision arithmetic, a speed-up of about 5X relative to GPUs is expected. Using log-domain arithmetic, these gains are expected to increase to 100X or larger. Software and algorithms are being developed to manage multiple FPGA boards, simplifying and automating the model search and training process. These algorithms exploit the ability to reconfigure the FPGAs to trade speed for accuracy, a capability lacking in GPUs. These software tools will also serve as a bridge to popular Python libraries used by the machine learning community.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


(Showing: 1 - 10 of 22)
  • Beauchamp, Daniel and Chugg, Keith M. "Linearization for High-Speed Current-Steering DACs Using Neural Networks," IEEE 12th Latin America Symposium on Circuits and Systems, 2021.
  • B. Song, M. Paolieri "Performance and Revenue Analysis of Hybrid Cloud Federations with QoS Requirements," IEEE Cloud 2022, 2022. https://doi.org/10.1109/CLOUD55607.2022.00055
  • Chen, C.-L. and Golubchik, Leana and Paolieri, Marco "Backdoor Attacks on Federated Meta-Learning," 34th Conference on Neural Information Processing Systems, 2020.
  • C.-L. Chen, S. Babakniya "Defending Against Poisoning Backdoor Attacks on Federated Meta-Learning," ACM Transactions on Intelligent Systems and Technology, 2022.
  • Datta, Gourav and Kundu, Souvik and Jaiswal, Akhilesh R. and Beerel, Peter A. "ACE-SNN: Algorithm-Hardware Co-design of Energy-Efficient & Low-Latency Deep Spiking Neural Networks for 3D Image Recognition," Frontiers in Neuroscience, v.16, 2022. https://doi.org/10.3389/fnins.2022.815258
  • Dey, Sourya and Babakniya, Sara and Kanala, Saikrishna C. and Paolieri, Marco and Golubchik, Leana and Beerel, Peter A. and Chugg, Keith M. "Deep-n-Cheap: An Automated Efficient and Extensible Search Framework for Cost-Effective Deep Learning," SN Computer Science, v.2, 2021.
  • Dey, Sourya and Huang, Kuan-Wen and Beerel, Peter A. and Chugg, Keith M. "Pre-Defined Sparse Neural Networks With Hardware Acceleration," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, v.9, 2019. https://doi.org/10.1109/JETCAS.2019.2910864
  • Fayyazi, Arash A. and Kundu, Souvik and Nazarian, Shahin and Beerel, Peter and Pedram, Massoud "CSrram: Area-Efficient Low-Power Ex-Situ Training Framework for Memristive Neuromorphic Circuits Based on Clustered Sparsity," 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2019. https://doi.org/10.1109/ISVLSI.2019.00090
  • Gourav Datta, Peter Beerel "Can Deep Neural Networks be Converted to Ultra Low-Latency Spiking Neural Networks?," DATE, 2022.
  • Gourav Datta, Souvik Kundu "Training Energy-Efficient Deep Spiking Neural Networks with Single-Spike Hybrid Input Encoding," IJCNN, 2022.
  • Kundu, Souvik A. and Prakash, Saurav M. and Akrami, Haleh and Beerel, Peter and Chugg, Keith "pSConv: A Pre-defined Sparse Kernel Based Convolution for Deep CNNs," 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2019. https://doi.org/10.1109/ALLERTON.2019.8919683

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

We developed new methods for advancing the training of large neural networks using application-specific hardware.  This contrasts with the conventional approach of using general-purpose processors, such as graphics processing units (GPUs). 

Our work was the first to introduce the concept of pre-defined, structured sparsity.  Conventional neural networks have fully-connected or dense architectures, but previous research has shown that, after training, many of these connections can be disregarded.  This results in sparse connectivity patterns that reduce complexity but do not map well to custom, highly parallel circuit architectures.  Our work demonstrated that one can pre-define a structured sparse connection pattern and still maintain excellent learning performance.  We also showed how one can co-design such a pre-defined, structured sparsity pattern with a highly parallel circuit architecture for training neural networks.  This has the potential to significantly reduce the energy cost of large-scale training and/or enable embedded systems to train large neural networks at the edge.  

Our work also explored automated model search and training hyper-parameter optimization.  We released an open-source software package for broader use in the research and industry communities.  
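At its simplest, automated model search samples configurations from a declared search space, trains each one, and keeps the best. The sketch below illustrates that loop with a toy `evaluate` function standing in for an actual training run; the search space, its values, and the quadratic cost surface are all hypothetical, not the released framework's API.

```python
import random

# Toy stand-in for "train a model and return its validation loss".
# In a real framework this would launch a full training run
# (on a GPU or, in this project's setting, an FPGA accelerator).
def evaluate(lr, width):
    # hypothetical cost surface with a minimum at lr=0.1, width=64
    return (lr - 0.1) ** 2 + ((width - 64) / 64) ** 2

# Declarative search space: each hyper-parameter with candidate values.
search_space = {
    "lr": [0.001, 0.01, 0.1, 0.5],
    "width": [16, 32, 64, 128],
}

random.seed(0)
best = None
for _ in range(20):
    # Random search: sample one value per hyper-parameter.
    cfg = {k: random.choice(v) for k, v in search_space.items()}
    loss = evaluate(cfg["lr"], cfg["width"])
    if best is None or loss < best[0]:
        best = (loss, cfg)

# best[1] now holds the lowest-loss configuration found.
```

Real search frameworks replace the random sampler with smarter strategies (e.g. Bayesian optimization) and add early stopping, but the evaluate-and-compare loop is the same.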

We also conducted some of the earliest work on log-number-system (LNS) computational approaches for training neural networks, which eliminate costly multiplier circuits.  These LNS approaches have the potential to reduce the area and/or energy consumption of training circuitry by a factor of two.
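The reason LNS arithmetic eliminates multipliers is that a value is stored as its (sign, log-magnitude) pair, so multiplication reduces to adding exponents. The sketch below is a minimal illustration in Python, not the project's hardware design: it handles only nonzero values, and the LNS addition shown covers only the same-sign case (real implementations approximate the correction term with small lookup tables).

```python
import math

# Represent a nonzero value x as (sign, log2|x|).
def to_lns(x):
    return (1 if x >= 0 else -1, math.log2(abs(x)))

def from_lns(s, e):
    return s * (2.0 ** e)

def lns_mul(a, b):
    sa, ea = a
    sb, eb = b
    # Multiplication becomes addition in the log domain:
    # no multiplier circuit needed, just an adder.
    return (sa * sb, ea + eb)

def lns_add(a, b):
    # Addition is the hard operation in LNS: it requires the
    # correction term log2(1 + 2^(lo - hi)), which hardware
    # typically approximates with a small table.
    sa, ea = a
    sb, eb = b
    hi, lo = max(ea, eb), min(ea, eb)
    # same-sign case only, for simplicity
    return (sa, hi + math.log2(1.0 + 2.0 ** (lo - hi)))

x, y = to_lns(3.0), to_lns(4.0)
assert abs(from_lns(*lns_mul(x, y)) - 12.0) < 1e-9   # 3 * 4
assert abs(from_lns(*lns_add(x, y)) - 7.0) < 1e-9    # 3 + 4
```

Since neural-network training is dominated by multiply-accumulate operations, trading exact multipliers for adders plus an approximate addition table is what yields the area and energy savings described above.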

Both pre-defined, structured sparsity and LNS approaches have become widely studied topics in the machine learning field and have received significant uptake in industry.  

Beyond the research component of our project, this collaboration led to significant advances in the curriculum at the USC Ming Hsieh Department of Electrical and Computer Engineering.  Specifically, this project directly led to the creation of four new graduate-level courses in deep learning and software skills for machine learning, the first undergraduate machine learning class in the department, and a significantly revised MS degree program in Machine Learning and Data Sciences.  

Last Modified: 12/08/2023
Modified by: Keith M Chugg

