NSF Org: CCF Division of Computing and Communication Foundations
Recipient:
Initial Amendment Date: July 22, 2017
Latest Amendment Date: July 22, 2017
Award Number: 1725456
Award Instrument: Standard Grant
Program Manager: Danella Zhao, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2017
End Date: August 31, 2022 (Estimated)
Total Intended Award Amount: $520,000.00
Total Awarded Amount to Date: $520,000.00
Funds Obligated to Date:
History of Investigator:
Recipient Sponsored Research Office: 2200 W MAIN ST, DURHAM, NC, US 27705-4640, (919) 684-3030
Sponsor Congressional District:
Primary Place of Performance: Hudson Hall, Durham, NC, US 27708-0001
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): PPoSS-PP of Scalable Systems
Primary Program Source:
Program Reference Code(s):
Program Element Code(s):
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
In light of recent breakthroughs in unsupervised learning algorithms (e.g., generative adversarial networks and dual learning) and the emergence of their applications, three PIs/co-PIs from Duke and UCSB formed a team to design Ula! - an integrated DNN acceleration framework with enhanced unsupervised learning capability. The project revolutionizes DNN research by introducing an integrated unsupervised learning computation framework with three vertically integrated components spanning software (algorithm), hardware (computing), and application (realization). The project echoes the call of the BRAIN Initiative (2013) and the White House's Nanotechnology-Inspired Grand Challenge for Future Computing (2015). The research outcomes will benefit both the Computational Intelligence (CI) and Computer Architecture (CA) industries at large by introducing a synergy between computing paradigms and artificial intelligence (AI). The corresponding education components enhance existing curricula and pedagogy by introducing interdisciplinary modules on software/hardware co-design for AI with creative teaching practices, and give special attention to women and underrepresented minority groups.
The project performs three tasks: (1) at the software level, a generalized hierarchical decision-making (GHDM) system is designed to efficiently execute state-of-the-art unsupervised learning and reinforcement learning processes at substantially reduced computation cost; (2) at the hardware level, a novel DNN computing paradigm is designed with enhanced unsupervised learning support, based on novelties in near-data computing, GPU architecture, and FPGA + heterogeneous platforms; (3) at the application level, the use of Ula! is exploited in scenarios that can greatly benefit from unsupervised learning and reinforcement learning. The developed techniques are also demonstrated and evaluated on three representative computing platforms: GPUs, FPGAs, and emerging nanoscale computing systems, respectively.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
We developed Ula! - an integrated Deep Neural Network (DNN) acceleration framework with advanced learning capability. Ula! consists of three vertically integrated components at the algorithm, hardware, and application levels: 1) reinforcement/supervised-learning-enhanced unsupervised learning; 2) a novel DNN computing paradigm; and 3) technology applications and realizations.
Generative Adversarial Networks (GANs) have recently drawn tremendous attention as a powerful tool for improving the performance of unsupervised and semi-supervised DNN models. While GANs deliver state-of-the-art performance on these AI tasks, this comes at the cost of high computational complexity. We proposed ReGAN - a novel ReRAM-based Process-In-Memory (PIM) accelerator that efficiently reduces off-chip memory accesses. Two techniques, Spatial Parallelism and Computation Sharing, were proposed to further enhance the training efficiency of GANs.
Model compression is significant for the wide adoption of Recurrent Neural Networks (RNNs), both in user devices with limited resources and in business clusters that must respond quickly to large-scale service requests. To alleviate the computational cost of RNNs, we proposed to learn structurally sparse Long Short-Term Memory (LSTM) networks by reducing the sizes of the basic structures within LSTM units, including input updates, gates, hidden states, cell states, and outputs. During training, removing a component of the Intrinsic Sparse Structures (ISS) in an LSTM simultaneously decreases the sizes of all basic structures by one, thereby always maintaining dimension consistency.
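The shrink-everything-by-one property of ISS removal can be illustrated with a small NumPy sketch. This is illustrative only, not the authors' implementation; the fused-weight layout (four stacked gate blocks of H rows over D input + H recurrent columns) is an assumption made for the example.

```python
import numpy as np

def remove_iss_component(W, b, D, H, h):
    """Remove hidden unit h from fused LSTM parameters (sketch).

    W: (4H, D+H) weights for the four gate blocks; b: (4H,) bias.
    Deleting one ISS component removes the matching row in every gate
    block and the matching recurrent column, so all basic structures
    shrink by one and dimensions stay consistent.
    """
    rows = [g * H + h for g in range(4)]   # one row per gate block
    W = np.delete(W, rows, axis=0)
    W = np.delete(W, D + h, axis=1)        # drop the recurrent input column
    b = np.delete(b, rows)
    return W, b

D, H = 8, 5
W = np.random.randn(4 * H, D + H)
b = np.random.randn(4 * H)
W2, b2 = remove_iss_component(W, b, D, H, h=2)
print(W2.shape, b2.shape)   # (16, 12) (16,)
```

Note that both the gate dimension (4H) and the recurrent input dimension (D+H) shrink together, which is exactly the consistency property the paragraph describes.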
Process-in-memory (PIM) architectures such as the Hybrid Memory Cube (HMC) have been used to improve data locality for efficient DNN execution. However, it is still hard to efficiently deploy the large-scale matrix computations of DNNs on HMC because of its coarse-grained packet protocol. We proposed NeuralHMC, the first HMC-based accelerator tailored for efficient DNN execution.
Existing nonvolatile memory-based machine learning accelerators cannot support the computational needs of GAN training. Specifically, the generator uses a new operator, called transposed convolution, which introduces significant resource underutilization when executed on conventional neural network accelerators because it inserts massive numbers of zeros into its input before a convolution operation. We proposed ZARA - a novel zero-free dataflow accelerator for Generative Adversarial Networks in 3D ReRAM.
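The zero-insertion behavior of transposed convolution that motivates ZARA can be sketched in a few lines of NumPy. This is a 1-D illustration of the operator's input expansion, not ZARA's dataflow.

```python
import numpy as np

def zero_inserted_input(x, stride):
    """Insert (stride - 1) zeros between neighboring input elements,
    as a strided transposed convolution does before its internal
    convolution; computing on these zeros is the resource
    underutilization a zero-free dataflow avoids."""
    n = len(x)
    out = np.zeros(n + (n - 1) * (stride - 1), dtype=x.dtype)
    out[::stride] = x
    return out

x = np.array([1.0, 2.0, 3.0])
print(zero_inserted_input(x, stride=2))   # [1. 0. 2. 0. 3.]
```

With stride 2, nearly half of the expanded input is zeros; in 2-D the zero fraction approaches three quarters, which is why skipping them matters.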
Stochastic Gradient Descent (SGD) is a popular training method for DNNs because of its efficiency. However, large-batch SGD tends to converge to sharp minima of the loss surface. We proposed the SmoothOut framework to smooth out sharp minima and thereby improve the generalization of DNNs. In particular, SmoothOut perturbs multiple copies of a DNN by noise injection and averages these copies. Our experimental results showed that SmoothOut improves generalization in both small-batch and large-batch training on top of state-of-the-art solutions.
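The smoothing idea can be sketched in NumPy on a toy loss. This is an illustration of the perturb-and-average principle, not the paper's training procedure; the toy loss, noise range, and number of copies are assumptions.

```python
import numpy as np

def smoothout_loss(loss_fn, w, num_copies=8, a=0.05, rng=None):
    """SmoothOut-style smoothing (sketch): evaluate several copies of
    the weights perturbed by uniform noise in [-a, a] and average the
    losses, which flattens sharp minima of the original surface."""
    rng = rng or np.random.default_rng(0)
    losses = [loss_fn(w + rng.uniform(-a, a, size=w.shape))
              for _ in range(num_copies)]
    return float(np.mean(losses))

# Toy loss with a very narrow (sharp) minimum at the origin.
def sharp(w):
    return float(-np.exp(-np.sum(w ** 2) / 1e-4))

w0 = np.zeros(2)
# The smoothed loss at the sharp minimum is much higher than the raw
# loss there, i.e. smoothing penalizes sharp minima.
print(sharp(w0), smoothout_loss(sharp, w0))
```

A flat, wide minimum would change little under the same perturbation, which is why optimizing the smoothed objective steers training toward flatter solutions.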
Previous work on Convolutional Neural Network (CNN) acceleration uses low-rank approximations of the original convolution layers to reduce computation cost. However, these methods are difficult to apply to sparse models, which limits execution speedup because redundancies within the CNN model are not fully exploited. We argue that kernel-granularity decomposition can be conducted under a low-rank assumption while still exploiting the redundancy within the remaining compact coefficients. Based on this observation, we proposed PENNI, a CNN model compression framework that achieves model compactness and hardware efficiency simultaneously.
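The kernel-granularity decomposition can be sketched with a truncated SVD in NumPy. This is a sketch under our own layout assumptions, not the released PENNI code; the subsequent coefficient sparsification and retraining steps are omitted.

```python
import numpy as np

def decompose_kernels(W, rank):
    """Fit a small shared kernel basis via truncated SVD (sketch).

    W: (C_out, C_in, k, k) conv weights.
    Returns coeffs: (C_out*C_in, rank) per-kernel coefficients and
    basis: (rank, k*k), so every k-by-k kernel becomes a linear
    combination of `rank` shared basis kernels.
    """
    co, ci, k, _ = W.shape
    M = W.reshape(co * ci, k * k)
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :rank] * S[:rank], Vt[:rank]

W = np.random.randn(16, 8, 3, 3)
coeffs, basis = decompose_kernels(W, rank=4)
approx = (coeffs @ basis).reshape(W.shape)
print(coeffs.shape, basis.shape, approx.shape)
```

Sparsifying `coeffs` afterwards is what lets this scheme combine low-rank structure with sparsity, instead of having to choose one or the other.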
Depth is a key attribute of DNNs. However, choosing depth is heuristic and requires substantial human effort. We proposed AutoGrow to automate depth discovery in DNNs: starting from a shallow seed architecture, AutoGrow adds new layers as long as the growth improves accuracy; otherwise, it stops growing and thus discovers the depth. Our experiments showed that, applying the same policy to different network architectures, AutoGrow can always discover near-optimal depth on various datasets.
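The growing policy can be sketched as a simple loop. The `train_eval` callback and the stopping threshold are assumptions made for illustration; the actual AutoGrow grows and continues training a single network rather than retraining candidates from scratch.

```python
def autogrow(train_eval, max_depth=20, eps=1e-3):
    """AutoGrow-style sketch: keep adding a layer while it improves
    accuracy by more than eps; stop otherwise and report the
    discovered depth. `train_eval(depth)` is an assumed callback that
    trains a network of that depth and returns validation accuracy."""
    depth, best = 1, train_eval(1)
    while depth < max_depth:
        acc = train_eval(depth + 1)
        if acc <= best + eps:      # growth no longer helps: stop
            break
        depth, best = depth + 1, acc
    return depth, best

# Toy accuracy curve that saturates once depth reaches 4.
def curve(depth):
    return min(0.9, 0.5 + 0.1 * depth)

depth, acc = autogrow(curve)
print(depth, acc)   # growth stops once accuracy saturates
```

On the toy curve the loop stops at the depth where accuracy plateaus, which is the behavior the paragraph describes for real architectures and datasets.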
In modern GPUs, on-chip memory capacity keeps increasing to support thousands of chip-resident threads. This capacity, however, is highly constrained by the large memory-cell area and high static power consumption of conventional SRAM implementations. We proposed to use the emerging multi-level cell (MLC) spin-transfer torque RAM (STT-RAM) technology to implement register files and shared memory in GPUs.
Federated learning (FL) is a popular distributed learning framework that trains a global model through iterative communication between a central server and edge devices. We empirically showed that under extremely strong poisoning attacks, existing defensive methods fail to guarantee the robustness of FL. More importantly, we observed that once the global model is polluted, the impact of an attack persists in subsequent rounds even if no further attacks occur. We proposed a client-based defense, named White Blood Cell for Federated Learning (FL-WBC), which can mitigate model poisoning attacks that have already polluted the global model.
This 5-year research project supported 7 graduate students and produced 20 conference papers and 6 journal papers in total.
Last Modified: 11/23/2022
Modified by: Yiran Chen