Award Abstract # 2226152
RI: Small: Taming Massive Pre-trained Models under Label Scarcity via an Optimization Lens

NSF Org: IIS (Division of Information & Intelligent Systems)
Recipient: GEORGIA TECH RESEARCH CORP
Initial Amendment Date: August 25, 2022
Latest Amendment Date: August 25, 2022
Award Number: 2226152
Award Instrument: Standard Grant
Program Manager: Vladimir Pavlovic
vpavlovi@nsf.gov
(703) 292-8318
IIS (Division of Information & Intelligent Systems)
CSE (Directorate for Computer and Information Science and Engineering)
Start Date: September 1, 2022
End Date: August 31, 2026 (Estimated)
Total Intended Award Amount: $539,926.00
Total Awarded Amount to Date: $539,926.00
Funds Obligated to Date: FY 2022 = $539,926.00
History of Investigator:
  • Tuo Zhao (Principal Investigator)
    tzhao80@gatech.edu
Recipient Sponsored Research Office: Georgia Tech Research Corporation
926 DALNEY ST NW
ATLANTA
GA  US  30318-6395
(404)894-4819
Sponsor Congressional District: 05
Primary Place of Performance: Georgia Tech Research Corporation
926 DALNEY ST NW
ATLANTA
GA  US  30332
Primary Place of Performance Congressional District: 05
Unique Entity Identifier (UEI): EMW9FC8J3HN4
Parent UEI: EMW9FC8J3HN4
NSF Program(s): Robust Intelligence
Primary Program Source: 01002223DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 7495, 7923
Program Element Code(s): 749500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Deep transfer learning (DTL) has made significant progress in many real-world applications such as image and speech recognition. Training deep learning models in these applications often requires large amounts of labeled data (e.g., images with annotated objects). Labeling these data by hand, however, is expensive and time-consuming, which significantly limits the broader adoption of deep learning. The issue is even more pronounced in domains such as biomedicine, where labeled data are scarce. To address label scarcity, researchers have turned to deep transfer learning, in which a massive deep learning model is first pre-trained using only unlabeled data and then adapted to the downstream task of interest with a limited amount of labeled data. Because of the gap between the enormous size of the pre-trained models and the limited labeled data, however, such an approach is prone to overfitting and may fail to generalize to unseen data, especially when labels are noisy. Moreover, the enormous model sizes make practical deployment difficult under constraints on storage and memory usage, inference latency, and energy consumption, especially on edge devices. This project aims to develop an efficient computational framework that improves the generalization of deep transfer learning and reduces model sizes by leveraging cutting-edge optimization and machine learning techniques.

Specifically, this project aims to develop: (I) new adversarial regularization methods that control the complexity of deep learning models and prevent overfitting to the training data; (II) new self-training methods that are robust to noisy labels in the training data; and (III) new optimization methods that improve the training of compact deep learning models in deep transfer learning. Moreover, we will develop new generalization and approximation theories for understanding the benefits of the proposed methods in transfer learning. The proposed research will also deliver open-source software in the form of easy-to-use libraries that help researchers and practitioners apply DTL in related fields.
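
As a concrete illustration of the first thrust, the sketch below shows one common form of adversarial (smoothness-inducing) regularization for fine-tuning. It is a minimal sketch only, not the project's proposed method: it assumes PyTorch, and the names (model, embeddings, epsilon) are illustrative placeholders rather than anything defined in this award.

import torch
import torch.nn.functional as F

def smoothness_regularizer(model, embeddings, epsilon=1e-3):
    """Penalize changes in model predictions under a small input perturbation.

    `model` is assumed to map input embeddings to logits. A single random
    perturbation of scale `epsilon` is used here for brevity; choosing the
    perturbation adversarially (e.g., by gradient ascent on the divergence
    below) is a common refinement.
    """
    with torch.no_grad():
        clean_log_probs = F.log_softmax(model(embeddings), dim=-1)
    noise = epsilon * torch.randn_like(embeddings)
    perturbed_log_probs = F.log_softmax(model(embeddings + noise), dim=-1)
    # KL divergence between the clean and perturbed predictions; adding this
    # term (scaled by a weight) to the task loss encourages locally smooth,
    # lower-complexity fine-tuned models, which helps when labels are scarce.
    return F.kl_div(perturbed_log_probs, clean_log_probs,
                    log_target=True, reduction="batchmean")

Added to the supervised loss with a tunable weight, a regularizer of this form limits how sharply the fine-tuned model can bend around each training example, illustrating the kind of complexity control the abstract refers to; the robust self-training and compact-model optimization thrusts address complementary aspects of the same label-scarcity problem.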

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Li, Y., Yu, Y., Liang, C., He, P., Karampatziakis, N., Chen, W., and Zhao, T. "LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models," 2024.
Wang, Haoyu, Wang, Yaqing, Liu, Tianci, Zhao, Tuo, and Gao, Jing. "HadSkip: Homotopic and Adaptive Layer Skipping of Pre-trained Language Models for Efficient Inference," 2023. https://doi.org/10.18653/v1/2023.findings-emnlp.283
Zhang, Qingru, Ram, Dhananjay, Hawkins, Cole, Zha, Sheng, and Zhao, Tuo. "Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer," 2023. https://doi.org/10.18653/v1/2023.findings-emnlp.183
Zuo, S., Liu, X., Charles, D., Jiao, J., Manavoglu, E., Zhao, T., and Gao, J. "Efficient Hybrid Long Sequence Modeling with State Space Augmented Transformers," 2024.

