
Award Abstract # 1948447
CRII: OAC: An Efficient Lossy Compression Framework for Reducing Memory Footprint for Extreme-Scale Deep Learning on GPU-Based HPC Systems

NSF Org: OAC (Office of Advanced Cyberinfrastructure)
Recipient: UNIVERSITY OF ALABAMA
Initial Amendment Date: April 24, 2020
Latest Amendment Date: April 24, 2020
Award Number: 1948447
Award Instrument: Standard Grant
Program Manager: Alan Sussman
  OAC - Office of Advanced Cyberinfrastructure (OAC)
  CSE - Directorate for Computer and Information Science and Engineering
Start Date: May 1, 2020
End Date: June 30, 2020 (Estimated)
Total Intended Award Amount: $174,593.00
Total Awarded Amount to Date: $174,593.00
Funds Obligated to Date: FY 2020 = $0.00
History of Investigator:
  • Dingwen Tao (Principal Investigator)
    ditao@iu.edu
Recipient Sponsored Research Office: University of Alabama Tuscaloosa
801 UNIVERSITY BLVD
TUSCALOOSA
AL  US  35401-2029
(205)348-5152
Sponsor Congressional District: 07
Primary Place of Performance: University of Alabama Tuscaloosa
801 University Blvd.
Tuscaloosa
AL  US  35478-0104
Primary Place of Performance Congressional District: 07
Unique Entity Identifier (UEI): RCNJEHZ83EV6
Parent UEI: TWJWHYEM8T63
NSF Program(s): CRII CISE Research Initiation
Primary Program Source: 01002021DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 026Z, 8228
Program Element Code(s): 026Y00
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Deep learning (DL) has rapidly evolved into a state-of-the-art technique in many science and technology disciplines, such as scientific exploration, national security, smart environments, and healthcare. Many of these DL applications require high-performance computing (HPC) resources to process large amounts of data. Researchers and scientists, for instance, are employing extreme-scale DL applications on HPC infrastructures to classify extreme weather patterns and high-energy particles. In recent years, using Graphics Processing Units (GPUs) to accelerate DL applications has attracted increasing attention. However, the ever-increasing scale of DL applications brings many challenges to today's GPU-based HPC infrastructures. The key challenge is the huge gap (e.g., one to two orders of magnitude) between the memory these applications require and the memory available on GPUs. This project aims to close this gap by developing a novel framework that uses data compression technologies to reduce the memory demand of extreme-scale DL applications effectively and efficiently. The proposed research will enhance GPU-based HPC infrastructures for the broad communities across many scientific disciplines that rely on DL technologies. The project will connect the machine learning and HPC communities and increase interactions between them. Educational and engagement activities include developing new curricula related to data compression, mentoring a selected group of high school students in a year-long research project for a regional Science Fair competition, and increasing the community's understanding of how to leverage HPC infrastructures for DL technologies. The project will also encourage student interest in research on DL technologies in HPC environments and promote research collaborations with multiple national laboratories.

Existing state-of-the-art GPU memory-saving methods for training extreme-scale deep neural networks (DNNs) suffer from high performance overhead and/or low memory footprint reduction. Error-bounded lossy compression is a promising approach for significantly reducing the memory footprint while still meeting the required analysis accuracy. This project will explore how to leverage error-bounded lossy compression of DNN intermediate data to reduce the memory footprint of extreme-scale DNN training. The project has a three-stage research plan. First, the team will comprehensively investigate the impact of applying error-bounded lossy compression to DNN intermediate data on both validation accuracy and training performance, using different error-bounded lossy compressors, compression modes, and error bounds on the targeted DNNs and datasets. Second, the team will optimize the compression quality of suitable error-bounded lossy compressors on different intermediate data based on the outcome of that impact analysis, and design an efficient scheme that adaptively applies a best-fit compression solution. Finally, the team will optimize the compression performance of the proposed lossy compression framework for state-of-the-art GPUs. The team will evaluate the proposed framework on high-resolution climate analytics and high-energy particle physics applications and compare it with existing state-of-the-art techniques in terms of both memory footprint reduction ratio and training performance (e.g., throughput, training time, and number of epochs). The project will enable scientists and researchers to train extreme-scale DNNs with a given set of computing resources in a fast and efficient manner, opening opportunities for new discoveries.
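To make the core idea concrete, below is a minimal PyTorch sketch of applying error-bounded lossy compression to the intermediate data (activations) that autograd saves for the backward pass. It is illustrative only, not the project's actual framework: a toy SZ-style uniform quantizer stands in for a real GPU compressor such as cuSZ, and the names EB, compress, decompress, pack, and unpack are hypothetical. The saved_tensors_hooks mechanism it uses is a real PyTorch API.

import torch

EB = 1e-2  # absolute error bound: |x - decompress(compress(x))| <= EB

def compress(t):
    # Toy SZ-style uniform quantizer: round each float to a grid of
    # width 2*EB, then keep only the small int16 code. (A production
    # compressor such as cuSZ adds prediction and entropy coding.)
    return torch.round(t / (2 * EB)).to(torch.int16)

def decompress(codes):
    return codes.to(torch.float32) * (2 * EB)

def pack(t):
    # Called by autograd for every tensor it saves for backward.
    # Compress float32 data; keep everything else raw. (In this toy
    # version, saved weights get quantized along with activations.)
    if t.dtype is torch.float32:
        return ("q", compress(t))
    return ("raw", t)

def unpack(saved):
    kind, payload = saved
    return decompress(payload) if kind == "q" else payload

model = torch.nn.Sequential(
    torch.nn.Linear(256, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
)
x = torch.randn(32, 256)

# While the hooks are active, saved tensors live in memory only as
# int16 codes (a 2x footprint reduction before any entropy coding).
with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    loss = model(x).sum()
loss.backward()  # gradients flow through the decompressed activations

The error bound EB is exactly the knob the second research stage would tune adaptively per layer and per tensor: a looser bound yields a higher memory footprint reduction ratio at the cost of larger perturbations to the gradients recomputed in the backward pass.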

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 15)
Chen, Xinyu and Minutoli, Marco and Tian, Jiannan and Halappanavar, Mahantesh and Kalyanaraman, Ananth and Tao, Dingwen. "HBMax: Optimizing Memory Efficiency for Parallel Influence Maximization on Multicore Architectures." The 31st International Conference on Parallel Architectures and Compilation Techniques (PACT 2022), 2022.
Dong, Peiyan and Wang, Siyue and Niu, Wei and Zhang, Chengming and Lin, Sheng and Li, Zhengang and Gong, Yifan and Ren, Bin and Lin, Xue and Tao, Dingwen. "RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition." The 57th Annual Design Automation Conference (DAC 2020), 2020. https://doi.org/10.1109/DAC18072.2020.9218499
Hu, Zhenbo and Zou, Xiangyu and Xia, Wen and Jin, Sian and Tao, Dingwen and Liu, Yang and Zhang, Weizhe and Zhang, Zheng. "Delta-DNN: Efficiently Compressing Deep Neural Networks via Exploiting Floats Similarity." The 49th International Conference on Parallel Processing (ICPP 2020), 2020. https://doi.org/10.1145/3404397.3404408
Jin, Sian and Grosset, Pascal and Biwer, Christopher and Pulido, Jesus and Tian, Jiannan and Tao, Dingwen and Ahrens, James. "Understanding GPU-Based Lossy Compression for Extreme-Scale Cosmological Simulations." The 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2020), 2020. https://doi.org/10.1109/IPDPS47924.2020.00021
Jin, Sian and Li, Guanpeng and Song, Shuaiwen Leon and Tao, Dingwen. "A Novel Memory-Efficient Deep Learning Training Framework via Error-Bounded Lossy Compression." The 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2021), 2021. https://doi.org/10.1145/3437801.3441597
Jin, Sian and Zhang, Chengming and Jiang, Xintong and Feng, Yunhe and Guan, Hui and Li, Guanpeng and Song, Shuaiwen Leon and Tao, Dingwen. "COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression." Proceedings of the VLDB Endowment, v.15, 2021. https://doi.org/10.14778/3503585.3503597
Rivera, Cody and Chen, Jieyang and Xiong, Nan and Zhang, Jing and Song, Shuaiwen Leon and Tao, Dingwen. "TSM2X: High-Performance Tall-and-Skinny Matrix-Matrix Multiplication on GPUs." Journal of Parallel and Distributed Computing, 2021. https://doi.org/10.1016/j.jpdc.2021.02.013
Rivera, Cody and Di, Sheng and Tian, Jiannan and Yu, Xiaodong and Tao, Dingwen and Cappello, Franck. "Optimizing Huffman Decoding for Error-Bounded Lossy Compression on GPUs." The 36th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2022), 2022. https://doi.org/10.1109/IPDPS53621.2022.00075
Tian, Jiannan and Di, Sheng and Yu, Xiaodong and Rivera, Cody and Zhao, Kai and Jin, Sian and Feng, Yunhe and Liang, Xin and Tao, Dingwen and Cappello, Franck. "Optimizing Error-Bounded Lossy Compression for Scientific Data on GPUs." 2021 IEEE International Conference on Cluster Computing (CLUSTER 2021), 2021. https://doi.org/10.1109/Cluster48925.2021.00047
Tian, Jiannan and Di, Sheng and Zhao, Kai and Rivera, Cody and Hickman Fulp, Megan and Underwood, Robert and Jin, Sian and Liang, Xin and Calhoun, Jon and Tao, Dingwen and Cappello, Franck. "cuSZ: An Efficient GPU-Based Error-Bounded Lossy Compression Framework for Scientific Data." The 29th International Conference on Parallel Architectures and Compilation Techniques (PACT 2020), 2020. https://doi.org/10.1145/3410463.3414624
Tian, Jiannan and Rivera, Cody and Di, Sheng and Chen, Jieyang and Liang, Xin and Tao, Dingwen and Cappello, Franck. "Revisiting Huffman Coding: Toward Extreme Performance on Modern GPU Architectures." The 35th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2021), 2021. https://doi.org/10.1109/IPDPS49936.2021.00097
