Award Abstract # 1617967
CSR: Small: Collaborative Research: Exploring Portable Data Placement on Massively Parallel Platforms with Heterogeneous Memory Architectures

NSF Org: CNS (Division of Computer and Network Systems)
Recipient: UNIVERSITY OF CALIFORNIA, MERCED
Initial Amendment Date: August 2, 2016
Latest Amendment Date: August 2, 2016
Award Number: 1617967
Award Instrument: Standard Grant
Program Manager: Matt Mutka
CNS Division of Computer and Network Systems
CSE Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2016
End Date: February 28, 2021 (Estimated)
Total Intended Award Amount: $300,000.00
Total Awarded Amount to Date: $300,000.00
Funds Obligated to Date: FY 2016 = $300,000.00
History of Investigator:
  • Dong Li (Principal Investigator)
    dli35@ucmerced.edu
Recipient Sponsored Research Office: University of California - Merced
5200 N LAKE RD
MERCED
CA  US  95343-5001
(209)201-2039
Sponsor Congressional District: 13
Primary Place of Performance: University of California, Merced
CA  US  95343-5001
Primary Place of Performance Congressional District: 13
Unique Entity Identifier (UEI): FFM7VPAG8P92
Parent UEI:
NSF Program(s): CSR-Computer Systems Research
Primary Program Source: 01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7923
Program Element Code(s): 735400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Heterogeneous computing is becoming crucial for many computational fields, including galaxy simulation, social network analysis, and stock transaction modeling. Programming heterogeneous memory systems is a grand challenge and creates a major obstacle between heterogeneous hardware and applications because of the programming complexity and the fast evolution of the hardware. This project aims to remove this obstacle and is expected to significantly relieve programmers of handling the underlying memory-system heterogeneity. The outcome of this research will also enable continuous improvement of the computing efficiency of many applications on future heterogeneous systems, a critical condition for sustained advancement of science, health, security, and other aspects of society.

To address the programming challenges on heterogeneous memory systems, the project investigates a software framework consisting of a hardware specification language, a set of novel compiler and runtime techniques, and advanced memory performance modeling. The goal is to develop a systematic solution that automatically places data on a complex heterogeneous memory system, especially on massively parallel platforms. With the proposed framework, programmers are relieved of tailoring their programs to different memory systems, and at the same time the sophistication of the memory system can be fully translated into high computing efficiency. The framework transforms programs so that they are customized at runtime to the underlying heterogeneous memory system (in terms of where data are placed in memory, when and how data migrate, and so on) and attain near-optimal memory usage.
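
The following is a minimal sketch, in C, of the kind of placement decision such a framework automates: data objects ranked by profiled access intensity are assigned greedily to the fastest tier with remaining capacity. The tier specification, object names, and the greedy policy are illustrative assumptions, not the project's actual specification language or placement algorithm.

    /* Hypothetical sketch: greedily place profiled data objects onto memory
     * tiers described by a simple hardware specification (fastest tier first,
     * spilling to slower tiers when capacity runs out). */
    #include <stdio.h>

    typedef struct {
        const char *name;
        double bandwidth_gbs;    /* sustained bandwidth in GB/s (spec field) */
        size_t capacity_bytes;   /* tier capacity */
        size_t used_bytes;
    } MemTier;

    typedef struct {
        const char *name;
        size_t size_bytes;
        double access_intensity; /* profiled accesses per byte (illustrative) */
    } DataObject;

    /* Assumes 'tiers' is sorted fastest first and 'objs' is sorted by
     * descending access intensity. */
    static void place(DataObject *objs, int nobjs, MemTier *tiers, int ntiers)
    {
        for (int i = 0; i < nobjs; i++)
            for (int t = 0; t < ntiers; t++)
                if (tiers[t].used_bytes + objs[i].size_bytes
                        <= tiers[t].capacity_bytes) {
                    tiers[t].used_bytes += objs[i].size_bytes;
                    printf("%s -> %s\n", objs[i].name, tiers[t].name);
                    break;
                }
    }

    int main(void)
    {
        MemTier tiers[] = {          /* numbers are made up for illustration */
            { "HBM",  800.0, (size_t)16  << 30, 0 },
            { "DRAM", 100.0, (size_t)192 << 30, 0 },
        };
        DataObject objs[] = {
            { "grid",      (size_t)12 << 30, 9.5 },
            { "particles", (size_t)8  << 30, 3.1 },
            { "trace_log", (size_t)2  << 30, 0.2 },
        };
        place(objs, 3, tiers, 2);
        return 0;
    }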

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 14)
Bang Di, Jianhua Sun, Dong Li, Hao Chen, and Zhe Quan "GMOD: A Dynamic GPU Memory Overflow Detector" International Conference on Parallel Architectures and Compilation Techniques, 2018
Ivy Peng, Kai Wu, Jie Ren, Dong Li, and Maya Gokhale "Demystifying the Performance of HPC Scientific Applications on NVM-based Memory Systems" 34th IEEE International Parallel and Distributed Processing Symposium, 2020
Jiawen Liu, Dong Li, Gokcen Kestor, and Jeffrey Vetter "Runtime Concurrency Control and Operation Scheduling for High Performance Neural Network Training" IEEE International Parallel and Distributed Processing Symposium, 2019
Jiawen Liu, Hengyu Zhao, Matheus Ogleari, Dong Li, and Jishen Zhao "Processing-in-Memory for Energy-efficient Neural Network Training: A Heterogeneous Approach" IEEE/ACM International Symposium on Microarchitecture, 2018
Jiawen Liu, Jie Ren, Roberto Gioiosa, Dong Li, and Jiajia Li "Sparta: High-Performance, Element-Wise Sparse Tensor Contraction on Heterogeneous Memory" 26th Principles and Practice of Parallel Programming, 2021
Jiawen Liu, Zhen Xie, Dimitrios Nikolopoulos, and Dong Li "Real-time Incremental Learning with Approximate Nearest Neighbor on Mobile Devices" USENIX Conference on Operational Machine Learning, 2020
Jie Ren, Chunhua Liao, and Dong Li "Opera: Data Access Pattern Similarity Analysis To Optimize OpenMP Task Affinity" International Workshop on High-Level Parallel Programming Models and Supportive Environments, 2019
Jie Ren, Jiaolin Luo, Kai Wu, Minjia Zhang, Hyeran Jeon, and Dong Li "Sentinel: Efficient Tensor Migration and Allocation on Heterogeneous Memory Systems for Deep Learning" 27th IEEE International Symposium on High-Performance Computer Architecture, 2021
Jie Ren, Kai Wu, and Dong Li "Understanding Application Recomputability without Crash Consistency in Non-Volatile Memory" Workshop on Memory Centric Programming for HPC, 2018
Kai Wu, Jie Ren, and Dong Li "Runtime Data Management on Non-Volatile Memory-Based Heterogeneous Memory for Task Parallel Program" ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2018
Kai Wu, Wenqian Dong, Qiang Guan, Nathan DeBardeleben, and Dong Li "Modeling Application Resilience in Large Scale Parallel Execution" International Conference on Parallel Processing, 2018

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Memory heterogeneity means that multiple memory components with different properties (such as bandwidth, latency, capacity, and computing ability) form a single memory system. Memory heterogeneity is becoming common because of the need to increase memory capacity or to provide higher performance in a cost-effective way. It raises the challenge of deciding the optimal placement of data objects on heterogeneous memory (HM). Recent studies indicate substantial difficulty in matching applications with HM because of the complex and fast-changing nature of HM, as well as application input sensitivity and phase behaviors.

Intellectual merit. We study system-level solutions to make the best use of HM for high performance. Our solutions are based on the idea of introducing limited application semantic information to direct data migration and allocation. Using application semantics, we are able to break the fundamental tradeoff between memory profiling overhead and accuracy, and to decide when to trigger data migration so that migration overlaps with computation as much as possible, minimizing migration overhead.
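
The sketch below illustrates the migration/computation overlap on a system where the memory tiers are exposed as NUMA nodes, which is one common way such hardware appears to software. The node numbers, region size, and the decision to migrate the whole region at once are assumptions for illustration, not the project's runtime.

    /* Hypothetical sketch: a helper thread migrates a hot region's pages to
     * the fast tier (assumed NUMA node 0) while the main thread keeps
     * computing on the same data, hiding migration cost behind computation.
     * Build with: cc overlap.c -lnuma -pthread */
    #include <numaif.h>   /* move_pages(), MPOL_MF_MOVE */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define NPAGES 256

    static void *region;
    static long page_size;

    static void *migrate_thread(void *arg)
    {
        (void)arg;
        void *pages[NPAGES];
        int nodes[NPAGES], status[NPAGES];
        for (int i = 0; i < NPAGES; i++) {
            pages[i] = (char *)region + i * page_size;
            nodes[i] = 0;                /* assumed fast (DRAM) node */
        }
        if (move_pages(0, NPAGES, pages, nodes, status, MPOL_MF_MOVE) < 0)
            perror("move_pages");
        return NULL;
    }

    int main(void)
    {
        page_size = sysconf(_SC_PAGESIZE);
        if (posix_memalign(&region, page_size, NPAGES * page_size))
            return 1;
        for (long i = 0; i < NPAGES * page_size; i += page_size)
            ((char *)region)[i] = 1;     /* fault pages in (possibly on the slow node) */

        pthread_t tid;
        pthread_create(&tid, NULL, migrate_thread, NULL);

        double sum = 0;                  /* computation overlaps the migration */
        for (long i = 0; i < NPAGES * page_size; i++)
            sum += ((char *)region)[i];

        pthread_join(tid, NULL);
        printf("sum = %.0f\n", sum);
        return 0;
    }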

Furthermore, we study application-level solutions to make the best use of HM for high performance. We propose new data structures and algorithms that reduce expensive accesses to slow memory as much as possible. These solutions are application-specific but deliver much higher performance than system-level solutions; they focus on critical and common applications, which justifies their high degree of customization. Both the system-level and application-level solutions investigate principles for how a large number of memory pages should be profiled to capture spatial and temporal locality without paying a large overhead, and for how page migration should happen to fully utilize fast memory.
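
As one concrete, hypothetical instance of such an application-level design, the sketch below keeps a small, frequently probed index in fast memory and the large, rarely touched payload in slow memory, so most accesses never leave the fast tier. It assumes both tiers are exposed as NUMA nodes (0 = fast, 1 = slow); this is not one of the project's published data structures.

    /* Hypothetical sketch: hot/cold split across memory tiers. The hash
     * index (hot) is allocated on the fast node; the payload array (cold)
     * on the slow node. Build with: cc split.c -lnuma */
    #include <numa.h>     /* numa_alloc_onnode(), numa_free() */
    #include <stdio.h>
    #include <string.h>

    #define NSLOTS  1024
    #define PAYLOAD 4096
    #define EMPTY   (-1L)

    typedef struct { long key; long payload_idx; } Slot;

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA unavailable\n");
            return 1;
        }

        Slot *index   = numa_alloc_onnode(NSLOTS * sizeof(Slot), 0);    /* fast */
        char *payload = numa_alloc_onnode((size_t)NSLOTS * PAYLOAD, 1); /* slow */
        if (!index || !payload)
            return 1;
        for (int i = 0; i < NSLOTS; i++)
            index[i].key = EMPTY;

        /* Insert: one fast-memory probe plus one slow-memory write. */
        long key = 42, slot = key % NSLOTS;
        index[slot].key = key;
        index[slot].payload_idx = slot;
        memset(payload + slot * PAYLOAD, 7, PAYLOAD);

        /* Lookup miss: resolved entirely in fast memory. */
        long q = 43;
        if (index[q % NSLOTS].key == q)
            printf("hit: payload at %ld\n", index[q % NSLOTS].payload_idx);
        else
            printf("miss: no slow-memory access needed\n");

        numa_free(index, NSLOTS * sizeof(Slot));
        numa_free(payload, (size_t)NSLOTS * PAYLOAD);
        return 0;
    }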

Broader impact. This project enables applications to fully tap the large memory capacity provided by HM. Some of these applications are critical to national interests (such as the DOE application WarpX, a large-scale plasma simulation code); some are critical to business (such as deep learning training and fast information retrieval). With our solutions, these applications can run at unprecedented scales on a single machine, even performing better than on multiple machines. This project has been highlighted by several media outlets and companies (e.g., towardsdatascience.com, Microsoft, and linkreseacher.com). It lays the foundation for many HPC applications (including compute-intensive applications with small memory footprints) to leverage the large memory capacity of HM. This project is among the first efforts to reveal that using limited application semantics can significantly improve application performance on HM.

Furthermore, this project provides research opportunities for undergraduate students to gain hands-on experience with software-hardware co-design. The project is also based on collaboration with Lawrence Berkeley National Lab and Lawrence Livermore National Lab, and it has impacts on how future supercomputer infrastructure should be built. Collaborating with the national labs, we provide training opportunities to graduate students and prepare them for future careers in the HPC field. Because the HPC field, which is critical to national interests, lacks workforce, our project helps address this pressing problem.


Last Modified: 04/13/2021
Modified by: Dong Li
