
NSF Org: | CNS Division Of Computer and Network Systems |
Recipient: | Colorado School of Mines |
Initial Amendment Date: | August 2, 2016 |
Latest Amendment Date: | August 2, 2016 |
Award Number: | 1618912 |
Award Instrument: | Standard Grant |
Program Manager: | Matt Mutka, CNS Division Of Computer and Network Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2016 |
End Date: | August 31, 2019 (Estimated) |
Total Intended Award Amount: | $199,671.00 |
Total Awarded Amount to Date: | $199,671.00 |
Funds Obligated to Date: | |
History of Investigator: | Bo Wu (Principal Investigator) |
Recipient Sponsored Research Office: | Colorado School of Mines, 1500 Illinois St, Golden, CO 80401-1887, US, (303) 273-3000 |
Sponsor Congressional District: | |
Primary Place of Performance: | 1610 Illinois Street, Golden, CO 80401-1833, US |
Primary Place of Performance Congressional District: | |
Unique Entity Identifier (UEI): | |
Parent UEI: | |
NSF Program(s): | CSR-Computer Systems Research |
Primary Program Source: | |
Program Reference Code(s): | |
Program Element Code(s): | |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Heterogeneous computing is becoming crucial for many computational fields, including galaxy simulations, social network analysis, and the modeling of stock transactions. Programming heterogeneous memory systems is a grand challenge that creates a major obstacle between heterogeneous hardware and applications because of programming complexity and rapid hardware evolution. This project aims to remove that obstacle and is expected to largely relieve programmers from handling the underlying memory-system heterogeneity. The outcomes of this research will also enable continuous improvement of the computing efficiency of many applications on future heterogeneous systems, a critical condition for sustained advancement of science, health, security, and other aspects of society.
To address the programming challenges of heterogeneous memory systems, the project investigates a software framework consisting of a hardware specification language, a set of novel compiler and runtime techniques, and advanced memory performance modeling. The goal is a systematic solution that automatically places data on a complex heterogeneous memory system, especially on massively parallel platforms. With the proposed framework, programmers no longer need to tailor their programs to different memory systems, and at the same time the capabilities of sophisticated memory systems are fully translated into high computing efficiency. The framework transforms programs so that, at runtime, they are customized to the underlying heterogeneous memory system (where data are placed, when and how data migrate, and so on) and attain near-optimal memory usage.
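To give a concrete flavor of the placement decisions such a framework automates, the sketch below chooses a memory for an array using a toy cost model over per-memory bandwidth, latency, and capacity. Everything here (the memory list, the array profile, the cost formula, and all names) is an illustrative assumption for this summary, not the project's actual specification language, compiler analysis, or performance model.

    // Hypothetical host-side sketch: pick the memory in a heterogeneous memory
    // system (HMS) that minimizes a toy estimated access cost for one array.
    // All numbers and names are illustrative assumptions, not project code.
    #include <cstdio>
    #include <limits>
    #include <string>
    #include <vector>

    struct Memory {            // one entry per memory in a (hypothetical) hardware spec
        std::string name;
        double bandwidth_gb_s; // sustainable bandwidth in GB/s
        double latency_ns;     // access latency in nanoseconds
        size_t capacity_bytes; // how much data fits
    };

    struct ArrayInfo {         // per-array profile a compiler or runtime might gather
        std::string name;
        size_t size_bytes;
        double reuse;          // average number of times each byte is touched
    };

    // Toy cost model: a latency term plus total traffic divided by bandwidth.
    double estimated_cost(const Memory& m, const ArrayInfo& a) {
        double traffic = static_cast<double>(a.size_bytes) * a.reuse;
        return a.reuse * m.latency_ns * 1e-9 + traffic / (m.bandwidth_gb_s * 1e9);
    }

    const Memory* choose_placement(const std::vector<Memory>& mems, const ArrayInfo& a) {
        const Memory* best = nullptr;
        double best_cost = std::numeric_limits<double>::max();
        for (const auto& m : mems) {
            if (a.size_bytes > m.capacity_bytes) continue;  // array does not fit
            double c = estimated_cost(m, a);
            if (c < best_cost) { best_cost = c; best = &m; }
        }
        return best;
    }

    int main() {
        std::vector<Memory> mems = {
            {"GPU shared memory", 8000.0, 30.0, 48 * 1024},
            {"GPU global memory", 700.0, 400.0, 16ull << 30},
            {"Host DRAM", 80.0, 100.0, 256ull << 30},
        };
        ArrayInfo a{"edge_list", 4ull << 30, 3.0};  // a 4 GiB array touched ~3 times
        const Memory* m = choose_placement(mems, a);
        std::printf("%s -> %s\n", a.name.c_str(), m ? m->name.c_str() : "no fit");
        return 0;
    }

A real framework would derive the memory descriptions from the hardware specification language, obtain access profiles from compiler analysis or runtime profiling, and also decide when and how to migrate data rather than only where to place it initially.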
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Modern processors often leverage massive parallelism to provide extremely high computation throughput, exemplified by Graphics Processing Units (GPUs), Many Integrated Core (MIC) processors, and Accelerated Processing Units (APUs). For the massive parallelism to yield performance benefits, the memory system should provide high bandwidth, high capacity, and low latency. However, one type of memory can satisfy at most two of these requirements, motivating the employment of heterogeneous memory systems (HMS). An HMS consists of multiple memory components with different properties. For example, an NVIDIA GPU has more than eight types of memory (global, texture, shared, constant, and various caches), some on-chip, some off-chip, some directly manageable by software, and some not. It is thus challenging to place data in a manner that maximizes the achieved throughput.
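The contrast described above can be made concrete with a small, generic CUDA example (not code from the project): the same three-point stencil is written once reading only global memory and once staging a tile into on-chip shared memory so that neighboring threads reuse data loaded by the block.

    // Generic illustration of GPU data placement: global memory vs. shared memory.
    #include <cstdio>
    #include <cuda_runtime.h>

    #define TILE 256

    // Version 1: every thread reads its three inputs directly from global memory.
    __global__ void stencil_global(const float* in, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i > 0 && i < n - 1)
            out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
    }

    // Version 2: each block stages its tile (plus halo cells) into shared memory
    // once, so neighboring threads reuse on-chip data instead of re-reading DRAM.
    __global__ void stencil_shared(const float* in, float* out, int n) {
        __shared__ float tile[TILE + 2];
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int t = threadIdx.x + 1;                    // +1 leaves room for the left halo
        if (i < n) tile[t] = in[i];
        if (threadIdx.x == 0 && i > 0) tile[0] = in[i - 1];
        if (threadIdx.x == blockDim.x - 1 && i < n - 1) tile[TILE + 1] = in[i + 1];
        __syncthreads();
        if (i > 0 && i < n - 1)
            out[i] = (tile[t - 1] + tile[t] + tile[t + 1]) / 3.0f;
    }

    int main() {
        const int n = 1 << 20;
        float *in, *out;
        cudaMallocManaged(&in, n * sizeof(float));
        cudaMallocManaged(&out, n * sizeof(float));
        for (int i = 0; i < n; ++i) in[i] = (float)i;
        int blocks = (n + TILE - 1) / TILE;
        stencil_global<<<blocks, TILE>>>(in, out, n);
        stencil_shared<<<blocks, TILE>>>(in, out, n);
        cudaDeviceSynchronize();
        printf("out[1] = %f\n", out[1]);  // expect 1.0
        cudaFree(in);
        cudaFree(out);
        return 0;
    }

Whether the shared-memory version actually wins depends on the reuse pattern, the tile size, and the GPU generation, which is exactly why automated placement is attractive.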
The main technical objective of this project was to design a systematic approach to optimizing data-to-memory mapping for HMS in massively parallel platforms. The project aimed at dramatically improving the performance of important applications in multiple emerging domains, such as graph analytics and machine learning.
During the NSF project, we produced research results that deepen the understanding of 1) processing very large graphs that do not fit in the global memory of GPUs, 2) placing data on fast and slow memory to optimize aggregate bandwidth, 3) partitioning and mapping computation to the complex memory hierarchy of GPUs for recurrent neural networks, and 4) partitioning data between the CPU and the GPU. Based on these understandings, we showed that better data placement can lead to performance improvements over other systems of up to 10X for graph processing and up to 7X for serving recurrent neural network models.
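As a small illustration of item 4), the sketch below splits one array between the CPU and the GPU according to an assumed throughput ratio and lets the two process their shares concurrently. The fixed 80/20 split and the trivial kernel are placeholders for this summary; the project's FinePar work derives the partitioning from the workload itself rather than from a hard-coded ratio.

    // Hypothetical sketch of throughput-proportional CPU/GPU data partitioning.
    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    __global__ void scale_gpu(float* data, int n, float s) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= s;
    }

    void scale_cpu(float* data, int n, float s) {
        for (int i = 0; i < n; ++i) data[i] *= s;
    }

    int main() {
        const int n = 1 << 22;
        const float s = 2.0f;
        // Assumed GPU share of the work; a real system would measure or model it.
        const double gpu_share = 0.8;
        int n_gpu = (int)(n * gpu_share);
        int n_cpu = n - n_gpu;

        std::vector<float> host(n, 1.0f);
        float* dev = nullptr;
        cudaMalloc(&dev, n_gpu * sizeof(float));
        cudaMemcpy(dev, host.data(), n_gpu * sizeof(float), cudaMemcpyHostToDevice);

        // The kernel launch is asynchronous, so the CPU processes its share
        // while the GPU works on the first chunk.
        scale_gpu<<<(n_gpu + 255) / 256, 256>>>(dev, n_gpu, s);
        scale_cpu(host.data() + n_gpu, n_cpu, s);

        // Copying back implicitly waits for the kernel on the default stream.
        cudaMemcpy(host.data(), dev, n_gpu * sizeof(float), cudaMemcpyDeviceToHost);
        printf("host[0] = %f, host[n-1] = %f\n", host[0], host[n - 1]);
        cudaFree(dev);
        return 0;
    }

The interesting question, which this toy example sidesteps, is how to pick the split for irregular workloads whose per-element cost varies widely.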
We published seven papers and made two software repositories (https://github.com/zhangfengthu/FinePar and https://github.com/cmikeh2/grnn) public thanks to the support of this award. Some papers appeared in top conferences, including ASPLOS, EuroSys, CGO, IPDPS, and PACT. Because of the high performance of the library based on our EuroSys paper, our collaborators at Microsoft are trying to integrate the code into their production system for multiple applications, ranging from natural language processing to text classification.
The PI has integrated some of the research outcomes into three courses he has offered multiple times at the Colorado School of Mines. The project has supported five graduate research assistants, who gained research experience in compilers, runtime systems, graph processing applications, and deep learning techniques. Some of them attended academic conferences and workshops, and are considering careers in academia.
Last Modified: 12/28/2019
Modified by: Bo Wu