
NSF Org: | CCF Division of Computing and Communication Foundations |
Initial Amendment Date: | July 7, 2017 |
Latest Amendment Date: | July 7, 2017 |
Award Number: | 1717532 |
Award Instrument: | Standard Grant |
Program Manager: | Yuanyuan Yang, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | August 1, 2017 |
End Date: | July 31, 2021 (Estimated) |
Total Intended Award Amount: | $449,999.00 |
Total Awarded Amount to Date: | $449,999.00 |
Recipient Sponsored Research Office: | 1314 S MOUNT VERNON AVE, WILLIAMSBURG, VA, US 23185, (757) 221-3965 |
Primary Place of Performance: | VA US 23187-8795 |
NSF Program(s): | Software & Hardware Foundation |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Graphics Processing Units (GPUs) are becoming the default choice for general-purpose hardware acceleration because they enable orders-of-magnitude faster and more energy-efficient execution of large-scale high-performance computing applications. Because most applications running on large-scale HPC systems are long-running, they must cope with a variety of hardware- and software-based faults. Many prior works have shown that real HPC systems are vulnerable to soft errors. The absence of essential protection and checkpointing mechanisms can lead to lower scientific productivity, reduced operational efficiency, and even monetary loss. However, these protection mechanisms (e.g., error correction codes) are not free: they incur very high performance, energy, and area costs.
This project takes a holistic approach to reducing these protection overheads by exploiting the fact that not all errors lead to an unacceptable loss in the accuracy of application output. Prior results show that GPGPU applications are amenable to such accuracy-aware optimizations. To enable these optimizations, this project will address three major research questions: (a) what hardware/software support and tools are necessary to determine which instructions are not vulnerable to soft errors; (b) based on this analysis, which hardware components need not be protected, and for how long, without degrading application quality beyond the user's quality requirements; and (c) what resource management and scheduling optimizations are necessary to make low-overhead yet reliable computation more effective and efficient. These questions will be explored using a variety of GPGPU applications from high-performance computing (HPC), big-data analytics, machine learning, and graphics. If successful, this project will generate several novel research insights that will play an important role in enabling low-cost reliable GPU computing. The results will be integrated into existing and new undergraduate and graduate courses on computer architecture and reliability, which will help train students, including women and students from diverse backgrounds and minority groups.
PROJECT OUTCOMES REPORT
Last Modified: 10/01/2021
Modified by: Adwait Jog