
NSF Org: |
CCF Division of Computing and Communication Foundations |
Recipient: |
|
Initial Amendment Date: | December 21, 2010 |
Latest Amendment Date: | June 18, 2015 |
Award Number: | 1055094 |
Award Instrument: | Continuing Grant |
Program Manager: |
Almadena Chtchelkanova
achtchel@nsf.gov (703)292-7498 CCF Division of Computing and Communication Foundations CSE Directorate for Computer and Information Science and Engineering |
Start Date: | January 1, 2011 |
End Date: | December 31, 2017 (Estimated) |
Total Intended Award Amount: | $401,654.00 |
Total Awarded Amount to Date: | $401,654.00 |
Funds Obligated to Date: |
FY 2012 = $72,570.00; FY 2013 = $75,877.00; FY 2014 = $79,367.00; FY 2015 = $104,404.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
660 S MILL AVENUE STE 204 TEMPE AZ US 85281-3670 (480)965-5479 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
660 S MILL AVENUE STE 204 TEMPE AZ US 85281-3670 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Software & Hardware Foundation |
Primary Program Source: |
01001213DB NSF RESEARCH & RELATED ACTIVIT; 01001314DB NSF RESEARCH & RELATED ACTIVIT; 01001415DB NSF RESEARCH & RELATED ACTIVIT; 01001516DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Within a decade, feature sizes of integrated circuits are expected to shrink from the present-day 45nm to 12nm, increasing soft error rates from once per year to once per day. The International Technology Roadmap for Semiconductors (ITRS) report recognizes reliability as one of the most important challenges for the next decade and points out that soft errors are the primary threat. Soft errors are transient faults, caused mostly by cosmic radiation, that can lead to incorrect results or total system failure. The impact of soft errors on terrestrial systems can be both dire and sweeping, with targets including financial systems, health-care databases, the power grid, and communication infrastructure. Although much work has been done towards protecting computing systems from soft errors, the need for even more power-, performance-, and area-efficient protection schemes is undeniable.
This research builds upon existing hardware and microarchitectural schemes to provide even more power-efficient protection from soft errors, primarily through better application analysis. It involves developing application analysis and transforming the way applications use microarchitectural components to maximize protection from soft errors. Key components of this project are to develop analytical techniques to model the vulnerability of i) the L1 data cache, ii) the register file, and iii) pipeline latches, and to use the vulnerability estimates to drive compiler, microarchitectural, and hybrid techniques that provide power-efficient protection of these components; iv) synergistically combine component-level protection to provide power-efficient, system-level protection; v) develop techniques for dynamically trading off power and performance for reliability; and vi) develop schemes for power-efficient multi-core protection.
Keeping computation reliable, yet power-, performance-, and cost-efficient, is crucial to maintaining the pace of technological advancement, securing national interests, and ultimately improving the quality of life. The PI plans to publicly release RP2Explore, a compiler-microarchitecture toolkit, based on GCC for easy adaptability and maximal impact, for quantitative study of power, performance, area, reliability, and thermal trade-offs in programmable platforms.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The recently completed NSF CAREER project Compiler Techniques for Power-Efficient Protection Against Soft Errors has had numerous significant impacts in terms of innovation, personnel training, and community software development.
Several embedded and cyber-physical systems are mission- and safety-critical, so it is important that they work correctly even when there are errors in the computing hardware. Software techniques for error resilience are preferred because they are applicable to all past and present processors and can be applied flexibly, not only to the critical applications, but also to the critical parts of an application.
This project evaluated all pre-2015 software techniques for protecting applications from soft errors and found that they provided only up to one 9 of resilience (i.e., they can detect only 90% of faults), which is not enough for most safety-critical applications. As a result, the general perception in the field was that software techniques are not effective and that hardware solutions are required to achieve a high degree of resilience. Through more than 20 conference papers, 12 journal articles, 6 MS theses, and 2 Ph.D. theses, this project produced several software techniques and a large body of repeatable evidence showing that software techniques can provide more than three 9s of resilience (i.e., 99.9%) against soft errors. This exceeds what is required by ASIL-D, the strictest automotive safety integrity level.
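As a rough illustration of the "nines" terminology used above, the following sketch converts a fault-detection coverage into a number of nines and into the expected count of undetected faults per million. It is not part of the project's tool-chain; the function names `nines` and `undetected_per_million` are hypothetical, and the only inputs taken from the report are the 90% and 99.9% coverage figures.

```python
import math


def nines(coverage: float) -> float:
    """Return the number of 'nines' for a detection coverage in [0, 1).

    Assumption for illustration: nines = -log10(1 - coverage),
    so 0.90 maps to about 1 and 0.999 maps to about 3.
    """
    if not 0.0 <= coverage < 1.0:
        raise ValueError("coverage must be in the range [0, 1)")
    return -math.log10(1.0 - coverage)


def undetected_per_million(coverage: float) -> float:
    """Expected number of faults that escape detection per million faults."""
    return (1.0 - coverage) * 1_000_000


if __name__ == "__main__":
    # Coverage figures quoted in the report: one 9 and three 9s.
    for label, coverage in [("one 9", 0.90), ("three 9s", 0.999)]:
        print(
            f"{label}: {nines(coverage):.1f} nines, "
            f"{undetected_per_million(coverage):.0f} undetected per million faults"
        )
```

Under this reading, moving from one 9 to three 9s reduces the fraction of undetected faults by a factor of 100.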
This project has been transformative because it makes software protection approaches a viable alternative to hardware protection. The work has been highly impactful, with several universities using the gemV tool-chain that was developed as part of this project. There are already more than 350 citations to the work produced as part of this project. The students who worked on this project are now leading the resilience research teams at ARM and Cadence.
Last Modified: 09/17/2019
Modified by: Aviral Shrivastava