Award Abstract # 1055094
CAREER: Compiler Techniques for Power-Efficient Protection Against Soft Errors

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: ARIZONA STATE UNIVERSITY
Initial Amendment Date: December 21, 2010
Latest Amendment Date: June 18, 2015
Award Number: 1055094
Award Instrument: Continuing Grant
Program Manager: Almadena Chtchelkanova
achtchel@nsf.gov
 (703)292-7498
CCF
 Division of Computing and Communication Foundations
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: January 1, 2011
End Date: December 31, 2017 (Estimated)
Total Intended Award Amount: $401,654.00
Total Awarded Amount to Date: $401,654.00
Funds Obligated to Date: FY 2011 = $69,436.00
FY 2012 = $72,570.00
FY 2013 = $75,877.00
FY 2014 = $79,367.00
FY 2015 = $104,404.00
History of Investigator:
  • Aviral Shrivastava (Principal Investigator)
    aviral.shrivastava@asu.edu
Recipient Sponsored Research Office: Arizona State University
660 S MILL AVENUE STE 204
TEMPE
AZ  US  85281-3670
(480)965-5479
Sponsor Congressional District: 04
Primary Place of Performance: Arizona State University
660 S MILL AVENUE STE 204
TEMPE
AZ  US  85281-3670
Primary Place of Performance Congressional District: 04
Unique Entity Identifier (UEI): NTLHJXM55KZ6
Parent UEI:
NSF Program(s): Software & Hardware Foundation
Primary Program Source: 01001112DB NSF RESEARCH & RELATED ACTIVIT
01001213DB NSF RESEARCH & RELATED ACTIVIT
01001314DB NSF RESEARCH & RELATED ACTIVIT
01001415DB NSF RESEARCH & RELATED ACTIVIT
01001516DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7329, 9218, HPCC, 1045
Program Element Code(s): 779800
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Within a decade, feature sizes of integrated circuits are expected to shrink from the present-day 45nm to 12nm, increasing soft error rates from once per year to once per day. The International Technology Roadmap for Semiconductors (ITRS) report recognizes reliability as one of the most important challenges for the next decade and points out that soft errors are the primary threat. Soft errors are transient faults, caused mostly by cosmic radiation, that can lead to incorrect results or total system failure. The impact of soft errors on terrestrial systems can be both dire and sweeping, with targets including financial systems, health-care databases, the power grid, and communication infrastructure. Although much work has been done towards protecting computing systems from soft errors, the need for even more power-, performance-, and area-efficient protection schemes is undeniable.

This research builds upon existing hardware and microarchitectural schemes to provide even more power-efficient protection from soft errors, primarily through better application analysis. It involves developing application analyses and transforming the way applications use microarchitectural components to maximize protection from soft errors. Key components of this project are to: develop analytical techniques to model the vulnerability of i) the L1 data cache, ii) the register file, and iii) pipeline latches, and use the vulnerability estimates to drive compiler, microarchitectural, and hybrid techniques that provide power-efficient protection of these components; iv) synergistically combine component-level protection to provide power-efficient, system-level protection; v) develop techniques for dynamically trading off power and performance for reliability; and vi) develop schemes for power-efficient multi-core protection. Keeping computation reliable, yet power-, performance-, and cost-efficient, is crucial in maintaining the pace of technological advancement, securing national interests, and ultimately improving the quality of life. The PI plans to publicly release RP2Explore, a compiler-microarchitecture toolkit for the quantitative study of power, performance, area, reliability, and thermal trade-offs in programmable platforms; it is based on GCC for easy adaptability and maximal impact.
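
As a rough illustration of what a vulnerability estimate for the register file might look like, the sketch below sums the cycles during which a corrupted register value could still reach a later read. The abstract does not define the project's model, so the rule used here (an interval ending in a read counts as vulnerable, an interval ending in a write does not) is an assumption, and the function name and trace format are hypothetical.

```python
def register_vulnerability(trace):
    """Estimate per-register vulnerable cycles from an access trace.

    trace: iterable of (cycle, op, reg) tuples sorted by cycle,
           where op is "W" (write) or "R" (read).

    Assumed rule (not taken from the award text): the interval between
    two consecutive accesses to a register is vulnerable only if it ends
    in a read, because a bit flip in that interval can be consumed.
    An interval that ends in a write is harmless, since the corrupted
    value is overwritten before anyone reads it.
    """
    last_access = {}  # register -> cycle of its most recent access
    vulnerable = {}   # register -> accumulated vulnerable cycles
    for cycle, op, reg in trace:
        if op not in ("R", "W"):
            raise ValueError(f"unknown operation: {op!r}")
        vulnerable.setdefault(reg, 0)
        if op == "R" and reg in last_access:
            vulnerable[reg] += cycle - last_access[reg]
        last_access[reg] = cycle
    return vulnerable


# Hypothetical trace: r1 is read twice after its write, then overwritten;
# r2 is written early and read late, so it stays exposed for longer.
example_trace = [
    (0, "W", "r1"),
    (2, "W", "r2"),
    (5, "R", "r1"),
    (9, "R", "r1"),
    (12, "W", "r1"),
    (20, "R", "r2"),
]
print(register_vulnerability(example_trace))
```

Under this assumed rule, a compiler could lower the estimate by scheduling reads closer to the writes that feed them, which is one way such estimates might drive the compiler techniques mentioned above.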

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 30)
Aviral Shrivastava, Abhishek Risheekesan, Reiley Jeyapaul and Carole-Jean Wu "Quantitative Analysis of Control Flow Checking Mechanisms for Soft Errors" In Proceedings of the 51st Annual Design Automation Conference (DAC) , 2014
Didehban, Moslem and Lokam, Dheeraj and Shrivastava, Aviral "An Integrated Safe and Fast Recovery Scheme from Soft Errors" Proceedings of The 54th Annual Design Automation Conference (DAC) , 2017
Didehban, Moslem and Shrivastava, Aviral "NZDC: A Compiler technique for near Zero Silent data Corruption" Proceedings of The 53rd Annual Design Automation Conference (DAC) , 2016
Hwisoo So, Moslem Didehban, Yohan Ko, Aviral Shrivastava and Kyoungwoo Lee "EXPERT: Effective and Flexible Error Protection by Redundant Multithreading" In Proceedings of the 21st International Conference on Design Automation and Test in Europe (DATE) , 2018
Jeyapaul, Reiley and Risheekesan, Abhishek and Shrivastava, Aviral and Lee, Kyoungwoo "UnSync-CMP: Multicore CMP Architecture for Energy Efficient Soft Error Reliability" Transactions on Parallel and Distributed Systems , v.25 , 2014 , p.254-263
Jeyapaul, Reiley and Shrivastava, Aviral "Enabling Energy Efficient Reliability in Embedded Systems Through Smart Cache Cleaning" Transactions on Design Automation of Electronic Systems , v.18 , 2013 , p.53:1-53:2
Jongeun Lee and Aviral Shrivastava "Static Analysis of Register File Vulnerability" IEEE TVLSI: IEEE Transactions on Very Large Scale Integrated circuits , v.30 , 2011
Jongeun Lee and Aviral Shrivastava. "Software-Based Register File Vulnerability Reduction for Embedded Processors." ACM Transactions on Embedded Computing Systems , v.13 , 2013 , p.38:1 - 38
Ko, Yohan and Jeyapaul, Reiley and Kim, Youngbin and Lee, Kyoungwoo and Shrivastava, Aviral "Protecting Caches from Soft Errors: A Microarchitect's Perspective" ACM Transactions on Embedded Computing Systems (TECS) , v.16 , 2017 , p.93:1
Lee, Jongeun and Shrivastava, Aviral "A Compiler-Microarchitecture Hybrid Approach to Soft Error Reduction for Register Files" IEEE TCAD: IEEE Transactions on Computer Aided Design , v.29 , 2010 , p.1018-1027

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The recently completed NSF CAREER project "Compiler Techniques for Power-Efficient Protection Against Soft Errors" has had numerous and significant impacts in terms of innovation, personnel training, and community software development.

Several embedded and cyber-physical systems are mission- and safety-critical; it is therefore important that they work correctly even when there are errors in the computing hardware. Software techniques for error resilience are preferred because they are applicable to all existing processors, past and present, and can be applied flexibly, not only to the critical applications but also to the critical parts of an application.

This project evaluated all the pre-2015 software techniques for protecting applications from soft errors and found that they provided only up to one 9 of resilience (i.e., they can detect only 90% of faults), which is not enough for most safety-critical applications. As a result, the general perception in the field was that software techniques are not effective and that hardware solutions are required to achieve a high degree of resilience. Through more than 20 conference papers, 12 journal articles, 6 MS theses, and 2 Ph.D. theses, this project produced several software techniques and a large body of repeatable evidence showing that software techniques can provide more than three 9s of resilience (i.e., 99.9%) against soft errors. This is more than what is required by the strictest automotive safety integrity level, ASIL-D.
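
The "9s of resilience" figure quoted above can be read as the negative base-10 logarithm of the fraction of faults that go undetected. The sketch below shows that arithmetic; the function name and the fault counts are hypothetical and are not results from the project.

```python
import math


def nines_of_resilience(detected, injected):
    """Convert a fault-detection count into "number of 9s".

    detected: faults the protection scheme caught (hypothetical input)
    injected: total faults injected (hypothetical input)

    90% detection corresponds to about one 9, 99.9% to about three 9s.
    """
    if injected <= 0 or not 0 <= detected <= injected:
        raise ValueError("need 0 <= detected <= injected and injected > 0")
    undetected = injected - detected
    if undetected == 0:
        return math.inf  # every injected fault was detected
    return -math.log10(undetected / injected)


# Illustrative counts only: 900 of 1000 faults caught is roughly one 9,
# and 999 of 1000 is roughly three 9s.
print(round(nines_of_resilience(900, 1000), 3))
print(round(nines_of_resilience(999, 1000), 3))
```

Each additional 9 means the undetected fraction shrinks by a factor of ten, so moving from one 9 to three 9s corresponds to a hundredfold reduction in undetected faults.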

This project has been transformative because it makes software protection approaches a viable alternative to hardware protection. The work has been highly impactful, with several universities using the tool-chain, gemV, that was developed as part of this project. There are already more than 350 citations to the work produced as part of this project. The students who worked on this project are now leading the resilience research teams at ARM and Cadence.

 


Last Modified: 09/17/2019
Modified by: Aviral Shrivastava

Please report errors in award information by writing to: awardsearch@nsf.gov.
