Award Abstract # 0834798
CSR-PSCE,SM: A Holistic Design Approach to Reliability Using 3D Stacked

NSF Org: CNS
Division Of Computer and Network Systems
Recipient: UNIVERSITY OF SOUTHERN CALIFORNIA
Initial Amendment Date: August 13, 2008
Latest Amendment Date: July 7, 2009
Award Number: 0834798
Award Instrument: Standard Grant
Program Manager: Marilyn McClure
mmcclure@nsf.gov
 (703)292-5197
CNS
 Division Of Computer and Network Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2008
End Date: August 31, 2013 (Estimated)
Total Intended Award Amount: $402,904.00
Total Awarded Amount to Date: $418,904.00
Funds Obligated to Date: FY 2008 = $402,904.00
FY 2009 = $16,000.00
History of Investigator:
  • Murali Annavaram (Principal Investigator)
    annavara@usc.edu
Recipient Sponsored Research Office: University of Southern California
3720 S FLOWER ST FL 3
LOS ANGELES
CA  US  90033
(213)740-7762
Sponsor Congressional District: 34
Primary Place of Performance: University of Southern California
3720 S FLOWER ST FL 3
LOS ANGELES
CA  US  90033
Primary Place of Performance Congressional District: 34
Unique Entity Identifier (UEI): G88KLJR3KYT5
Parent UEI:
NSF Program(s): CSR-Computer Systems Research
Primary Program Source: 01000809DB NSF RESEARCH & RELATED ACTIVIT
01000910DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7902, 9178, 9216, 9218, 9251, HPCC
Program Element Code(s): 735400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

The future of the information technology industry depends on designing computer systems that tolerate errors caused by variations in device characteristics. Traditionally, system reliability is achieved by replicating critical system components. Because variability-induced errors emerge slowly over time, replication for the sole purpose of providing reliability is prohibitively expensive for low-cost computing platforms. This research explores using 3D stacking to implement redundant components and variability monitoring circuitry on a 3D stacked die. With 3D stacking, the redundant computation blocks can be built in a variation-resilient process technology that may be slower than the process technology used for the primary processor. This research takes a holistic approach to designing the 3D stacked monitoring layer, spanning from innovative microarchitecture solutions to exploiting applications' inherent error tolerance. On the microarchitecture front, this research explores the potential for seamlessly reconfiguring the monitoring layer to act in one of three modes: as performance assists, when variability-induced errors are rare; as guard processors, when variability-induced errors begin to appear; or as backup processors, when device aging may result in irreparable errors on the primary processing substrate. On the architecture front, this research explores a new exception class called Reliability Aware Exceptions, which allows microarchitecture blocks to raise an exception in response to a variability-induced error. These software-visible exceptions can then be exploited by application classes that are inherently error tolerant and can customize their exception handling mechanisms.
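
The two ideas in the abstract, a monitoring layer that switches among three roles and a software-visible exception that an error-tolerant application may handle in its own way, can be illustrated with a small Python sketch. This is not the project's design. The names `Mode`, `select_mode`, `ReliabilityAwareException`, and `average_brightness`, the error-rate metric, the `guard_threshold` value, and the block name `alu0` are all hypothetical and chosen only for illustration.

```python
from enum import Enum


class Mode(Enum):
    """The three roles the abstract lists for the monitoring layer."""
    PERFORMANCE_ASSIST = "performance assist"
    GUARD = "guard processor"
    BACKUP = "backup processor"


def select_mode(errors_per_million_ops, primary_usable, guard_threshold=1.0):
    """Pick a role for the monitoring layer.

    The metric and the threshold are assumptions made for this sketch;
    the abstract does not say how the switch is decided.
    """
    if not primary_usable:
        return Mode.BACKUP              # irreparable errors on the primary die
    if errors_per_million_ops >= guard_threshold:
        return Mode.GUARD               # errors have begun to appear
    return Mode.PERFORMANCE_ASSIST      # errors are rare


class ReliabilityAwareException(Exception):
    """Software-visible signal that a block saw a variability-induced error."""

    def __init__(self, block):
        super().__init__(f"variability-induced error in {block}")
        self.block = block


def average_brightness(pixels, faulty_indices=frozenset()):
    """An error-tolerant computation with a custom handler.

    `faulty_indices` stands in for the hardware: samples at those
    positions raise the exception, and the handler drops them
    instead of recomputing.
    """
    total, count = 0, 0
    for i, p in enumerate(pixels):
        try:
            if i in faulty_indices:
                raise ReliabilityAwareException("alu0")
            total += p
            count += 1
        except ReliabilityAwareException:
            continue                    # tolerate the error: skip this sample
    return total / count if count else 0.0
```

In this sketch the application accepts a slightly less precise average rather than paying for re-execution, which is one way an inherently error-tolerant workload might customize its handling.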

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Bardia Zandian, Waleed Dweik, Suk Hun Kang, Thomas Punihaole, and Murali Annavaram. "Wearmon: Reliability monitoring using adaptive critical path testing." 2010 IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2010, p. 151. doi:10.1109/DSN.2010.5544916
B. Zandian and M. Annavaram. "Cross-layer resilience using wearout aware design flow." 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN), 2011, p. 151.
Jinho Suh, Murali Annavaram, and Michel Dubois. "Soft Error Benchmarking for L2 Cache with PARMA." Sixth Annual Workshop on Modeling, Benchmarking and Simulation, 2010, p. 1.