Award Abstract # 1150273
CAREER: Dependable High Performance Scientific Computing at Extreme Scale via Algorithmic Fault Tolerance

NSF Org: OAC
Office of Advanced Cyberinfrastructure (OAC)
Recipient: TRUSTEES OF THE COLORADO SCHOOL OF MINES
Initial Amendment Date: March 29, 2012
Latest Amendment Date: March 29, 2012
Award Number: 1150273
Award Instrument: Standard Grant
Program Manager: Daniel Katz
OAC  Office of Advanced Cyberinfrastructure (OAC)
CSE  Directorate for Computer and Information Science and Engineering
Start Date: April 1, 2012
End Date: December 31, 2012 (Estimated)
Total Intended Award Amount: $454,497.00
Total Awarded Amount to Date: $454,497.00
Funds Obligated to Date: FY 2012 = $0.00
History of Investigator:
  • Zizhong Chen (Principal Investigator)
    chen@cs.ucr.edu
Recipient Sponsored Research Office: Colorado School of Mines
1500 ILLINOIS ST
GOLDEN
CO  US  80401-1887
(303)273-3000
Sponsor Congressional District: 07
Primary Place of Performance: Colorado School of Mines
1500 Illinois Street
Golden
CO  US  80401-1887
Primary Place of Performance Congressional District: 07
Unique Entity Identifier (UEI): JW2NGMP4NMA3
Parent UEI: JW2NGMP4NMA3
NSF Program(s): CAREER: FACULTY EARLY CAR DEV
Primary Program Source: 01001213DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1045
Program Element Code(s): 104500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Extreme-scale high-end computing platforms are expected to be available before 2020 and to have 100 million to 1 billion CPU cores. Because these platforms contain so many components, the probability that an error occurs during the execution of an extreme-scale application is expected to be much higher than it is today. The goal of this CAREER research project is to develop highly efficient techniques that detect, locate, and correct both soft and hard errors by exploiting the specific characteristics of each algorithm. The target algorithms include (1) Krylov subspace methods for solving sparse linear systems and eigenvalue problems; (2) direct methods for solving dense linear systems and eigenvalue problems; and (3) Newton's method for solving systems of nonlinear equations.
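To illustrate what detecting, locating, and correcting an error "according to the specific characteristics of an algorithm" can look like, the sketch below applies a classic checksum encoding to dense matrix multiplication. It is an illustration only and is not taken from the project: the function names (`checked_matmul`, `find_and_fix`) are hypothetical, and the scheme assumes at most one corrupted data element in the product. It appends a column-sum row to A and a row-sum column to B, so the product carries its own checksums; a mismatched row and a mismatched column together pinpoint the faulty entry.

```python
def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def checked_matmul(a, b):
    """Multiply checksum-encoded operands (hypothetical helper).

    A gains an extra row holding its column sums and B gains an extra
    column holding its row sums, so the (n+1) x (m+1) product contains
    the row and column sums of A*B in its last column and last row.
    """
    a_c = a + [[sum(col) for col in zip(*a)]]
    b_r = [row + [sum(row)] for row in b]
    return matmul(a_c, b_r)


def find_and_fix(cf, tol=1e-9):
    """Detect, locate, and correct one corrupted data element in place.

    Returns None if all checksums agree, or the (row, col) index that
    was repaired. Assumes at most one data element is wrong; any other
    mismatch pattern (including a corrupted checksum entry) raises
    ValueError.
    """
    n, m = len(cf) - 1, len(cf[0]) - 1
    bad_rows = [i for i in range(n)
                if abs(sum(cf[i][:m]) - cf[i][m]) > tol]
    bad_cols = [j for j in range(m)
                if abs(sum(cf[i][j] for i in range(n)) - cf[n][j]) > tol]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # Rebuild the faulty entry from its row checksum.
        cf[i][j] = cf[i][m] - sum(cf[i][k] for k in range(m) if k != j)
        return (i, j)
    raise ValueError("error pattern not correctable by this scheme")


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
cf = checked_matmul(a, b)
cf[1][0] = 999                 # simulate a soft error in one entry
repaired_at = find_and_fix(cf)
product = [row[:2] for row in cf[:2]]
```

With these inputs, the simulated fault at row 1, column 0 is located by the one failing row check and the one failing column check, and the entry is restored from the row checksum. The encoding adds one extra row and one extra column to the multiplication, rather than repeating the whole computation.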

This project will create significant educational outcomes by integrating four components: (1) establishing a supercomputing research laboratory to support senior design projects and Research Experiences for Undergraduates (REU), enhance graduate education and research, and demonstrate highly dependable applications on high-end computing platforms; (2) enriching undergraduate and graduate courses by integrating fault tolerance and high performance computing into them; (3) increasing minority student involvement by encouraging minority students to pursue graduate degrees in computing; and (4) offering free workshops to K-12 teachers and students.

