Skip to feedback

Award Abstract # 1241849
EAGER: Formal Reliability Enhancement Methods for Million Core Computational Frameworks

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: UNIVERSITY OF UTAH
Initial Amendment Date: May 16, 2012
Latest Amendment Date: May 16, 2012
Award Number: 1241849
Award Instrument: Standard Grant
Program Manager: Nina Amla
namla@nsf.gov
 (703)292-7991
CCF
 Division of Computing and Communication Foundations
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: June 1, 2012
End Date: May 31, 2014 (Estimated)
Total Intended Award Amount: $200,000.00
Total Awarded Amount to Date: $200,000.00
Funds Obligated to Date: FY 2012 = $200,000.00
History of Investigator:
  • Ganesh Gopalakrishnan (Principal Investigator)
    ganesh@cs.utah.edu
  • Martin Berzins (Co-Principal Investigator)
Recipient Sponsored Research Office: University of Utah
201 PRESIDENTS CIR
SALT LAKE CITY
UT  US  84112-9049
(801)581-6903
Sponsor Congressional District: 01
Primary Place of Performance: University of Utah
50 South Central Campus Dr
Salt Lake City
UT  US  84112-9205
Primary Place of Performance
Congressional District:
01
Unique Entity Identifier (UEI): LL8GLEVH6MG3
Parent UEI:
NSF Program(s): Software & Hardware Foundation
Primary Program Source: 01001213DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7916, 7942, 7944, 9150
Program Element Code(s): 779800
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

High Performance Computing is strategically important to national competitiveness. Advances in computational capabilities involve the use of unprecedented levels of parallelism: programming methods that involve billions of concurrent activities. Computational Frameworks allow these parallel programs to be organized in a modular fashion, achieving higher reliability, scalability, and better resource management capabilities.

The project develops formally based reliability enhancement mechanisms when Computational Frameworks are developed or optimized in response to the arrival of newer hardware/software technologies. These mechanisms include a formal specification of the expected behavior of these frameworks, and ways to verify them before deployment and during live operation. Correctness issues addressed in this proactive manner will considerably reduce the time it takes to proceed from idea to science. This project will enable a scientific understanding of which formal methods are likely to work at the scale of millions of cores, and which formal methods are best recommended to capture intended behavior versus those that are best suitable for run-time use. The project will also result in broad impact in terms of: the incorporation of our verification tools and techniques within popular tool-integration frameworks; achieving large-scale case studies on the use of formal methods within a computational framework that is under development; and training of undergraduate and graduate students on advanced correctness verification methods. It will also help build talent pool vital to continued progress in high performance computing with applications to science and engineering, energy/sustainability, and homeland security.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

High performance computing is of strategic importance to every nation,by providing the basic ability to conduct scientific research on anever-increasing scale. In order to accelerate the rate of discoveriesin science, medicine, and engineering, researchers are making use ofthe largest computers today and are banking on continued advances inour HPC capabilities.  The project investigated the use of rigoroustechniques to enhance an engineer's ability to debug these HPCsystems, and localize errors when they occur. This enables thescientist performing simulation based studies to quickly recover fromsoftware errors -- especially in a large-scale concurrent systeminvolving thousands of processors -- and continue along withscientific research.
INTELLECTUAL MERIT:
The main result of this research was the development of formal methodsthat exploit the structured software architecture of computationalframeworks as well as high level program control flow to localizebugs, and hence enhance debugging efforts. The project also lead tothe discovery of scalable solutions that are deployable in real HPCframeworks to enhance debugging. The project directly contributes toerror localization in future monitoring-based error detectionframeworks.
BROADER IMPACT:
The project contributed to tools for localizing errors in HPCframeworks, as well as released these tools open source. This willdirectly lead to enhanced amounts of science per invested HPC resourcedollar.  The project had a significant training component, helping totrain undergraduate students in concurrent programming, formalmethods, and high performance computing.  It also trained apost-doctoral researcher in research and development in highperformance computing software architectures and helping to advancethese infrastructures.
By enhancing this nation's HPC capabilities and helping maintain thesoftware that underlies it, the project directly impacts science andengineering practices, and directly addresses this nation's energy andhomeland security needs.

 

 


Last Modified: 06/06/2014
Modified by: Ganesh L Gopalakrishnan

Please report errors in award information by writing to: awardsearch@nsf.gov.

Print this page

Back to Top of page