Award Abstract # 0943705
STCI: Middleware for Monitoring and Troubleshooting of Large-Scale Applications on National Cyberinfrastructure

NSF Org: OAC
Office of Advanced Cyberinfrastructure (OAC)
Recipient: UNIVERSITY OF SOUTHERN CALIFORNIA
Initial Amendment Date: August 19, 2009
Latest Amendment Date: August 19, 2009
Award Number: 0943705
Award Instrument: Standard Grant
Program Manager: Kevin Thompson
kthompso@nsf.gov
(703)292-4220
OAC: Office of Advanced Cyberinfrastructure (OAC)
CSE: Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2009
End Date: August 31, 2013 (Estimated)
Total Intended Award Amount: $1,875,831.00
Total Awarded Amount to Date: $1,875,831.00
Funds Obligated to Date: FY 2009 = $1,875,831.00
ARRA Amount: $1,875,831.00
History of Investigator:
  • Ewa Deelman (Principal Investigator)
    deelman@isi.edu
  • Christopher Brooks (Co-Principal Investigator)
  • Douglas Swany (Co-Principal Investigator)
  • Daniel Gunter (Co-Principal Investigator)
Recipient Sponsored Research Office: University of Southern California
3720 S FLOWER ST FL 3
LOS ANGELES
CA  US  90033
(213)740-7762
Sponsor Congressional District: 34
Primary Place of Performance: University of Southern California
3720 S FLOWER ST FL 3
LOS ANGELES
CA  US  90033
Primary Place of Performance Congressional District: 34
Unique Entity Identifier (UEI): G88KLJR3KYT5
Parent UEI:
NSF Program(s): CESER-Cyberinfrastructure for
Primary Program Source: 01R00910DB RRA RECOVERY ACT
Program Reference Code(s): 6890, 7684, 9215, 9216, HPCC
Program Element Code(s): 768400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

This proposal will be awarded using funds made available by the American Recovery and Reinvestment Act of 2009 (Public Law 111-5), and meets the requirements established in Section 2 of the White House Memorandum entitled "Ensuring Responsible Spending of Recovery Act Funds," dated March 20, 2009.

The project "STCI: Middleware for Monitoring and Troubleshooting of Large-Scale Applications on National Cyberinfrastructure" aims to provide robust and scalable workflow monitoring services that track the progress of workflow-based applications as they execute on distributed cyberinfrastructure. New anomaly detection and troubleshooting services will also be developed to alert users to problems with the application and with cyberinfrastructure services, and to let them quickly navigate and mine the application's execution records. The foundation of this work is a robust and scalable infrastructure for gathering and distributing performance information. Information flowing through this infrastructure will be stored in high-performance archives and distributed to interested entities through subscription interfaces. Three main services will be developed: 1) an online monitoring service, 2) an anomaly detection service based on dynamic mining of application and cyberinfrastructure logs, and 3) a troubleshooting service that helps trace the source of a failure.

Intellectual Merit
This work will potentially increase scientists' productivity by allowing them to quickly identify problems in an application, thus reducing the time it takes to generate scientifically meaningful results. This work will also make the performance of complex scientific workflows more transparent, which will enable the generation of accurate estimates of overall time to completion, more efficient use of resources, and easier resolution of end-to-end performance problems in collaboration with network and resource providers.

Broader Impact
Scientific communities in astronomy, biology, earthquake science, physics, and other fields will immediately benefit from the proposed system. Because the approach relies on simple, well-defined logging formats, this work is applicable to a range of workflow management systems as well as to their sub-components, such as job managers and data transfer tools.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Karan Vahi, Ian Harvey, Taghrid Samak, Daniel Gunter, Kieran Evans, David Rogers, Ian Taylor, Monte Goode, Fabio Silva, Eddie Al-Shakarchi, Gaurang Mehta, Ewa Deelman, Andrew Jones, "A Case Study into Using Common Real-Time Workflow Monitoring Infrastructure for Scientific Workflows," Journal of Grid Computing, v.11, 2013, p.381, doi:10.1007/s10723-013-9265-4
