
NSF Org: |
OAC Office of Advanced Cyberinfrastructure (OAC) |
Recipient: |
|
Initial Amendment Date: | June 23, 2010 |
Latest Amendment Date: | September 4, 2014 |
Award Number: | 1025159 |
Award Instrument: | Cooperative Agreement |
Program Manager: |
Edward Walker
edwalker@nsf.gov (703)292-4863 OAC Office of Advanced Cyberinfrastructure (OAC) CSE Directorate for Computer and Information Science and Engineering |
Start Date: | July 1, 2010 |
End Date: | December 31, 2015 (Estimated) |
Total Intended Award Amount: | $7,763,246.00 |
Total Awarded Amount to Date: | $8,054,145.00 |
Funds Obligated to Date: |
FY 2011 = $6,228,419.00 FY 2012 = $14,000.00 FY 2013 = $14,000.00 FY 2014 = $248,599.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
520 LEE ENTRANCE STE 211 AMHERST NY US 14228-2577 (716)645-2634 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
520 LEE ENTRANCE STE 211 AMHERST NY US 14228-2577 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): |
CYBERINFRASTRUCTURE, XD-Extreme Digital |
Primary Program Source: |
01001112DB NSF RESEARCH & RELATED ACTIVIT 01001213DB NSF RESEARCH & RELATED ACTIVIT 01001314DB NSF RESEARCH & RELATED ACTIVIT 01001415DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
1025159
Furlani
This five-year award provides a technology audit service for the eXtremeDigital (XD) program, the follow-on to the successful NSF TeraGrid program. The technology audit service is designed to: continually test the user environment and capabilities provided by XD to ensure delivery of the highest possible quality of service; provide internal quality assurance and quality control for XD by measuring quantitative and qualitative metrics of quality of service; and periodically report to the coordinating body for XD. The award provides the objective metrics of XD quality of service, which will be reviewed and revised by XD in consultation with NSF as the technology evolves. The technology audit service will have user-level access to all XD computational, storage and visualization services and will use these for testing advanced software in partnership with the relevant service providers.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
High performance computing (HPC) systems, more commonly known as supercomputers, play a pivotal role in society today, including in the U.S. economy. They are essential tools in a diverse range of areas, including finance, oil and gas exploration, pharmaceutical drug design, medical and basic research, computer animation, aeronautics, and automotive design, to name a few. Today’s supercomputers are a complex combination of computer hardware (servers, network switches, storage) and software. System support personnel therefore need tools to ensure that this infrastructure runs at optimal efficiency and to proactively identify underperforming hardware and software. In addition, most HPC systems are overloaded, with many jobs queued waiting to run. Support personnel accordingly want to monitor and analyze all end-user jobs to determine how efficiently they run and what resources they consume (memory, processing, storage, networking, etc.), both to maximize the number of computations run and to plan for future needs.
Given the important role that high performance computers play in research and the economy, it is somewhat surprising that, prior to this project, no open source tools were available that provided for the comprehensive management of HPC systems. With this deficiency in mind, XDMoD was developed to provide a comprehensive management framework for the NSF’s high performance computing systems that are managed through the XSEDE program. In addition, the closely related Open XDMoD, an open source tool, provides similar functionality for HPC systems in general including government, industrial and academic HPC centers as well as Blue Waters – the largest supercomputer in the NSF portfolio. XDMoD for XSEDE and Open XDMoD were designed to meet the following objectives:
(1) provide the end-user community with a tool to optimize their use of HPC resources,
(2) provide operational staff with the ability to monitor, diagnose, and tune system performance as well as measure the performance of all applications running on the HPC systems they manage,
(3) provide software developers with the ability to easily obtain detailed analysis of application performance to aid in optimizing code performance,
(4) provide stakeholders with a diagnostic tool to facilitate HPC planning and analysis, and
(5) provide metrics to help measure return on investment.
XDMoD provides a rich set of features accessible through an intuitive graphical interface that is tailored to the role of the user, from scientists and engineers running computations to HPC facility and funding agency managers. Metrics provided by XDMoD include comprehensive statistics on the number and type of computational jobs run, resources (computation, memory, disk, network, etc.) consumed, job wait and wall time, scientific impact, and quality of service. The web interface allows one to chart various metrics and interactively drill down to access additional related information.
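To make the job-level metrics named above more concrete, the following is a minimal sketch of how job count, wait time, wall time, and consumed core-hours might be derived from basic job accounting records. It is an illustration only and does not reflect XDMoD's actual implementation or data model: the Job record, its field names, the helper functions, and the sample data are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Job:
    """Hypothetical accounting record for one job (not XDMoD's schema)."""
    submit: datetime  # when the job entered the queue
    start: datetime   # when the job began running
    end: datetime     # when the job finished
    cores: int        # number of cores allocated to the job


def wait_hours(job: Job) -> float:
    """Time spent queued before the job started, in hours."""
    return (job.start - job.submit).total_seconds() / 3600


def wall_hours(job: Job) -> float:
    """Elapsed run time of the job, in hours."""
    return (job.end - job.start).total_seconds() / 3600


def summarize(jobs: list[Job]) -> dict[str, float]:
    """Aggregate a few simple usage statistics over a list of jobs."""
    return {
        "job_count": len(jobs),
        "avg_wait_hours": mean(wait_hours(j) for j in jobs),
        "avg_wall_hours": mean(wall_hours(j) for j in jobs),
        "core_hours": sum(wall_hours(j) * j.cores for j in jobs),
    }


# Made-up sample data for illustration only.
SAMPLE_JOBS = [
    Job(datetime(2015, 1, 1, 0), datetime(2015, 1, 1, 1), datetime(2015, 1, 1, 3), cores=16),
    Job(datetime(2015, 1, 1, 0), datetime(2015, 1, 1, 3), datetime(2015, 1, 1, 7), cores=8),
]

if __name__ == "__main__":
    for name, value in summarize(SAMPLE_JOBS).items():
        print(f"{name}: {value}")
```

A production tool would draw such records from the resource manager's accounting logs and break the statistics down by user, application, or resource; this sketch only shows the arithmetic behind the summary figures.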
The XDMoD framework is also designed to help ensure that the HPC infrastructure is delivering a high quality of service to its end-users by continuously monitoring system performance and reliability through the deployment of a series of programs specifically designed to monitor overall system performance. System managers are therefore able to proactively monitor the HPC infrastructure as opposed to having to rely on users to report failures or underperforming hardware and software.
An important capability of XDMoD is centered around monitoring the performance of all user jobs running on a given HPC resour...