
NSF Org: CCF Division of Computing and Communication Foundations
Recipient:
Initial Amendment Date: January 14, 2008
Latest Amendment Date: April 12, 2012
Award Number: 0746832
Award Instrument: Continuing Grant
Program Manager: Almadena Chtchelkanova, achtchel@nsf.gov, (703) 292-7498, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering
Start Date: August 1, 2008
End Date: July 31, 2014 (Estimated)
Total Intended Award Amount: $400,000.00
Total Awarded Amount to Date: $476,000.00
Funds Obligated to Date: FY 2009 = $85,725.00; FY 2010 = $88,867.00; FY 2011 = $92,197.00; FY 2012 = $84,969.00
History of Investigator:
Recipient Sponsored Research Office: 300 TURNER ST NW, BLACKSBURG, VA, US 24060-3359, (540) 231-5281
Sponsor Congressional District:
Primary Place of Performance: 300 TURNER ST NW, BLACKSBURG, VA, US 24060-3359
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): ADVANCED COMP RESEARCH PROGRAM, COMPUTING PROCESSES & ARTIFACT, Software & Hardware Foundation, HIGH-PERFORMANCE COMPUTING
Primary Program Source: 01000910DB NSF RESEARCH & RELATED ACTIVIT; 01001011DB NSF RESEARCH & RELATED ACTIVIT; 01001112DB NSF RESEARCH & RELATED ACTIVIT; 01001213DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s):
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
Modern scientific applications, such as analyzing information from large-scale distributed sensors, climate monitoring, and forecasting environmental impacts, require powerful computing resources and entail managing an ever-growing amount of data. While high-end computer architectures comprising tens of thousands or more processors are becoming the norm in modern High Performance Computing (HPC) systems supporting such applications, this growth in computational power has not been matched by a corresponding improvement in storage and I/O systems. Consequently, there is a widening gap between storage system performance and the computational power of clusters, which poses critical challenges, especially in supporting emerging petascale scientific applications. This research develops a framework for bridging this performance gap and supporting efficient and reliable data management for HPC. Through the innovation, design, development, and deployment of the framework, the investigators improve the I/O performance of modern HPC setups.
The target HPC environments present unique research challenges, namely maintaining I/O performance as storage capacity grows, administering a large number of resources at low cost, handling high-volume long-distance data transfers, and adapting to the varying I/O demands of applications. This research addresses these storage-management challenges with a Scalable Hierarchical Framework for HPC data storage. The framework provides high-performance, reliable storage within HPC cluster sites via a hierarchical organization of storage resources; decentralized interactions between sites to support high-speed, high-volume data exchange and strategic data placement; and system-wide I/O optimizations. The overall goal is a data storage framework attuned to the needs of modern HPC applications, one that mitigates the underlying performance gap between compute resources and the I/O system. This research adopts a holistic approach in which all system components interact to yield an efficient data management system for HPC.
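The hierarchical organization described above can be pictured with the minimal Python sketch below. The tier names, capacities, and bandwidths are hypothetical placeholders, not the project's actual configuration or interfaces; the sketch only illustrates routing writes to the fastest tier with free capacity and falling back down the hierarchy.

```python
# Minimal sketch of hierarchical write routing across storage tiers.
# Tier names, capacities, and bandwidths are illustrative placeholders.
from dataclasses import dataclass, field

@dataclass
class Tier:
    name: str
    capacity_gb: float
    bandwidth_gbps: float
    used_gb: float = 0.0

    def can_hold(self, size_gb: float) -> bool:
        return self.used_gb + size_gb <= self.capacity_gb

@dataclass
class StorageHierarchy:
    # Ordered fastest-to-slowest, e.g. node-local SSD -> site PFS -> remote offload.
    tiers: list = field(default_factory=list)

    def place(self, size_gb: float) -> str:
        """Place a data object on the fastest tier that has room."""
        for tier in self.tiers:
            if tier.can_hold(size_gb):
                tier.used_gb += size_gb
                return tier.name
        raise RuntimeError("no tier can absorb the request")

hierarchy = StorageHierarchy([
    Tier("node-local SSD", capacity_gb=512, bandwidth_gbps=4.0),
    Tier("parallel file system", capacity_gb=100_000, bandwidth_gbps=1.5),
    Tier("remote offload site", capacity_gb=1_000_000, bandwidth_gbps=0.3),
])
print(hierarchy.place(256))   # -> "node-local SSD"
print(hierarchy.place(1024))  # -> "parallel file system"
```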
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available without a charge during the embargo (administrative interval). Some links on this page may take you to non-federal websites, whose policies may differ from those of this site.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
High performance computing (HPC) systems face a deluge of data from state-of-the-art and emerging petascale scientific computing applications. The goal of this project is to address the storage and I/O challenges arising from such data-intensive operations. We have developed a framework to bridge the performance gap between storage and compute components and to support efficient and reliable data management for HPC. We adopted a two-pronged approach: providing high-performance, reliable storage within HPC cluster sites via a hierarchical organization of distributed storage resources, and enabling decentralized interactions between sites to support high-speed, high-volume data exchange.
A key contribution of the project is the design and development of tools to optimize large-volume data transfers in HPC workflows. First, we developed a contributory-storage-based solution that enables HPC centers to offload data to user-provided distributed storage sites. We also developed cloud-enabled techniques for seamless data transfer between HPC centers and users, and for offloading data-intensive workloads from the HPC centers to the cloud. Our offloading approaches exploit the orthogonal bandwidth available between the users and the HPC center and relieve the center from handling I/O-intensive tasks, allowing it to focus on the compute-intensive components for which it is better provisioned. Evaluation of our approach using both real deployments and simulations demonstrates the feasibility of decentralized offloading; we observed improvements in data transfer times of as much as 81.1% for typical HPC workloads.
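The intuition behind decentralized offloading can be sketched with a back-of-the-envelope model like the one below. The bandwidth figures and function names are made-up placeholders, not the project's evaluation setup: the point is simply that pushing data over many orthogonal center-to-intermediary links in parallel beats serving every consumer over one shared egress link.

```python
# Sketch: compare direct delivery from the HPC center with decentralized
# offloading to user-side/contributory storage nodes. All bandwidth figures
# are hypothetical placeholders, not measurements from the project.

def direct_delivery_time(data_gb: float, center_egress_gbps: float) -> float:
    """All data leaves over the center's shared egress link."""
    return data_gb * 8 / center_egress_gbps

def offloaded_delivery_time(data_gb: float,
                            per_link_gbps: float,
                            num_intermediaries: int) -> float:
    """Data is striped across orthogonal center-to-intermediary links."""
    aggregate_gbps = per_link_gbps * num_intermediaries
    return data_gb * 8 / aggregate_gbps

data_gb = 500  # one job's output, chosen for illustration
direct = direct_delivery_time(data_gb, center_egress_gbps=10)
offload = offloaded_delivery_time(data_gb, per_link_gbps=1, num_intermediaries=40)
print(f"direct: {direct:.0f} s, offloaded: {offload:.0f} s")
```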
Second, we explored the use of solid-state storage devices (SSDs) in designing a novel multi-tiered data staging area that can then be seamlessly integrated with our offloading system, with the traditional HPC storage stack (e.g., Lustre) as the secondary storage. The novelty of our approach is that we employ SSDs only in the limited number of participants that are expected to observe the peak load, thus ensuring economic feasibility. Our evaluation showed that the staging area absorbs application checkpoint data and seamlessly drains it from the various storage tiers to the parallel file system, thereby improving overall I/O performance. We also extended the work to use adaptive data placement, both across the storage layers of an HPC site and within individual nodes of a site. The evaluation yielded a better understanding of how the storage layers are used, and insights into how to incorporate SSDs into the storage hierarchy.
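A toy model of the staging idea is sketched below, under the assumption of a single SSD tier in front of the parallel file system; the class and method names are hypothetical, and the real staging area lives inside the HPC I/O stack rather than in Python. Checkpoints land on the SSD tier, and a background drain step moves them to the parallel file system so the SSDs stay available for the next I/O burst.

```python
# Toy model of a two-tier staging area: checkpoints are absorbed by a small
# SSD tier and drained to the parallel file system (PFS) in the background.
# Capacities and names are illustrative placeholders.
from collections import deque

class StagingArea:
    def __init__(self, ssd_capacity_gb: float):
        self.ssd_capacity_gb = ssd_capacity_gb
        self.ssd_used_gb = 0.0
        self.pfs_used_gb = 0.0
        self.drain_queue = deque()   # checkpoints waiting to move to the PFS

    def absorb_checkpoint(self, name: str, size_gb: float) -> None:
        """Called at checkpoint time; the application only waits for the SSD write."""
        if self.ssd_used_gb + size_gb > self.ssd_capacity_gb:
            self.drain(size_gb)      # make room by draining the oldest data first
        self.ssd_used_gb += size_gb
        self.drain_queue.append((name, size_gb))

    def drain(self, needed_gb: float = float("inf")) -> None:
        """Background draining of staged checkpoints to the parallel file system."""
        freed = 0.0
        while self.drain_queue and freed < needed_gb:
            name, size_gb = self.drain_queue.popleft()
            self.pfs_used_gb += size_gb
            self.ssd_used_gb -= size_gb
            freed += size_gb

stage = StagingArea(ssd_capacity_gb=200)
for step in range(5):
    stage.absorb_checkpoint(f"ckpt-{step}", size_gb=80)
print(stage.ssd_used_gb, stage.pfs_used_gb)  # SSD tier stays within capacity
```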
Finally, we explored the use of emerging technologies such as accelerators and low-power micro-servers in supporting HPC I/O stack operations. Specifically, we explored the use of such components in supporting I/O-intensive workloads both for HPC applications and for the extant cloud programming model, Hadoop. To this end, we employed low-cost GPUs to achieve a flexible, fault-tolerant, and high-performance RAID-6 solution for a parallel file system. We combine the capabilities provided by the file system, such as striping individual files over multiple disks, with the computational power of a GPU to provide flexible and fast parity computation for encoding and for rebuilding degraded RAID arrays. The results demonstrate that leveraging GPUs for I/O support functions, i.e., RAID parity computation, is feasible and can provide an efficient alternative to specialized-hardware-based solutions. The effect is to reduce the cost of HPC I/O systems and improve overall system efficiency.
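For reference, the arithmetic that such a GPU offload accelerates is the standard RAID-6 parity computation: the P block is a plain XOR of the data blocks, and the Q block is a Reed-Solomon syndrome over GF(2^8). The sketch below is plain illustrative Python, not the project's GPU kernel; on a GPU the same per-byte operations run as data-parallel threads.

```python
# Sketch of RAID-6 parity computation: P is the XOR of the data blocks,
# Q is a Reed-Solomon syndrome over GF(2^8) with generator g = 2.
# Illustrative pure Python; a GPU version runs this arithmetic in parallel.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) with the RAID-6 polynomial 0x11d."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D
    return result

def raid6_parity(blocks: list[bytes]) -> tuple[bytes, bytes]:
    """Compute the P and Q parity blocks for one stripe of equal-sized blocks."""
    size = len(blocks[0])
    p = bytearray(size)
    q = bytearray(size)
    for i, block in enumerate(blocks):
        coeff = 1
        for _ in range(i):               # coeff = g^i with g = 2
            coeff = gf_mul(coeff, 2)
        for j in range(size):
            p[j] ^= block[j]
            q[j] ^= gf_mul(coeff, block[j])
    return bytes(p), bytes(q)

stripe = [bytes([d] * 8) for d in (0x11, 0x22, 0x33, 0x44)]
p, q = raid6_parity(stripe)
print(p.hex(), q.hex())
```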
The work on designing a robus...