
NSF Org: |
CNS Division Of Computer and Network Systems |
Recipient: |
|
Initial Amendment Date: | February 9, 2012 |
Latest Amendment Date: | June 10, 2014 |
Award Number: | 1212535 |
Award Instrument: | Continuing Grant |
Program Manager: |
Marilyn McClure
mmcclure@nsf.gov (703)292-5197 CNS Division Of Computer and Network Systems CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2011 |
End Date: | November 30, 2014 (Estimated) |
Total Intended Award Amount: | $145,316.00 |
Total Awarded Amount to Date: | $169,816.00 |
Funds Obligated to Date: |
FY 2010 = $16,000.00; FY 2011 = $16,500.00; FY 2013 = $8,000.00; FY 2014 = $16,500.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
601 UNIVERSITY DR SAN MARCOS TX US 78666-4684 (512)245-2314 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
TX US 78666-4684 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): |
Special Projects - CNS, CSR-Computer Systems Research, EPSCoR Co-Funding |
Primary Program Source: |
01001011DB NSF RESEARCH & RELATED ACTIVIT; 01001112DB NSF RESEARCH & RELATED ACTIVIT; 01001314DB NSF RESEARCH & RELATED ACTIVIT; 01001415DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
CSR proposal #0917137
CSR: Small: Collaborative Research: FastStor: Data-Mining-Based Multilayer Prefetching for Hybrid Storage Systems
Many existing parallel storage systems are built from hybrid storage components, including solid-state drives (SSDs), hard disk drives (HDDs), and tapes. Compared with the faster components (e.g., SSDs and HDDs), tapes inevitably become an I/O performance bottleneck. Prefetching and caching are commonly employed to boost I/O performance by increasing the hit rate of the high-end storage components. However, prefetching in hybrid storage systems poses a dilemma: aggressive prefetching schemes can efficiently reduce I/O latency, whereas overaggressive schemes may waste I/O bandwidth by transferring useless data from HDDs to SSDs or from tapes to HDDs. In this research project, called FastStor, we investigate new data-mining-based multilayer prefetching techniques to improve the performance of hybrid storage systems. The goals of this research are to (1) design data-mining algorithms for multilayer prefetching; (2) develop a predictive parallel prefetching mechanism for SSD-based storage systems; (3) implement parallel data transfer among SSDs, HDDs, and tapes; (4) develop metadata management schemes; and (5) implement a simulation framework named FastStor-SIM. The resulting toolkit can be used to improve the I/O performance of data centers with hybrid storage systems. The research findings of this project are published in conferences and journals. Through the collaboration of Auburn University, South Dakota School of Mines and Technology, and the University of Southern Mississippi, the PIs promote learning and training by exposing graduate and undergraduate students to the technological underpinnings of storage systems.
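The prefetching dilemma described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the FastStor algorithm or FastStor-SIM: plain sequential read-ahead stands in for the data-mining-based predictors, the fast tier is modeled as a single LRU cache, and the names `simulate`, `cache_size`, and `depth` are hypothetical. It reports the hit rate of the fast tier and the number of prefetched blocks that were never read, a rough proxy for wasted transfer bandwidth.

```python
from collections import OrderedDict
import random


def simulate(trace, cache_size, depth):
    """Replay `trace` against an LRU fast tier with sequential read-ahead.

    On every miss for block b, blocks b+1 .. b+depth are also copied into
    the fast tier. Returns (hit_rate, wasted_prefetches); a prefetch counts
    as wasted if the block is evicted, or the trace ends, before it is read.
    Assumes cache_size > depth so a demand block survives its own read-ahead.
    """
    cache = OrderedDict()  # block id -> True if prefetched and not yet read
    hits = 0
    wasted = 0

    def insert(block, prefetched):
        nonlocal wasted
        if block in cache:
            return
        if len(cache) >= cache_size:
            _, unused = cache.popitem(last=False)  # evict least recently used
            wasted += unused  # True counts as 1
        cache[block] = prefetched

    for block in trace:
        if block in cache:
            hits += 1
            cache[block] = False      # the block has now been read
            cache.move_to_end(block)  # refresh its LRU position
        else:
            insert(block, False)
            for offset in range(1, depth + 1):
                insert(block + offset, True)

    wasted += sum(cache.values())  # prefetched blocks still unread at the end
    return hits / len(trace), wasted


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    sequential = list(range(1000))
    scattered = [rng.randrange(10**6) for _ in range(2000)]
    for name, trace in (("sequential", sequential), ("scattered", scattered)):
        for depth in (0, 1, 4, 8):
            rate, wasted = simulate(trace, cache_size=64, depth=depth)
            print(f"{name:10s} depth={depth} hit_rate={rate:.2f} wasted={wasted}")
```

On a predictable (sequential) trace, deeper read-ahead raises the hit rate without waste; on an unpredictable trace, the same depth mostly moves blocks that are never read. A predictor that learns access patterns aims to get the first behavior without paying for the second.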
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
To balance performance and cost, many large-scale storage systems consist of hybrid storage components, including solid-state drives (SSDs), hard disk drives (HDDs), and tapes. Compared with the faster components (e.g., SSDs and HDDs), tapes are likely to become an I/O performance bottleneck. Prefetching and caching are efficient techniques for boosting I/O performance by increasing the hit rate of the high-end storage components. However, prefetching in hybrid storage systems is technically challenging: it can reduce I/O latency, but it can also waste I/O bandwidth and energy by pre-processing and transferring data that is never used. The primary goal of this project is to investigate innovative data-mining-based multilayer prefetching techniques that improve performance without significantly increasing the energy consumption and cost of hybrid storage systems. Three universities are involved in this collaborative project: Auburn University (the lead institution), Texas State University, and Texas A&M University at Kingsville. Texas State University also works closely with the Earth Resources Observation and Science Center (EROS) of the U.S. Geological Survey (USGS). The research and education outcomes derived from the Texas State University grant are summarized below:
1) Research Activities: A number of research projects were conducted: geo-visualization of massive satellite download requests provided by the USGS EROS; evaluation of conventional caching algorithms and existing data-mining-based prefetching algorithms for improving the performance of the EROS hybrid storage systems; analysis of EROS user download patterns and behavior; design of popularity-oriented and user-specific prefetching algorithms and evaluation of their impact on both performance and energy efficiency; development of the first SQL engine that can run SQL queries on the Intel Xeon Phi to accelerate data processing; and characterization of the energy consumption of programs running on GPUs and the Intel Xeon Phi. These projects have produced a number of novel algorithms and new studies that contribute to the disciplines of hybrid storage systems, data visualization, data mining, high performance computing (HPC), green computing, and big data analytics.
2) Publications: At the time this report was submitted, eleven peer-reviewed papers had been published in well-regarded journals and IEEE/ACM-sponsored conferences and workshops, including the Journal of Cluster Computing, Journal of Applied Remote Sensing, Journal of Environmental Modeling & Software, ACM/IEEE Supercomputing Conference (SC), International Conference on Parallel Processing (ICPP), International Conference on Big Data Science and Computing, IEEE International Conference on Networking, Architecture, and Storage, and IEEE International Performance Computing and Communications Conference (IPCCC). In addition, a chapter has been accepted for the book Big Data Algorithms, Analytics, and Applications and is currently in press.
3) Training: This NSF project provided ample opportunities for both graduate and undergraduate students at Texas State University to conduct research in the fields of high performance computing, data mining, big data analytics, and hybrid storage systems. Four graduate students and four undergraduate students participated in the aforementioned research projects led by the PI. Many of them authored journal, conference, or workshop papers: six papers were co-authored by graduate students and four by undergraduate students.
4) Education: The research approaches and results have been introduced into both undergraduate and graduate level courses to benefit a large group of students at three institutions. Students from these classes a...