Award Abstract # 1212535
CSR: Small: Collaborative Research: FastStor: Data-Mining-Based Multilayer Prefetching for Hybrid Storage Systems

NSF Org: CNS
Division Of Computer and Network Systems
Recipient: TEXAS STATE UNIVERSITY
Initial Amendment Date: February 9, 2012
Latest Amendment Date: June 10, 2014
Award Number: 1212535
Award Instrument: Continuing Grant
Program Manager: Marilyn McClure
mmcclure@nsf.gov
 (703)292-5197
CNS
 Division Of Computer and Network Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2011
End Date: November 30, 2014 (Estimated)
Total Intended Award Amount: $145,316.00
Total Awarded Amount to Date: $169,816.00
Funds Obligated to Date: FY 2009 = $112,816.00
FY 2010 = $16,000.00
FY 2011 = $16,500.00
FY 2013 = $8,000.00
FY 2014 = $16,500.00
History of Investigator:
  • Ziliang Zong (Principal Investigator)
    zz11@txstate.edu
Recipient Sponsored Research Office: Texas State University - San Marcos
601 UNIVERSITY DR
SAN MARCOS
TX  US  78666-4684
(512)245-2314
Sponsor Congressional District: 15
Primary Place of Performance: Texas State University - San Marcos
TX  US  78666-4684
Primary Place of Performance Congressional District: 15
Unique Entity Identifier (UEI): HS5HWWK1AAU5
Parent UEI:
NSF Program(s): Special Projects - CNS,
CSR-Computer Systems Research,
EPSCoR Co-Funding
Primary Program Source: 01000910DB NSF RESEARCH & RELATED ACTIVIT
01001011DB NSF RESEARCH & RELATED ACTIVIT
01001112DB NSF RESEARCH & RELATED ACTIVIT
01001314DB NSF RESEARCH & RELATED ACTIVIT
01001415DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7354, 7923, 9150, 9178, 9218, 9251, HPCC
Program Element Code(s): 171400, 735400, 915000
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

CSR proposal #0917137

Many existing parallel storage systems consist of hybrid storage components, including solid-state drives (SSDs), hard disks (HDDs), and tapes. Compared with high-speed storage components (e.g., SSDs and HDDs), tapes inevitably become an I/O performance bottleneck. Prefetching and caching are commonly employed to boost I/O performance by increasing the hit rate of the high-end storage components. However, prefetching in hybrid storage systems is technically challenging because of a dilemma: aggressive prefetching schemes can reduce I/O latency, whereas overaggressive schemes may waste I/O bandwidth by transferring useless data from HDDs to SSDs or from tapes to HDDs. In this research project, called FastStor, we investigate new data-mining-based multilayer prefetching techniques to improve the performance of hybrid storage systems. The goals of this research are to (1) design data-mining algorithms for multilayer prefetching; (2) develop a predictive parallel prefetching mechanism for SSD-based storage systems; (3) implement parallel data transfer among SSDs, HDDs, and tapes; (4) develop metadata management schemes; and (5) implement a simulation framework named FastStor-SIM. The developed toolkit can be used to improve the I/O performance of data centers with hybrid storage systems. The research findings of this project are published in conferences and journals for public knowledge. Through the collaboration of Auburn University, South Dakota School of Mines and Technology, and the University of Southern Mississippi, the PIs promote learning and training by exposing graduate and undergraduate students to the technological underpinnings of storage systems.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

G. D. Standart, K.R. Stulken, X. S. Zhang, and Z. L. Zong "Vis-EROS: Geospatial Visualization of Global Satellite Images Download Requests in Google Earth" Journal of Environmental Modelling & Software (ENVSOFT) , v.26 , 2011 , p.980 10.1016/j.envsoft.2011.02.012
G. Standart, M. Penaloza and Z.L. Zong "Use of data mining techniques in the discovery of spatial and temporal earthquake relationship" the 2010 Midwest Instruction and Computing Symposium , 2010
M. Nijim, Z. L. Zong, K. Bellam, S. Yin, and X. Qin "Quality of Security Adaptation in Parallel Disk Systems" Journal of Parallel and Distributed Computing (JPDC) , v.71 , 2011 , p.288 10.1016/j.jpdc.2010.08.014
Nijim, M., Z.L. Zong, Xiao Qin, Nijim, Y. "Multi-layer Prefetching for Hybrid Storage Systems: Algorithms, Models, and Evaluations" IEEE International Conference on Parallel Processing Workshops (ICPPW) , 2010
S. Yin, M. I. Alghamdi, X. J. Ruan, M. Nijim, A. Tamilarasan, Z.L. Zong, X. Qin, and Y. M. Yang "Improving Energy Efficiency and Security for Disk Systems" The 12th International Conference on High Performance Computing and Communications (HPCC) , 2010
X. J. Ruan, A. Manzanares, S. Yin, Z. L. Zong, and X. Qin "Performance Evaluation of Energy-Efficient Parallel I/O Systems with Write Buffer Disks" the 38th International Conference on Parallel Processing (ICPP 2009) , 2009
Z. L. Zong, A. Manzanares, X. J. Ruan, and X. Qin "EAD and PEBD: Two Energy-Aware Duplication Scheduling Algorithms for Parallel Tasks on Homogeneous Clusters" IEEE Transactions on Computers , v.60 , 2011 , p.360 10.1109/TC.2010.216
Z. L. Zong, J. Job, X. S. Zhang, M. Nijim, and X. Qin "Case study of visualizing global user download patterns using Google Earth and NASA World Wind" Journal of Applied Remote Sensing , v.6 , 2012 10.1117/1.JRS.6.061703
Z. L. Zong, R. Fares, B. Romoser, and J. Wood "FastStor: Improving Performance of A Large Scale Hybrid Storage System via Caching and Prefetching" Journal of Cluster Computing , 2013 10.1007/s10586-013-0304-5
Z. L. Zong, X. Qin, X. J. Ruan, and M. Nijim "Heat-Based Dynamic Data Caching: A Load Balancing Strategy for Energy-Efficient Parallel Storage Systems with Buffer Disks" The 27th IEEE Symposium on Massive Storage Systems and Technologies (MSST 2011) , 2011

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

To balance performance against cost, many large-scale storage systems consist of hybrid storage components, including solid-state drives (SSDs), hard disks (HDDs), and tapes. Compared with high-speed storage components (e.g., SSDs and HDDs), tapes are likely to become an I/O performance bottleneck. Prefetching and caching are efficient techniques to boost I/O performance by increasing the hit rate of the high-end storage components. However, prefetching in hybrid storage systems is technically challenging: it can reduce I/O latency, but it can also waste I/O bandwidth and energy by pre-processing and transferring useless data. The primary goal of this project is to investigate innovative data-mining-based multilayer prefetching techniques that improve performance without significantly increasing the energy consumption and cost of hybrid storage systems. Three universities (Auburn University, the lead institution; Texas State University; and Texas A&M University at Kingsville) are involved in this collaborative project. Texas State University also works closely with the Earth Resources Observation and Science Center (EROS) of the U.S. Geological Survey (USGS). The research and education outcomes derived from the Texas State University grant are summarized below:

1) Research Activities: A number of research projects were conducted, including geo-visualizing the massive satellite download requests provided by the USGS EROS; evaluating how well conventional caching algorithms and existing data-mining-based prefetching algorithms improve the performance of EROS hybrid storage systems; analyzing EROS user download patterns and behavior; designing popularity-oriented and user-specific prefetching algorithms and evaluating their impact on both performance and energy efficiency; developing the first SQL engine that can run SQL queries on the Intel Xeon Phi to accelerate data processing; and characterizing the energy consumption of programs running on GPUs and the Intel Xeon Phi. These projects have generated a number of novel algorithms and new studies, which contribute to the disciplines of hybrid storage systems, data visualization, data mining, high performance computing (HPC), green computing, and big data analytics.

2) Publications: At the time this report was submitted, eleven peer-reviewed papers had been published in highly recognized journals and IEEE/ACM-sponsored conferences and workshops, including the Journal of Cluster Computing, Journal of Applied Remote Sensing, Journal of Environmental Modeling & Software, ACM/IEEE Supercomputing Conference (SC), International Conference on Parallel Processing (ICPP), International Conference on Big Data Science and Computing, IEEE International Conference on Networking, Architecture, and Storage, and IEEE International Performance Computing and Communications Conference (IPCCC). In addition, a book chapter has been accepted for the book Big Data Algorithms, Analytics, and Applications and is currently in press.

3) Training: This NSF project provided ample opportunities for both graduate and undergraduate students at Texas State University to conduct research in the fields of high performance computing, data mining, big data analytics, and hybrid storage systems. Four graduate students and four undergraduate students participated in the aforementioned research projects led by the PI. Many of them authored journal, conference, or workshop papers: six papers were co-authored by graduate students and four by undergraduate students.

4) Education: The research approaches and results have been introduced into both undergraduate and graduate level courses to benefit a large group of students at three institutions. Students from these classes a...

Please report errors in award information by writing to: awardsearch@nsf.gov.
