Award Abstract # 1813485
CSR: Small: Decoupling File System from Volatile Main Memory: A First Step towards a Single-Level Persistent Store

NSF Org: CNS
Division Of Computer and Network Systems
Recipient: SAN DIEGO STATE UNIVERSITY FOUNDATION
Initial Amendment Date: June 29, 2018
Latest Amendment Date: June 29, 2018
Award Number: 1813485
Award Instrument: Standard Grant
Program Manager: Daniela Oliveira
doliveir@nsf.gov
 (703)292-0000
CNS: Division Of Computer and Network Systems
CSE: Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2018
End Date: August 31, 2023 (Estimated)
Total Intended Award Amount: $336,873.00
Total Awarded Amount to Date: $336,873.00
Funds Obligated to Date: FY 2018 = $336,873.00
History of Investigator:
  • Tao Xie (Principal Investigator)
    txie@sdsu.edu
Recipient Sponsored Research Office: San Diego State University Foundation
5250 CAMPANILE DR
SAN DIEGO
CA  US  92182-1901
(619)594-5731
Sponsor Congressional District: 51
Primary Place of Performance: San Diego State University
5500 Campanile Drive
San Diego
CA  US  92182-7455
Primary Place of Performance Congressional District: 51
Unique Entity Identifier (UEI): H59JKGFZKHL7
Parent UEI: H59JKGFZKHL7
NSF Program(s): CSR-Computer Systems Research
Primary Program Source: 01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7923
Program Element Code(s): 735400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

A file system is the software module in charge of how files are named, stored, and retrieved. Existing memory systems are based on dynamic random-access memory (DRAM), which is reaching its density and power ceiling. Emerging persistent memory technologies such as phase change memory (PCM) not only provide a denser, more energy-efficient alternative to DRAM, but also allow file systems to be built atop them. This project will contribute to memory and storage technologies by developing a new single-level persistent memory architecture and a new file system dedicated to it.

The new architecture will turn a small DRAM-based main memory system into a large-capacity persistent memory system that can access files in place. This project will proceed along two thrusts. First, it will bridge the technology gap in building high-performance persistent memory systems through an in-depth investigation. Second, it will develop the first file system devoted to a single-level persistent store to efficiently manage data, a capability increasingly demanded by data-intensive applications.

This project will benefit society by developing high-performance memory systems that will significantly improve the performance and energy-efficiency of future big data applications, which are revolutionizing all aspects of human lives ranging from enterprises to consumers, from science to government. In the long term, techniques developed in this project will be transferable to servers/clusters and even to large-scale distributed storage systems for big data applications, where performance requirements are more stringent. This project will also promote teaching, learning, and training by exposing students to technological and scientific underpinnings in the field of big data storage systems.

The project outcomes, including published papers, technical reports, presentations, course modules, and a repository of the software code, will be made available for free download at http://taoxie.sdsu.edu/, where they will be kept for ten years. The source code of the new persistent file system will also be posted on GitHub.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Pan, Wen and Xie, Tao and Song, Xiaojia "HART: A Concurrent Hash-Assisted Radix Tree for DRAM-PM Hybrid Memory Systems" International Parallel and Distributed Processing Symposium, 2019 https://doi.org/10.1109/IPDPS.2019.00100
Song, Xiaojia and Xie, Tao and Fischer, Stephen "Accelerating kNN Search in High Dimensional Datasets on FPGA by Reducing External Memory Access" Future Generation Computer Systems, 2022 https://doi.org/10.1016/j.future.2022.07.009
Song, Xiaojia and Xie, Tao and Fischer, Stephen "A Memory-Access-Efficient Adaptive Implementation of kNN on FPGA through HLS" IEEE International Conference on Computer Design, 2019 https://doi.org/10.1109/ICCD46524.2019.00030
Song, Xiaojia and Xie, Tao and Fischer, Stephen "Two Reconfigurable NDP Servers: Understanding the Impact of Near-Data Processing on Data Center Applications" ACM Transactions on Storage, v.17, 2021 https://doi.org/10.1145/3460201
Song, Xiaojia and Xie, Tao "A Near-Data Processing Server Architecture and Its Impact on Data Center Applications" The ISC High Performance, 2019 https://doi.org/10.1007/978-3-030-20656-7_5
Zhang, Jian and Xie, Tao and Jing, Yuzhuo and Song, Yanjie and Hu, Guanzhou and Chen, Si and Yin, Shu "BORA: A Bag Optimizer for Robotic Analysis" The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC 2020), 2020 https://doi.org/10.1109/SC41405.2020.00016

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Emerging byte-addressable persistent memory (PM) technologies like 3D XPoint show great potential to become a viable DRAM (dynamic random access memory) alternative for scalable main memories. Compared with traditional DRAM, they possess several desirable features such as a much larger capacity and higher energy efficiency. In addition, with the advent of the big data era, data-intensive applications such as DNA sequencing and geographic information systems often require the delivery and analysis of large data sets. As a result, the conventional compute-centric computing framework is becoming increasingly inadequate, because delivering these large data sets all the way from an external storage system to host CPUs (central processing units) incurs substantial data transfer latency and energy consumption. To address these challenges, near-data processing (NDP) has been proposed to move computation to data.

To tap the potential of PM technologies and NDP, this project developed an array of new software techniques and tools, which include: (1) a new file system that is completely decoupled from the conventional DRAM-based volatile main memory; (2) two new indexing data structures that run on persistent memory; (3) two new reconfigurable NDP-powered servers; (4) two new kNN (k-Nearest Neighbor) kernels on FPGA (Field Programmable Gate Arrays); and (5) a prototype of a file system middleware.

A new persistent file system called SPFS (Single-level Persistent File System) was developed. It completely bypasses conventional DRAM-based volatile main memory. Unlike all existing PM-oriented file systems, SPFS does not leverage DRAM to manage its metadata. SPFS outperforms traditional DRAM-based in-memory file systems ramfs and tmpfs in most cases. A concurrent and persistent data indexing tree called HART (Hash-assisted Adaptive Radix Tree) was developed. In most cases, HART significantly outperforms WOART (Write Optimal Adaptive Radix Tree) and FPTree (Fingerprinting Persistent Tree), two state-of-the-art persistent trees. Also, it scales well in concurrent scenarios. A persistent dynamic hashing scheme was also developed for persistent memory. It exhibits good performance, high scalability, and quick recovery.

Two reconfigurable NDP servers named RANS (Reconfigurable ARM-based NDP Server) and RFNS (Reconfigurable FPGA-based NDP Server) were developed. Several new findings were obtained, which shed light on how to apply NDP in data centers. For example, we found that while RANS can only benefit data-intensive applications, RFNS can offer benefits for both data-intensive and compute-intensive applications. Moreover, we found that for certain applications the reconfigurability of RANS/RFNS can deliver a noticeable improvement in energy efficiency without any performance degradation.

To demonstrate how to achieve NDP with a hardware accelerator such as an FPGA, we implemented two kNN kernels on FPGA: MBFS-kNN (Memory-efficient Brute-Force Searching kNN) and MPCAF-kNN (Memory-efficient Principal Component Analysis based Filtering kNN). The two kernels are adaptive to all key parameters. We compared them with two cutting-edge kNN implementations on a high-end CPU server, an existing BFS-kNN kernel on FPGA, and an existing BFS-kNN kernel on GPU (Graphics Processing Unit). Our experimental results show that the two kernels substantially improve performance by greatly reducing external memory accesses. We also developed a file system middleware that optimizes the acquisition of bags, which are specially formatted files used to store timestamped ROS (Robot Operating System) messages.
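
As a point of reference for the brute-force searching that MBFS-kNN builds on, the following is a minimal software sketch of brute-force kNN in Python. It is not the FPGA kernel developed in this project; the function name `brute_force_knn`, the Euclidean distance metric, and the sample points are assumptions made only for illustration.

```python
import heapq
import math


def brute_force_knn(dataset, query, k):
    """Return the k nearest points as (distance, index) pairs, closest first.

    Hypothetical helper for illustration: it scans every point in the
    dataset and keeps the k smallest Euclidean distances to the query.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # One distance computation per dataset point: the defining trait of
    # brute-force search.
    scored = (
        (math.dist(point, query), index)
        for index, point in enumerate(dataset)
    )
    return heapq.nsmallest(k, scored)


# Made-up sample data, not taken from the project's experiments.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
neighbors = brute_force_knn(points, (0.0, 0.1), k=2)
print([index for _, index in neighbors])
```

Every query scans the whole dataset, so the number of reads from wherever the dataset is stored grows with the dataset size, which is the cost that a memory-efficient kernel aims to reduce.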

Last Modified: 10/26/2023
Modified by: Tao Xie

Please report errors in award information by writing to: awardsearch@nsf.gov.
