Award Abstract # 1845853
CAREER: End-to-End Network Design for Unified Memory Disaggregation

NSF Org: CNS
Division Of Computer and Network Systems
Recipient: REGENTS OF THE UNIVERSITY OF MICHIGAN
Initial Amendment Date: March 1, 2019
Latest Amendment Date: May 23, 2023
Award Number: 1845853
Award Instrument: Continuing Grant
Program Manager: Deepankar Medhi, dmedhi@nsf.gov, (703) 292-2935
Division: CNS, Division Of Computer and Network Systems
Directorate: CSE, Directorate for Computer and Information Science and Engineering
Start Date: July 1, 2019
End Date: June 30, 2025 (Estimated)
Total Intended Award Amount: $578,228.00
Total Awarded Amount to Date: $578,228.00
Funds Obligated to Date: FY 2019 = $117,330.00
FY 2020 = $110,007.00
FY 2021 = $113,406.00
FY 2022 = $116,923.00
FY 2023 = $120,562.00
History of Investigator:
  • Mosharaf Chowdhury (Principal Investigator)
    mosharaf@umich.edu
Recipient Sponsored Research Office: Regents of the University of Michigan - Ann Arbor
1109 GEDDES AVE STE 3300
ANN ARBOR
MI  US  48109-1015
(734)763-6438
Sponsor Congressional District: 06
Primary Place of Performance: Regents of the University of Michigan
3003 S. State St
Ann Arbor
MI  US  48109-1274
Primary Place of Performance Congressional District: 06
Unique Entity Identifier (UEI): GNJ7BBP73WE9
Parent UEI:
NSF Program(s): Networking Technology and Syst
Primary Program Source: 01001920DB NSF RESEARCH & RELATED ACTIVIT
01002021DB NSF RESEARCH & RELATED ACTIVIT
01002122DB NSF RESEARCH & RELATED ACTIVIT
01002223DB NSF RESEARCH & RELATED ACTIVIT
01002324DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1045
Program Element Code(s): 736300
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Applications in modern cloud datacenters are deployed in resource containers to isolate them from each other. Memory stranding is a pervasive problem in such containerized datacenters: many memory-intensive applications grind to a halt even when free memory exists on other machines. This leads to low utilization, memory fragmentation, and increased overall cost. In theory, memory disaggregation over ultra-fast networks can pool such stranded memory together, but making it practical poses novel systems design, algorithmic, and integration challenges. These include bridging the still-sizable latency gap between local memory access and Remote Direct Memory Access (RDMA); transparently addressing network-wide fault tolerance, load imbalance, and performance isolation; achieving scalability; and supporting heterogeneous software and hardware technologies.

The overarching research objective of this proposal is to realize a Unified Disaggregated Memory (UDM) abstraction over ultra-fast networks to expose stranded memory across the datacenter as a pool of available memory to out-of-memory containers in a fast, resilient, and scalable manner without any changes to the applications. By designing a comprehensive solution to address host-level, network-level, and end-to-end aspects of the aforementioned challenges, this research aims to make memory disaggregation practical. Specifically, by leveraging the unique characteristics of memory-intensive workloads, ultra-low-latency networks, and multi-tenancy in modern datacenters, this proposal will (i) design a low-latency host networking stack; (ii) enable performance isolation throughout the network; (iii) provide resilience to network-wide uncertainties such as failures and load imbalance; and (iv) incorporate support for heterogeneous memory (e.g., persistent memory), networking technologies, and resource management software.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 12)
Al_Maruf, Hasan and Chowdhury, Mosharaf "Memory Disaggregation: Advances and Open Challenges" ACM SIGOPS Operating Systems Review, v.57, 2023 https://doi.org/10.1145/3606557.3606562
Lee, Y. and Maruf, H. A. and Cidon, A. and Chowdhury, M. and Shin, K. G. "Hydra: Resilient and Highly Available Remote Memory" USENIX FAST, 2022
Maruf, Hasan Al and Chowdhury, Mosharaf "Effectively Prefetching Remote Memory with Leap" 2020 USENIX Annual Technical Conference, USENIX ATC 2020, 2020
Maruf, Hasan Al and Wang, Hao and Dhanotia, Abhishek and Weiner, Johannes and Agarwal, Niket and Bhattacharya, Pallab and Petersen, Chris and Chowdhury, Mosharaf and Kanaujia, Shobhit and Chauhan, Prakash "TPP: Transparent Page Placement for CXL-Enabled Tiered-Memory" ACM ASPLOS, 2023 https://doi.org/10.1145/3582016.3582063
Maruf, Hasan Al and Zhong, Yuhong and Wang, Hongyi and Chowdhury, Mosharaf and Cidon, Asaf and Waldspurger, Carl "Memtrade: Marketplace for Disaggregated Memory Clouds" Proceedings of the ACM on Measurement and Analysis of Computing Systems, v.7, 2023 https://doi.org/10.1145/3589985
You, Jie and Wu, Jingfeng and Jin, Xin and Chowdhury, Mosharaf "Ship Compute or Ship Data? Why Not Both?" 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI'21), 2021
Yu, Zhuolong and Hu, Chuheng and Wu, Jingfeng and Sun, Xiao and Braverman, Vladimir and Chowdhury, Mosharaf and Liu, Zhenhua and Jin, Xin "Programmable Packet Scheduling with a Single Queue" ACM SIGCOMM, 2021 https://doi.org/10.1145/3452296.3472887
Yu, Zhuolong and Zhang, Yiwen and Braverman, Vladimir and Chowdhury, Mosharaf and Jin, Xin "NetLock: Fast, Centralized Lock Management Using Programmable Switches" SIGCOMM '20: Proceedings of the 2020 Annual conference of the ACM Special Interest Group on Data Communication on the applications, technologies, architectures, and protocols for computer communication, 2020 https://doi.org/10.1145/3387514.3405857
Zhang, Yiwen and Kumar, Gautam and Dukkipati, Nandita and Wu, Xian and Jha, Priyaranjan and Chowdhury, Mosharaf and Vahdat, Amin "Aequitas: Admission Control for Performance-Critical RPCs in Datacenters" ACM SIGCOMM, 2022 https://doi.org/10.1145/3544216.3544271
Zhang, Yiwen and Tan, Yue and Stephens, Brent and Chowdhury, Mosharaf "Justitia: Software Multi-Tenancy in Hardware Kernel-Bypass Networks" USENIX NSDI, 2022
Zhang, Yiwen and Zhang, Xumiao and Ananthanarayanan, Ganesh and Iyer, Anand and Shu, Yuanchao and Bahl, Victor and Mao, Z Morley and Chowdhury, Mosharaf "Vulcan: Automatic Query Planning for Live ML Analytics", 2024

Please report errors in award information by writing to: awardsearch@nsf.gov.
