
NSF Org: CCF Division of Computing and Communication Foundations
Recipient: University of Michigan
Initial Amendment Date: September 2, 2016
Latest Amendment Date: September 2, 2016
Award Number: 1629397
Award Instrument: Standard Grant
Program Manager: Marilyn McClure, mmcclure@nsf.gov, (703) 292-5197, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2016
End Date: August 31, 2020 (Estimated)
Total Intended Award Amount: $825,000.00
Total Awarded Amount to Date: $825,000.00
Recipient Sponsored Research Office: 1109 Geddes Ave Ste 3300, Ann Arbor, MI 48109-1015, US, (734) 763-6438
Primary Place of Performance: 2260 Hayward, Ann Arbor, MI 48109-2121, US
NSF Program(s): Exploiting Parallelism and Scalability (XPS)
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
Although many modern applications, e.g., exploratory analytics and scientific visualization, come with stringent latency requirements, today's in-memory and scale-out solutions often provide only best-effort services. A root cause of this unpredictability lies in the traditional design principle of minimizing I/O operations. With the advent of faster storage and networks in rack-scale computing, however, I/O may no longer be scarce. This project revisits the tradeoffs and design principles of scale-out, low-latency applications in this emerging context. Bounded response times will reduce over-provisioning and foster new applications (e.g., business intelligence, robotics, and intensive care units) that require consistent performance. Project findings will be integrated into undergraduate and graduate curricula, and software artifacts will be open-sourced for the wider community across academia and industry.
This project aims to leverage the influx of new hardware capabilities to enable applications that treat bounded response times as their primary design criterion. Specifically, the project leverages approximation, speculation, and scheduling to mask latency variability in latency-sensitive applications. The key technical challenge in realizing this vision lies in making a set of tradeoffs different from the norm: (i) rather than striving for less I/O, this project trades I/O for better memory locality and speculates aggressively to reduce response times; (ii) when needed, it resorts to approximation techniques to guarantee bounded response times; and finally, (iii) it develops new approximation- and speculation-aware schedulers to increase resource efficiency. The project also investigates theoretical and empirical boundaries of approximate and speculative processing as well as new spatiotemporal scheduling techniques in rack-scale computing.
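To make the speculation tradeoff concrete, the sketch below hedges a slow read by duplicating it to a second replica after a short wait and returning whichever copy answers first, spending extra I/O to cut tail latency. This is a minimal illustration of the general technique only, not code from this project; the replica names, fetch function, and hedge delay are all hypothetical.

```python
import concurrent.futures
import random
import time

REPLICAS = ["replica-a", "replica-b"]  # hypothetical replica endpoints

def fetch(replica, key):
    """Simulate a replica read whose latency occasionally spikes."""
    delay = 0.010 if random.random() < 0.9 else 0.200  # rare straggler
    time.sleep(delay)
    return f"{key}@{replica}"

def hedged_read(key, hedge_after=0.020):
    """Read from one replica; if no answer within hedge_after seconds,
    speculatively duplicate the read to a second replica and return
    whichever copy finishes first (extra I/O buys lower tail latency)."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)
    futures = [pool.submit(fetch, REPLICAS[0], key)]
    done, _ = concurrent.futures.wait(futures, timeout=hedge_after)
    if not done:  # primary is straggling: fire the speculative copy
        futures.append(pool.submit(fetch, REPLICAS[1], key))
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
    result = done.pop().result()
    pool.shutdown(wait=False)  # don't block on the losing copy
    return result

print(hedged_read("row:42"))
```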
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Although modern applications come with stringent performance requirements, existing solutions often provide only best-effort services. A root cause of this unpredictability lies in the traditional design principle of minimizing I/O operations. With the advent of faster storage and networking hardware, however, I/O capacity is no longer as scarce. The overarching goal of this project was to rethink the tradeoffs and design principles of modern applications in this emerging context. To this end, we built a set of solutions that married advances in hardware capabilities with battle-tested software optimization techniques to enable resource disaggregation for big data and AI/ML workloads.
To enable efficient and resilient memory disaggregation over fast networks, we created the first practical memory disaggregation solution (Infiniswap) as part of this project. We made it resilient without incurring large memory overhead by designing an erasure-coded remote-memory solution (Hydra), and we built decentralized locking on RDMA primitives (DSLR) to enable concurrent access to remote memory objects. Overall, our solutions took the first steps toward practical memory disaggregation, to the point that memory-intensive applications can run without any performance loss even when 50% of their memory resides in remote machines.
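The sketch below illustrates why erasure coding achieves resilience at a fraction of replication's memory overhead: a page is split into K data chunks plus one XOR parity and spread across K+1 remote machines, so losing any one machine costs only 1/K extra memory instead of a full copy. This single-parity toy is only an illustration of the idea; Hydra's actual coding scheme and data path are more sophisticated, and the chunk count K here is an arbitrary choice.

```python
from functools import reduce

K = 4  # data chunks per page (arbitrary choice for this sketch)

def xor(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(page):
    """Split a page into K equal data chunks plus one XOR parity chunk;
    each of the K+1 chunks would live on a different remote machine."""
    size = len(page) // K
    chunks = [page[i * size:(i + 1) * size] for i in range(K)]
    return chunks + [reduce(xor, chunks)]

def decode(chunks):
    """Rebuild the page even if any single chunk (data or parity) is
    lost, because the XOR of all K+1 chunks is zero."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    assert len(missing) <= 1, "single parity tolerates one loss"
    if missing and missing[0] < K:  # recompute the lost data chunk
        chunks[missing[0]] = reduce(xor, [c for c in chunks if c is not None])
    return b"".join(chunks[:K])

page = bytes(range(256)) * 16  # a 4 KB "page"
remote = encode(page)          # 1/K extra memory vs. 1x for replication
remote[2] = None               # one remote machine fails
assert decode(remote) == page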
We also focused on high-performance big data analytics by enabling so-called infinite-scale analytics (VerdictDB), whereby any existing analytics engine can leverage approximate query processing to speed up query performance by 57X on average (and up to 841X). We also designed a new cluster scheduler (Carbyne) that takes the DAG of a job and altruistically exchanges resources with other jobs to improve average job completion times. In deployments, Carbyne provides 1.26X better efficiency and 1.59X lower average completion time than the state of the art, while ensuring fair resource sharing.
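The core tradeoff behind such approximate-query-processing speedups can be shown in a few lines: answer an aggregate on a small uniform sample and attach a confidence interval, instead of scanning every row. The synthetic column and sampling rate below are made up for the example; VerdictDB itself rewrites SQL against pre-built samples inside existing engines rather than sampling at query time.

```python
import math
import random

random.seed(1)
table = [random.gauss(100.0, 15.0) for _ in range(1_000_000)]  # full column

rate = 0.01                                    # scan only ~1% of the rows
sample = [v for v in table if random.random() < rate]

n = len(sample)
mean = sum(sample) / n                         # approximate AVG
var = sum((v - mean) ** 2 for v in sample) / (n - 1)
ci = 1.96 * math.sqrt(var / n)                 # ~95% confidence interval

exact = sum(table) / len(table)
print(f"approx AVG = {mean:.2f} +/- {ci:.2f} (exact {exact:.2f})")
```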
Another key direction we explored is resource management in AI/ML clusters. To this end, we worked on GPU cluster management (Tiresias) and GPU resource management (Salus) for training, as well as on hyperparameter tuning (FluidExec). In addition, we looked beyond GPUs to optimize CPU resource management in distributed AI training, especially in the parameter-server setting. Overall, our solutions yielded up to 5.5X improvement in cluster-level resource efficiency and up to 7X at the level of individual GPUs, reducing the cost of AI for the masses.
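As one illustration of scheduling GPU jobs without knowing their durations, the toy scheduler below follows a least-attained-service policy in the spirit of Tiresias: each time quantum goes to the job that has consumed the least GPU-time (GPUs × time) so far, so short jobs finish early even though no durations are known in advance. Tiresias itself uses discretized priority queues and additional machinery; the job mix and quantum here are hypothetical.

```python
import heapq

def las_schedule(jobs, quantum=1.0):
    """jobs: {name: (num_gpus, remaining_seconds)}; returns finish order.
    Attained service = num_gpus * executed_time (a 2D GPU-time metric)."""
    heap = [(0.0, name) for name in jobs]  # (attained_service, job)
    heapq.heapify(heap)
    remaining = {name: rem for name, (_, rem) in jobs.items()}
    order = []
    while heap:
        attained, name = heapq.heappop(heap)  # least-served job runs next
        gpus, _ = jobs[name]
        run = min(quantum, remaining[name])
        remaining[name] -= run
        if remaining[name] <= 0:
            order.append(name)
        else:
            heapq.heappush(heap, (attained + gpus * run, name))
    return order

demo = {"short": (4, 2.0), "long": (2, 50.0), "medium": (8, 5.0)}
print(las_schedule(demo))  # short jobs complete ahead of the long one
```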
Finally, on the theoretical front, we explored several techniques to improve approximate query processing in the context of maximum inner-product search (BOUNDEDME) and joins on sampled data (SUBS), improving on the state of the art by an order of magnitude. At the same time, we made progress on the learning-theory side by enabling projection-free optimization and selectivity learning with mixture models (QuickSel). QuickSel is 34.0X–179.4X faster than state-of-the-art query-driven techniques for selectivity learning.
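A toy version of query-driven selectivity learning conveys the idea: model a column's distribution as a mixture of uniform components and fit the mixture weights so that predicted selectivities match the selectivities observed from past range queries. QuickSel solves a constrained quadratic program over data-driven components; this sketch substitutes plain least squares over a fixed grid, with a clipped-and-normalized stand-in for the constraints, and every name and parameter below is made up for illustration.

```python
import numpy as np

GRID = np.linspace(0.0, 1.0, 17)  # 16 uniform mixture components on [0, 1]

def overlap(lo, hi):
    """Fraction of each component's interval covered by predicate
    [lo, hi); predicted selectivity is overlap(lo, hi) @ weights."""
    left, right = GRID[:-1], GRID[1:]
    inter = np.clip(np.minimum(hi, right) - np.maximum(lo, left), 0.0, None)
    return inter / (right - left)

# Training signal: selectivities observed from a past query workload,
# generated here against a synthetic skewed column.
rng = np.random.default_rng(0)
column = rng.beta(2, 5, size=100_000)
queries = list(zip(rng.random(50) * 0.8, rng.random(50) * 0.2))
A = np.array([overlap(lo, lo + w) for lo, w in queries])
s = np.array([np.mean((column >= lo) & (column < lo + w))
              for lo, w in queries])

w, *_ = np.linalg.lstsq(A, s, rcond=None)  # fit weights to query feedback
w = np.clip(w, 0.0, None)
w /= w.sum()                               # crude stand-in for QP constraints

lo, hi = 0.1, 0.3                          # estimate an unseen predicate
print("predicted:", float(overlap(lo, hi) @ w))
print("actual:   ", float(np.mean((column >= lo) & (column < hi))))
```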
All software developed as part of this project is based on established open-source systems such as Apache Spark, Apache YARN, TensorFlow, and MySQL, and we have open-sourced, and continue to open-source, our work at https://github.com/symbioticlab. Research papers summarizing our work have been published or are under submission at top venues in networking, systems, databases, and AI, including OSDI, NSDI, SIGMOD, VLDB, and AAAI. Some of this work has been incorporated into graduate- and undergraduate-level networking and databases courses at the University of Michigan. Last but not least, several PhD students at the University of Michigan worked on different pieces of these contributions, and this grant partly supported their education and training.
Last Modified: 12/02/2020
Modified by: Mosharaf Chowdhury