NSF Award Search: Award # 1816611 - CSR: Small: Cost-Aware Cloud Profiling, Prediction, and Provisioning as a Service

Award Abstract # 1816611

CSR: Small: Cost-Aware Cloud Profiling, Prediction, and Provisioning as a Service

NSF Org:	CNS Division Of Computer and Network Systems
Recipient:	UNIVERSITY OF CHICAGO
Initial Amendment Date:	August 30, 2018
Latest Amendment Date:	August 30, 2018
Award Number:	1816611
Award Instrument:	Standard Grant
Program Manager:	Marilyn McClure, mmcclure@nsf.gov, (703) 292-5197; CNS Division Of Computer and Network Systems; CSE Directorate for Computer and Information Science and Engineering
Start Date:	October 1, 2018
End Date:	September 30, 2023 (Estimated)
Total Intended Award Amount:	$500,000.00
Total Awarded Amount to Date:	$500,000.00
Funds Obligated to Date:	FY 2018 = $500,000.00
History of Investigator:	Kyle Chard (Principal Investigator), chard@uchicago.edu; Ian Foster (Co-Principal Investigator)
Recipient Sponsored Research Office:	University of Chicago, 5801 S ELLIS AVE, CHICAGO, IL, US 60637-5418, (773) 702-8669
Sponsor Congressional District:	01
Primary Place of Performance:	University of Chicago Chicago IL US 60637-5418
Primary Place of Performance Congressional District:	01
Unique Entity Identifier (UEI):	ZUE9HKT2CLC9
Parent UEI:	ZUE9HKT2CLC9
NSF Program(s):	CSR-Computer Systems Research
Primary Program Source:	01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	7923
Program Element Code(s):	735400
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.070

ABSTRACT

Cloud computing has the potential to transform computational practice by enabling immediate, on-demand access to large-scale computing resources. But large-scale cloud computing can easily be costly. The Scalable Cost-Aware Cloud Infrastructure Management and Provisioning (SCRIMP) project aims to develop new cloud access methods that will reduce the complexity and cost and improve the efficiency of using cloud resources. The project will innovate in three areas: profiling, prediction, and provisioning. Its new machine learning-based profiling techniques aim to predict application performance, at different levels of accuracy, across a diverse set of cloud resources, based upon derivation of comparable and related instance classes, explorative profiling techniques, and analysis of historical usage. Its ensemble-based market prediction models will allow the many existing cloud market prediction models to be easily compared and then combined so that their collective strengths can be used to predict costs with the aim of minimizing cost, price risk, and likelihood of instance revocation. Finally, its overarching provisioning model will combine application profiles and market prediction models to enable automated, cost-efficient, policy-based cloud provisioning as well as efficient placement and migration of workload within the resulting dynamically provisioned environment.

SCRIMP will advance the use of computation across the sciences, particularly within smaller institutions, by simplifying access to on-demand cloud computing and improving the efficiency with which researchers make use of cloud infrastructure. By lowering scientific computing costs and complexity for many users, SCRIMP will enable more efficient use of cloud credits (whether from cloud providers or funding agencies), democratize access to cloud computing by researchers without dedicated computing infrastructure or expertise, and allow researchers and students to conduct increasingly complex analytics, on larger datasets, and at higher resolution. SCRIMP will also be directly relevant in education, allowing educators to provide access to large resource pools at low cost with guaranteed performance.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 12)

Baughman, Matt and Caton, Simon and Haas, Christian and Chard, Ryan and Wolski, Rich and Foster, Ian and Chard, Kyle "Deconstructing the 2017 Changes to AWS Spot Market Pricing" 10th Workshop on Scientific Cloud Computing, 2019. https://doi.org/10.1145/3322795.3331465

Baughman, Matt and Chakubaji, Nifesh and Truong, Hong-Linh and Kreics, Krists and Chard, Kyle and Foster, Ian "Measuring, Quantifying, and Predicting the Cost-Accuracy Tradeoff" IEEE International Conference on Big Data (Big Data), 2019. https://doi.org/10.1109/BigData47090.2019.9006370

Baughman, Matt and Chard, Ryan and Ward, Logan and Pitt, Jason and Chard, Kyle and Foster, Ian "Profiling and Predicting Application Performance on the Cloud" 11th IEEE/ACM International Conference on Utility and Cloud Computing (UCC), 2018. https://doi.org/10.1109/UCC.2018.00011

Baughman, Matt and Foster, Ian and Chard, Kyle "Enhancing Automated FaaS with Cost-aware Provisioning of Cloud Resources" 2021 IEEE 17th International Conference on eScience (eScience), 2021. https://doi.org/10.1109/eScience51609.2021.00053

Baughman, Matt and Foster, Ian and Chard, Kyle "Exploring Tradeoffs in Federated Learning on Serverless Computing Architectures" IEEE 18th International Conference on e-Science (e-Science), 2022. https://doi.org/10.1109/eScience55777.2022.00074

Baughman, Matt and Hudson, Nathaniel and Chard, Ryan and Bauer, Andre and Foster, Ian and Chard, Kyle "Tournament-Based Pretraining to Accelerate Federated Learning" SC'23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 2023. https://doi.org/10.1145/3624062.3626089

Baughman, Matt and Hudson, Nathaniel and Foster, Ian and Chard, Kyle "Balancing Federated Learning Trade-Offs for Heterogeneous Environments" IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), 2023. https://doi.org/10.1109/PerComWorkshops56833.2023.10150228

Baughman, Matt and Kumar, Rohan and Foster, Ian and Chard, Kyle "Expanding Cost-Aware Function Execution with Multidimensional Notions of Cost" Proceedings of the 1st Workshop on High Performance Serverless Computing, 2020. https://doi.org/10.1145/3452413.3464790

Caton, Simon and Baughman, Matt and Haas, Christian and Chard, Ryan and Foster, Ian and Chard, Kyle "Assessing the Current State of AWS Spot Market Forecastability" IEEE/ACM International Workshop on Interoperability of Supercomputing and Cloud Technologies (SuperCompCloud), 2022. https://doi.org/10.1109/SuperCompCloud56703.2022.00007

Kotsehub, Nikita and Baughman, Matt and Chard, Ryan and Hudson, Nathaniel and Patros, Panos and Rana, Omer and Foster, Ian and Chard, Kyle "FLoX: Federated Learning with FaaS at the Edge" 18th International Conference on e-Science (e-Science), 2022. https://doi.org/10.1109/eScience55777.2022.00016

Kumar, Rohan and Baughman, Matt and Chard, Ryan and Li, Zhuozhao and Babuji, Yadu and Foster, Ian and Chard, Kyle "Coding the Computing Continuum: Fluid Function Execution in Heterogeneous Computing Environments" 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2021. https://doi.org/10.1109/IPDPSW52791.2021.00018

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The Scalable Cost-Aware Cloud Infrastructure Management and Provisioning (SCRIMP) project developed new methods and open-source software for profiling, predicting, and provisioning cloud resources. The project created new methods for profiling application performance across diverse cloud resources, devised new techniques for predicting cloud costs in preemptible and volatile cloud markets, and developed a user-facing provisioning service that uses the profiling and prediction models to let users quantify, balance, and optimize tradeoffs between application performance (e.g., accuracy) and costs (e.g., time, budget, and energy consumption).

The SCRIMP profiling framework is designed to automatically and efficiently profile application performance on different cloud and edge resources. The framework can capture a multidimensional notion of cost, for example, cloud computing costs, startup costs, data movement costs, and computing resources used. It also considers various measures of performance, such as result quality or application accuracy. Such flexibility is crucial, for example, in machine learning training, where the profiling framework can capture model accuracy and thus enable downstream exploration of the tradeoff between training time and model accuracy. The profiling framework can automatically explore wide input parameter spaces for applications, investigating different parameterizations to estimate performance and costs. It uses active learning and experiment design methods to reduce profiling costs by sampling the search space selectively. Experimental results show that the profiling framework can accurately predict costs on heterogeneous distributed resources.
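
As an illustration of the training-time versus accuracy tradeoff described above, the following minimal Python sketch filters a set of profiled configurations down to those that are not beaten on both dimensions at once (a Pareto front). It is not taken from the SCRIMP software: the `Profile` record, the `pareto_front` function, and all numbers are hypothetical and chosen only to make the idea concrete.

```python
from typing import NamedTuple


class Profile(NamedTuple):
    """One hypothetical profiling result for a training configuration."""
    config: str
    train_time_s: float  # lower is better
    accuracy: float      # higher is better


def pareto_front(profiles):
    """Return profiles that no other profile beats on both time and accuracy."""
    front = []
    for p in profiles:
        dominated = any(
            q.train_time_s <= p.train_time_s
            and q.accuracy >= p.accuracy
            and (q.train_time_s < p.train_time_s or q.accuracy > p.accuracy)
            for q in profiles
        )
        if not dominated:
            front.append(p)
    return sorted(front, key=lambda p: p.train_time_s)


# Made-up measurements, for illustration only.
PROFILES = [
    Profile("a", 60.0, 0.80),
    Profile("b", 120.0, 0.90),
    Profile("c", 150.0, 0.85),  # slower and less accurate than "b"
    Profile("d", 300.0, 0.93),
]

FRONT = pareto_front(PROFILES)
```

In this example the front keeps configurations "a", "b", and "d"; a user with a fixed time budget would then pick the most accurate configuration that fits within it.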

The project created new methods to predict cloud computing costs based on analysis and modeling of cloud markets. Analysis of AWS spot market dynamics contributed to an understanding of the “predictability” of the spot market over a long period of time. This analysis highlighted important changes to the market that altered the prediction dynamics, and ultimately showed that recent changes dramatically simplified the market, making it far less dynamic than it was previously. As a result, even simple statistical methods can accurately forecast prices into the future. The project developed and evaluated neural network and statistical models to forecast prices. Evaluation showed that even ARIMA models achieved an order of magnitude improvement in accuracy when applied to recent spot market data compared to historical data. These prediction methods make it easy to forecast future prices and, when used with real applications, can significantly reduce costs and resource waste.
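
To make concrete why a calmer market is easier to forecast, the sketch below applies simple exponential smoothing to two short price series and compares the one-step-ahead error. This is an assumption-laden illustration, not the project's code: it substitutes exponential smoothing for the ARIMA and neural network models the project evaluated, and both price series are invented rather than real AWS spot prices.

```python
def one_step_forecasts(prices, alpha=0.5):
    """Exponential smoothing; the i-th forecast predicts prices[i + 1]."""
    if not prices:
        raise ValueError("prices must not be empty")
    level = prices[0]
    forecasts = []
    for price in prices[1:]:
        forecasts.append(level)  # forecast made before seeing this price
        level = alpha * price + (1 - alpha) * level
    return forecasts


def mean_abs_error(prices, alpha=0.5):
    """Mean absolute one-step-ahead forecast error over the series."""
    forecasts = one_step_forecasts(prices, alpha)
    if not forecasts:
        return 0.0
    errors = [abs(f - p) for f, p in zip(forecasts, prices[1:])]
    return sum(errors) / len(errors)


# Invented hourly prices in USD, for illustration only.
VOLATILE = [0.10, 0.30, 0.08, 0.25, 0.12, 0.28]
STABLE = [0.100, 0.101, 0.100, 0.102, 0.101, 0.101]

VOLATILE_MAE = mean_abs_error(VOLATILE)
STABLE_MAE = mean_abs_error(STABLE)
```

On these made-up series the error on the stable prices is far smaller than on the volatile ones, which mirrors the qualitative finding above that a less dynamic market can be forecast well with simple methods.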

Finally, the project created a new service called DELTA that combines these profiling and prediction methods to automatically provision and schedule execution of user workloads across distributed and heterogeneous resources. DELTA enables applications to be executed without requiring users to consider the complex costs that exist in cloud and edge systems (e.g., provisioning delays, transfer costs, container deployment time). DELTA implements an extensible architecture in which different predictors and scheduling algorithms can be integrated to provide dynamically evolving estimates of costs on different resources. These estimates can be used to determine the most appropriate location for execution. Experiments across diverse resources, including cloud and edge devices, showed that DELTA can significantly reduce workload makespan when compared with a strategy that selects the fastest resource.
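
The difference between choosing the fastest resource and choosing the resource with the lowest overall estimated cost can be sketched as follows. This is a minimal illustration under stated assumptions, not DELTA's implementation or API: the `ResourceEstimate` record, both selection functions, and every number are hypothetical, and cost is reduced here to end-to-end time only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceEstimate:
    """Hypothetical per-resource time estimates for one task, in seconds."""
    name: str
    provisioning_delay_s: float
    transfer_time_s: float
    container_deploy_s: float
    run_time_s: float

    @property
    def total_time_s(self) -> float:
        # End-to-end time includes every overhead, not only execution.
        return (self.provisioning_delay_s + self.transfer_time_s
                + self.container_deploy_s + self.run_time_s)


def pick_fastest(estimates):
    """Baseline: choose the resource with the shortest execution time."""
    return min(estimates, key=lambda e: e.run_time_s)


def pick_lowest_total(estimates):
    """Choose the resource with the shortest estimated end-to-end time."""
    return min(estimates, key=lambda e: e.total_time_s)


# Made-up estimates, for illustration only.
ESTIMATES = [
    ResourceEstimate("cloud-gpu", 120.0, 60.0, 45.0, 30.0),
    ResourceEstimate("edge-device", 0.0, 5.0, 10.0, 90.0),
]

FASTEST = pick_fastest(ESTIMATES)
BEST_TOTAL = pick_lowest_total(ESTIMATES)
```

With these numbers the baseline selects "cloud-gpu" (30 s of execution but 255 s end to end), while the overhead-aware choice is "edge-device" (105 s end to end), which is the kind of gap that can shorten makespan when overheads dominate.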

Last Modified: 02/28/2024
Modified by: Kyle Chard

Please report errors in award information by writing to: awardsearch@nsf.gov.