NSF Award Search: Award # 1618923

Award Abstract # 1618923

CSR: Small: Elastic and Robust Cloud Programming

NSF Org:	CNS Division Of Computer and Network Systems
Recipient:	PURDUE UNIVERSITY
Initial Amendment Date:	August 11, 2016
Latest Amendment Date:	March 3, 2017
Award Number:	1618923
Award Instrument:	Standard Grant
Program Manager:	Marilyn McClure mmcclure@nsf.gov (703)292-5197 CNS Division Of Computer and Network Systems CSE Directorate for Computer and Information Science and Engineering
Start Date:	October 1, 2016
End Date:	September 30, 2021 (Estimated)
Total Intended Award Amount:	$485,504.00
Total Awarded Amount to Date:	$485,504.00
Funds Obligated to Date:	FY 2016 = $485,504.00
History of Investigator:	Xiangyu Zhang (Principal Investigator) Patrick Eugster (Co-Principal Investigator) Srivatsan Ravi (Co-Principal Investigator) Patrick Eugster (Former Principal Investigator)
Recipient Sponsored Research Office:	Purdue University 2550 NORTHWESTERN AVE # 1100 WEST LAFAYETTE IN US 47906-1332 (765)494-1055
Sponsor Congressional District:	04
Primary Place of Performance:	Purdue University IN US 47907-2107
Primary Place of Performance Congressional District:	04
Unique Entity Identifier (UEI):	YRXVL4JYCEF5
Parent UEI:	YRXVL4JYCEF5
NSF Program(s):	CSR-Computer Systems Research
Primary Program Source:	01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	7923
Program Element Code(s):	735400
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.070

ABSTRACT

The emergence of cloud computing is undoubtedly one of the major paradigm shifts of the last decade in information technology, and one with substantial economic impact. Indeed, the ability to rent computing resources on an as-needed basis (as opposed to acquiring and managing infrastructure provisioned for peak workloads that may occur only rarely) supports many businesses of different kinds and sizes. However, while cloud infrastructures allow computing resources to be allocated and released very dynamically, developing software that exploits this potential by automatically adjusting its resource usage at runtime to its workload (e.g., the number of client connections) and performance goals is a hard task for software engineers. The goal of this project is thus to support programmers with a programming model and runtime environment for developing such elastic applications.

Devising such a generic programming model is, however, very challenging, as it must reconcile simplicity (for programmers) with scalability (by facilitating parallelism and distribution) and robustness (by handling partial failures). Unfortunately, these properties may conflict. This project addresses the challenges through the following contributions. (1) Programming model and language: a novel object-oriented programming model variant called Atomic Events and Ownership Network (AEON) is proposed. AEON combines a simplified object model for reasoning about units of application state with a novel type of multiple ownership to streamline interaction between these units, and a novel notion of events for atomic client-server interaction. (2) Distributed runtime environment: a highly scalable and decentralized runtime environment for AEON is implemented, with support for dynamically adding and removing computational units, as well as for restructuring their relationships without hampering consistency or, conversely, stalling progress. Heuristics to efficiently (re-)partition AEON applications are also proposed. (3) Resource management and fault tolerance: a resource management framework is leveraged to facilitate the mapping between application units and underlying resources; it is augmented to provide a notion of dependable resources that achieve fault tolerance. (4) Evaluation: the developed support is evaluated on a wide variety of applications and across different cloud infrastructures. All developments are based on open-source software.
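
To make contribution (1) more concrete, the following is a minimal, illustrative sketch of how units of state, multiple ownership, and atomic events might fit together. It is not the AEON language or its API: the names `Unit`, `own`, `reachable`, and `run_event` are hypothetical, and the locking strategy (an event locks its target unit and everything that unit transitively owns, in a fixed order) is an assumption chosen for simplicity, not the mechanism described in the project's publications.

```python
import threading
from typing import Callable, Dict, List, Set


class Unit:
    """A unit of application state (hypothetical stand-in, not an AEON type)."""

    def __init__(self, name: str) -> None:
        self.name = name                      # assumed unique across all units
        self.state: Dict[str, int] = {}
        self.owned: List["Unit"] = []         # units this unit owns
        self.lock = threading.RLock()

    def own(self, child: "Unit") -> None:
        # Multiple ownership: the same child may be owned by several units.
        self.owned.append(child)


def reachable(root: Unit) -> List[Unit]:
    """Return root plus everything it transitively owns, sorted by name."""
    seen: Set[int] = set()
    found: List[Unit] = []
    stack = [root]
    while stack:
        unit = stack.pop()
        if id(unit) in seen:
            continue
        seen.add(id(unit))
        found.append(unit)
        stack.extend(unit.owned)
    return sorted(found, key=lambda u: u.name)


def run_event(target: Unit, body: Callable[[Unit], None]) -> None:
    """Run body so that it appears atomic to other events on the same units."""
    units = reachable(target)
    for unit in units:                        # one global lock order avoids deadlock
        unit.lock.acquire()
    try:
        body(target)
    finally:
        for unit in reversed(units):
            unit.lock.release()


# Two rooms share ownership of one vault; events on either room may update it.
vault = Unit("vault")
vault.state["gold"] = 0
room_a, room_b = Unit("room_a"), Unit("room_b")
room_a.own(vault)
room_b.own(vault)


def deposit(room: Unit) -> None:
    room.owned[0].state["gold"] += 1


def worker(room: Unit) -> None:
    for _ in range(1000):
        run_event(room, deposit)


threads = [threading.Thread(target=worker, args=(r,)) for r in (room_a, room_b)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(vault.state["gold"])  # 2000: no deposit is lost despite concurrent events
```

In this sketch the ownership structure tells the runtime which units an event may touch, so the programmer writes `deposit` without any explicit synchronization. A real system would also have to distribute the units across machines and migrate them, which this single-process example does not attempt.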

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Bara Abusalah; Derek Schatzlein; Julian James Stephen; Masoud Saeida Ardekani; Patrick Eugster "Dependable Cloud Resources with Guardian" 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS) , 2017 10.1109/ICDCS.2017.158

Bo Sang, Gustavo Petri, Masoud Saeida Ardekani, Srivatsan Ravi, Patrick Eugster "Programming Scalable Cloud Services with AEON" Middleware '16: Proceedings of the 17th International Middleware Conference , 2016 10.1145/2988336.2988352

Bo Sang, Patrick Eugster, Gustavo Petri, Srivatsan Ravi, and Pierre-Louis Roman "Scalable and Serializable Networked Multi-Actor Programming" 2020 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity (SPLASH 2020) , 2020

Bo Sang, Pierre-Louis Roman, Patrick Eugster, Hui Lu, Srivatsan Ravi, and Gustavo Petri "PLASMA: programmable elasticity for stateful cloud computing applications" 15th ACM European Conference on Computer Systems (EuroSys 2020) , 2020 10.1145/3342195.3387553

Bo Sang, Srivatsan Ravi, Gustavo Petri, Mahsa Najafzadeh, Masoud Saeida Ardekani, Patrick Eugster "Programmable Elasticity for Actor-based Cloud Applications" PLOS'17: Proceedings of the 9th Workshop on Programming Languages and Operating Systems , 2017 10.1145/3144555.3144558

B. Sang, G. Petri, M. Saeida Ardekani, S. Ravi, and P. Eugster "Programming Scalable Cloud Services with AEON" 17th ACM / IFIP / USENIX International Middleware Conference (Middleware 2016) , 2016 , p.16

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project is concerned with supporting elasticity and fault tolerance of applications executing in third-party cloud data centers. The main tenet is to shield programmers as much as possible from explicitly programming applications around specific mechanisms for elasticity and fault tolerance, and instead to propose malleable systems that need at most high-level policies and simple configuration to achieve the best possible performance.

Concretely, the main outcomes of the project are three-fold:

1. A programming language based on the popular actor model that leverages ownership and topological constraints observed in many common elastic applications to automatically achieve consistency among events executing concurrently across multiple actors in a serializable way. The language is shown to achieve much better performance than other approaches offering comparable consistency guarantees across a large number of relevant benchmark applications.

2. A policy language for specifying high-level elasticity/scalability constraints for programs written in the language from item 1, which allows the runtime environment to autonomously place and migrate actors for best performance. The policy language is shown to allow easily saving 25% of resources at the same performance, or achieving 20% better performance with the same amount of resources, compared to prior, simpler approaches to elasticity.

3. A resource management system that achieves fault tolerance of resources via largely automated replication, using several heuristics to avoid the exorbitant overheads of naively replicating every resource component; the system can be configured for both batch and continuous processing applications. The replication is shown to incur a runtime overhead as low as 6%, while achieving up to 68% faster completion times in the presence of failures.

Last Modified: 11/07/2021
Modified by: Patrick T Eugster
