
NSF Org: | CNS Division Of Computer and Network Systems |
Recipient: |
|
Initial Amendment Date: | August 26, 2013 |
Latest Amendment Date: | August 26, 2013 |
Award Number: | 1320226 |
Award Instrument: | Standard Grant |
Program Manager: | Marilyn McClure, mmcclure@nsf.gov, (703)292-5197, CNS Division Of Computer and Network Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2013 |
End Date: | August 31, 2017 (Estimated) |
Total Intended Award Amount: | $407,968.00 |
Total Awarded Amount to Date: | $407,968.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
1918 F ST NW WASHINGTON DC US 20052-0042 (202)994-0728 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
DC US 20052-0058 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | CSR-Computer Systems Research |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Despite a projected shift to cloud computing, concerns over cloud reliability remain paramount in both the private and government sectors and call for innovative solutions that can meet disparate reliability requirements. Existing techniques allow cloud providers to offer a fixed level of reliability to all customers, but that level may be either inadequate or too expensive for a given customer's specific requirements. This project aims to develop a novel framework for providing reliability as an elastic, transparent service that can be customized and accessed by all customers in cloud computing.
The goals of this project are: (1) the holistic integration of two reliability approaches (viz., checkpointing and replication) with utility optimization, and their adaptation to a distributed cloud environment with heterogeneous user demands; and (2) the development of pricing schemes that let cloud providers put their "resource white spaces" to profitable use. Together, these two research directions enable the realization of Reliability as a Service (RaaS). With the introduction of pay-per-use reliability services, cloud customers could choose the reliability components they require on a feature-by-feature basis. Achieving a desired reliability level could be a single check box away. For cloud service providers, RaaS presents an additional source of revenue and adds value to their services.
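To make the idea of per-customer, pay-per-use reliability concrete, here is a minimal sketch of how a replication factor could be chosen for a customer's reliability target. It is not the project's model: it assumes replicas fail independently with the same probability, and the function names and the flat per-replica price are hypothetical, introduced only for illustration.

```python
def min_replicas(p_fail: float, target: float) -> int:
    """Smallest replica count r such that 1 - p_fail**r >= target.

    Assumes each replica fails independently with probability p_fail,
    so a job is lost only if all r replicas fail (probability p_fail**r).
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    r = 1
    while p_fail ** r > 1.0 - target:
        r += 1
    return r


def reliability_cost(p_fail: float, target: float, price_per_replica: float) -> float:
    """Hypothetical pay-per-use cost: a flat price for every replica used."""
    return min_replicas(p_fail, target) * price_per_replica


if __name__ == "__main__":
    # Three customers with different reliability targets on the same hardware.
    for target in (0.9, 0.99, 0.999):
        r = min_replicas(0.05, target)
        print(f"target={target}: replicas={r}, cost={reliability_cost(0.05, target, 2.0)}")
```

Under these assumptions, customers with stricter targets pay for more replicas, while customers with modest targets are not charged for reliability they do not need.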
By constructing realistic models and developing algorithms for resource allocation, optimization, and pricing, the proposed research is expected to advance the state of the art of cloud computing. The project also includes an implementation and experimental component that will yield valuable knowledge on best practices and on the main obstacles to transitioning the results into the commercial world. This project will also carry out a number of educational activities involving K-12, undergraduate, and graduate students, and make strong outreach efforts to recruit and mentor under-represented students.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This project created a new framework to enable Reliability as a Service (RaaS) in cloud computing. It combined checkpointing and replication techniques with utility optimization and dynamic cloud resource management to provide reliability as an elastic service, where flexible service-level agreements (SLAs) are negotiated through a joint assessment of users' reliability demands and the total cloud resources available in a data center. The project developed a holistic RaaS framework that jointly optimizes reliability and cost/pricing over a number of entangled "control knobs": reliability, checkpointing schedule, data replication factor, bandwidth allocation, dynamic scheduling of tasks/requests, storage/execution cost, latency, and data locality. The framework provides an additional source of revenue to cloud providers by exploiting under-utilized resources and offering RaaS to cloud customers. In solving these problems, the project developed novel models for service reliability, speculative execution, replication/erasure coding, and storage service latency, as well as new distributed algorithms for the proposed RaaS optimization.
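As an illustration of how one of these control knobs, the checkpointing schedule, trades cost against reliability, the sketch below uses Young's classic first-order approximation for the checkpoint interval. This is a textbook formula swapped in for illustration, not the optimization developed in the project; the function names and the example numbers are assumptions.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's approximation for the checkpoint interval: sqrt(2 * C * MTBF).

    checkpoint_cost_s: time to write one checkpoint, in seconds (C).
    mtbf_s: mean time between failures, in seconds.
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def overhead_fraction(interval_s: float, checkpoint_cost_s: float, mtbf_s: float) -> float:
    """First-order estimate of the fraction of time lost.

    C / interval is time spent writing checkpoints; interval / (2 * MTBF)
    is the expected rework after a failure (half an interval on average).
    """
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * mtbf_s)


if __name__ == "__main__":
    cost, mtbf = 60.0, 86400.0  # assumed: 1-minute checkpoints, 1 failure per day
    tau = young_interval(cost, mtbf)
    print(f"interval = {tau:.1f} s, overhead = {overhead_fraction(tau, cost, mtbf):.4f}")
```

Checkpointing more often than this interval spends more time writing checkpoints; checkpointing less often loses more work per failure. A joint optimization like the one described above would weigh this knob together with replication, bandwidth, and pricing rather than in isolation.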
The project also investigated the practical and systems aspects of RaaS and utility-based optimization. In particular, the proposed RaaS framework and optimization algorithms have been prototyped and integrated with several popular cloud and distributed computing systems, such as Amazon EC2, MapReduce, Tahoe, Ceph, and Cassandra. This work resulted in a number of resource managers and task schedulers that jointly optimize reliability and performance metrics. Our evaluation using real-world workloads validated significant reliability improvements on these systems and demonstrated the ability to provide elastic reliability that fits each application's requirements.
The results of the project were published at peer-reviewed conferences, and the source code of the resulting tools, together with the software and hardware designs, has been made openly available online. By jointly optimizing reliability, performance, and cost objectives, the resulting technologies will not only lead to new cloud infrastructure and management algorithms, but also promote awareness of reliability and new practices such as usage-based RaaS through pricing and new business models. Because cloud and distributed computing have become an important way to deliver network-based services and to give users, especially those in underserved communities and developing regions, access to information technology, this project will have a broader impact on global society and the economy. Notably, technologies resulting from this project apply not only to mobile devices but also to edge computing and mobile networks. Inspired by this research, new teaching lab facilities and interdisciplinary curriculum modules for teaching both the theory and the systems have been developed.
Last Modified: 10/30/2017
Modified by: Tian Lan