NSF Award Search: Award # 1255425

Award Abstract # 1255425

Collaborative Research: Resource Allocation in Clouds: A Stochastic Modeling and Control Perspective

NSF Org:	ECCS Division of Electrical, Communications and Cyber Systems
Recipient:	ARIZONA STATE UNIVERSITY
Initial Amendment Date:	August 28, 2012
Latest Amendment Date:	August 28, 2012
Award Number:	1255425
Award Instrument:	Standard Grant
Program Manager:	Radhakisan Baheti ECCS Division of Electrical, Communications and Cyber Systems ENG Directorate for Engineering
Start Date:	August 16, 2012
End Date:	June 30, 2016 (Estimated)
Total Intended Award Amount:	$224,928.00
Total Awarded Amount to Date:	$224,928.00
Funds Obligated to Date:	FY 2012 = $224,928.00
History of Investigator:	Lei Ying (Principal Investigator) leiying@umich.edu
Recipient Sponsored Research Office:	Arizona State University 660 S MILL AVENUE STE 204 TEMPE AZ US 85281-3670 (480)965-5479
Sponsor Congressional District:	04
Primary Place of Performance:	Arizona State University AZ US 85281-6011
Primary Place of Performance Congressional District:	04
Unique Entity Identifier (UEI):	NTLHJXM55KZ6
Parent UEI:
NSF Program(s):	EPCN-Energy-Power-Ctrl-Netwrks
Primary Program Source:	01001213DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	092E
Program Element Code(s):	760700
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.041

ABSTRACT

Cloud computing services (such as Amazon EC2 system, Google AppEngine, and Microsoft Azure) are becoming ubiquitous and are starting to serve as the primary source of computing power for both enterprises and personal computing applications. A cloud computing platform (or simply, a cloud) can provide a variety of resources, including infrastructure, software, and services, to users in an on-demand fashion. Compared to traditional own-and-use approaches, cloud computing services eliminate the costs of purchasing and maintaining the infrastructures for cloud users, and allow the users to dynamically scale up and down computing resources in real time based on their needs.

A cloud consists of a number of machines (computers), each with a certain amount of resources (CPU, RAM, hard disk space, etc.). Each machine can be subdivided into virtual machines, where each virtual machine (VM) behaves like a small machine with a certain amount of dedicated resources. When a user submits a job to the cloud, he or she requests a certain amount of resources from the cloud and the cloud responds by creating a VM with the required resources in a machine. The resource allocation problem is to figure out how to allocate jobs to machines. Further, when several jobs are waiting for service, the cloud must also decide which job to select for service next. The goal of this project is to design resource allocation algorithms for efficient operation of the cloud, and to design pricing mechanisms to maximize the cloud service provider revenue while providing good quality of service to competing users.

Intellectual Merit: The prior art in this area is to view the problem as a sequence of static problems as follows: consider the jobs that are currently waiting for service and allocate them to machines by solving a combinatorial optimization problem. Static approaches which ignore the dynamic nature of the system lead to instability. Our viewpoint here is fundamentally different: we consider the resource allocation problem as a dynamic stochastic network control problem. We will use queue length information about waiting jobs as the feedback signal to take resource allocation decisions such as routing jobs to machines and scheduling jobs on machines. To this end, we will answer a number of fundamental questions: what is the stability region of a cloud? ; is there a tradeoff between computational complexity and stability? ; how can we characterize the performance of resource allocation algorithms beyond stability? ; and how should a cloud provider price its resources for maximizing social welfare or profit? From a theoretical perspective, the novelty in the proposed approach lies in the design of control and performance analysis algorithms while taking computational complexity considerations in account.
Broader Impact: The PIs teach graduate-level courses spanning networks, games, control theory, and optimization. We were among the first to incorporate network applications in control courses and control applications in networking classes. The proposed project provides new opportunities for such cross-fertilization by opening up a new application area, namely cloud computing, for control-theoretic methodologies. The PIs have a strong record of advising undergraduate students and graduate students from underrepresented groups. We will continue our recruitment efforts from such student groups for this project also. We will also use specific opportunities for this purpose as applicable, such as the NSF Alliance Graduate Education and the Professoriate (AGEP) program, which is a coordinated effort by Iowa universities to recruit minorities, and the Graduate Minority Assistantship Program (GMAP), which provides funds for recruiting minority students on research assistantships.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Lei Ying "On the Approximation Error of Mean-Field Models" Proc. ACM SIGMETRICS 2016 , 2016 , p.285 10.1145/2896377.2901463

W. Wang, K. Zhu, Lei Ying, J. Tan and L. Zhang "A Throughput Optimal Algorithm for Map Task Scheduling in MapReduce with Data Locality" Performance Evaluation Review , v.4 , 2013

W. Wang, K. Zhu, L. Ying, J. Tan, and L. Zhang "A throughput optimal algorithm for map task scheduling in mapreduce with data locality" ACM SIGMETRICS Performance Evaluation Review , v.40 , 2013 , p.33-42

W. Wang, L. Ying, and J. Zhang "The Value of Privacy: Strategic Data Subjects, Incentive Mechanisms and Fundamental Limits" Proc. ACM SIGMETRICS 2016 , 2016 , p.249 10.1145/2896377.2901461

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Cloud computing services (such as Amazon EC2 system, Google's AppEngine, and Microsoft's Azure) are starting to serve as the primary source of computing power for both enterprises and personal computing applications. A cloud computing platform can provide a variety of resources, including infrastructure, software, and services, to users in an on-demand fashion. Compared to traditional “own-and-use” approaches, cloud computing services eliminate the costs of purchasing and maintaining the infrastructures for cloud users, and allow the users to dynamically scale up and down computing resources in real time based on their needs.

This project focused on stochastic modeling and the design of high performance resource allocation algorithms. During the course of this project, we developed several stochastic models for resource allocation problems in cloud computing clusters, including virtual machine placement and data parallel computing using MapReduce/Hadoop. A key insight of our designs was to use queue length information about waiting jobs/tasks as the feedback signal to take resource allocation decisions such as routing jobs to machines and scheduling computation/data-processing on machines. Based on the stochastic models, we quantified the fundamental capacity regions of cloud computing clusters. Using the insight of queue-length-based resource allocation, we developed dynamic resource allocation algorithms that achieve the capacity regions for both the virtual machine placement problem and the Map-task scheduling problem in MapReduce/Hadoop. We have also developed a theoretical tool based on the mean-field analysis and Stein’s method to analyze the performance of large-scale computing clusters and to quantify the asymptotic delay performance. By combining Stein’s method and the perturbation theory in control, the tool we developed has been used to quantify the approximation error of the mean-field approximation for finite size systems

The project also informed and educated both students and the broader society with curriculum development on courses on computer networking and outreach. The project also played a role in the integration of under-represented groups to the scientific community. A female Ph.D. student was involved in the project to develop capacity-achieving resource allocation algorithms. One paper that developed a new randomized load balancing algorithm for cloud computing clusters has been awarded the best paper award at INFOCOM 2015.

Last Modified: 08/10/2016
Modified by: Lei Ying

Please report errors in award information by writing to: awardsearch@nsf.gov.