
NSF Org: | CCF Division of Computing and Communication Foundations |
Recipient: | |
Initial Amendment Date: | August 14, 2015 |
Latest Amendment Date: | August 14, 2015 |
Award Number: | 1533795 |
Award Instrument: | Standard Grant |
Program Manager: | Marilyn McClure, mmcclure@nsf.gov, (703) 292-5197, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2015 |
End Date: | August 31, 2020 (Estimated) |
Total Intended Award Amount: | $299,990.00 |
Total Awarded Amount to Date: | $299,990.00 |
Funds Obligated to Date: | |
History of Investigator: | |
Recipient Sponsored Research Office: | 2550 NORTHWESTERN AVE # 1100, WEST LAFAYETTE, IN, US 47906-1332, (765) 494-1055 |
Sponsor Congressional District: | |
Primary Place of Performance: | 305 N. University Street, West Lafayette, IN, US 47907-2107 |
Primary Place of Performance Congressional District: | |
Unique Entity Identifier (UEI): | |
Parent UEI: | |
NSF Program(s): | Exploiting Parallelism & Scalability |
Primary Program Source: | |
Program Reference Code(s): | |
Program Element Code(s): | |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Virtually all distributed computing applications, from transactions on databases to updates on social media platforms, involve concurrent operations on data objects. For these applications, concurrency control mechanisms represent significant performance overheads. These applications typically exhibit strong and persistent patterns in data access. Motivated by the importance of the problem, this project investigates the use of dynamic data- and lock-access patterns in distributed computations to significantly improve the performance of concurrency control mechanisms for scalable systems, specifically, in conventional cloud environments and key-value stores such as BigTable, HBase, and Cassandra. In contrast to conventional techniques that collocate locks with corresponding data items, this project relies on a modular lock service that decouples lock locations from corresponding data objects, and maintains lock state of all data items in a small set of storage nodes.
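To make the decoupled design concrete, the following is a minimal, single-process sketch (not taken from the award or any associated software) of a lock service whose lock table lives apart from the data store. The class and method names (LockService, DataStore, acquire, release) are illustrative assumptions; a real deployment would place the lock table on a small set of dedicated storage nodes reached over RPC.

    import threading
    from collections import defaultdict

    class LockService:
        """Holds lock state for data items, decoupled from the store that holds
        the items. Names and structure are illustrative assumptions only."""

        def __init__(self):
            self._table_guard = threading.Lock()        # protects the lock table itself
            self._locks = defaultdict(threading.Lock)   # one lock record per data-item key

        def acquire(self, item_key, timeout=1.0):
            with self._table_guard:
                lock = self._locks[item_key]
            return lock.acquire(timeout=timeout)

        def release(self, item_key):
            self._locks[item_key].release()

    class DataStore:
        """A toy key-value store (stand-in for BigTable/HBase/Cassandra-style
        systems) that delegates concurrency control to an external LockService
        rather than colocating locks with the data items."""

        def __init__(self, lock_service):
            self._data = {}
            self._locks = lock_service

        def update(self, key, value):
            if not self._locks.acquire(key):
                return False          # lock not obtained within the timeout
            try:
                self._data[key] = value
                return True
            finally:
                self._locks.release(key)

In this toy, Python threads stand in for distributed clients; the point of the structure is only that the lock table and the data reside in separate components, so the lock table can be sized, placed, and scaled independently of the data store.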
This design choice motivates a number of questions for this research: (i) where and when should lock states be migrated into the lock service? (ii) when should lock state be repatriated to the data store? (iii) how should the lock service be scaled out? (iv) what are fault-tolerant, low-overhead, deadlock- and livelock-free protocols for these operations? and (v) how can long-lived data access patterns be leveraged in such systems? Building on preliminary results that demonstrate the feasibility and considerable promise of the approach, the project develops algorithms, protocols, analyses, and open-source software, along with comprehensive validation in the context of a diverse set of applications.
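As a purely illustrative way to ground questions (i) and (ii), the sketch below tracks an exponentially weighted access rate per item, flagging items for migration into the lock service when contention stays high and for repatriation to the data store when it cools off. This is not the protocol developed by the project; the smoothing factor and thresholds are assumptions chosen only for the example.

    ALPHA = 0.2             # EWMA smoothing factor (assumed)
    MIGRATE_ABOVE = 5.0     # smoothed accesses/interval that trigger migration (assumed)
    REPATRIATE_BELOW = 1.0  # smoothed accesses/interval that trigger repatriation (assumed)

    class AccessTracker:
        def __init__(self):
            self.rate = {}         # data-item key -> smoothed access rate
            self.migrated = set()  # items whose lock state lives in the lock service

        def record_interval(self, counts):
            """Update per-item rates with this interval's access counts; return
            (items to migrate into the lock service, items to repatriate)."""
            to_migrate, to_repatriate = set(), set()
            for key, count in counts.items():
                rate = ALPHA * count + (1 - ALPHA) * self.rate.get(key, 0.0)
                self.rate[key] = rate
                if key not in self.migrated and rate > MIGRATE_ABOVE:
                    to_migrate.add(key)
                elif key in self.migrated and rate < REPATRIATE_BELOW:
                    to_repatriate.add(key)
            self.migrated |= to_migrate
            self.migrated -= to_repatriate
            return to_migrate, to_repatriate

For example, an item accessed a dozen times per interval for several consecutive intervals would cross the migration threshold, while an item that goes quiet would eventually fall below the repatriation threshold. The project's actual protocols must additionally address fault tolerance, deadlock and livelock freedom, and fairness, which this sketch ignores.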
The project will result in a novel framework for concurrency control in scalable distributed systems. The concurrency control service has a number of desirable features: (i) modularity -- the service can be instantiated at runtime, with minimal change to underlying data storage organization and access mechanisms; (ii) extensibility -- the service adapts dynamically to load and service requirements; and (iii) high performance through the use of efficient algorithms exploiting data and lock access patterns. These features are achieved through a novel mix of algorithms for lock migration and collocation, statistical models for dynamic lock and data access, protocols for lock state management, associated proofs of correctness and fairness, fault tolerance, performance, and scalability. The concurrency control service is fully validated on private as well as public clouds on a mix of applications drawn from Online Transaction Processing and Machine Learning.
The project directly impacts an important class of cloud-based applications by providing a modular and extensible lock service. The service relieves the burden on the application programmer while providing high performance and elastic throughput. Beyond this, the project includes a number of educational initiatives aimed at undergraduate and graduate education, along with outreach efforts aimed at enhancing representation of minority groups. These include development of instructional material, curricula, organization of and presentations at workshops and summer schools, and recruitment initiatives aimed at students from under-represented groups.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
A significant fraction of current high-throughput applications run in cloud environments. Efficient execution of these applications at scale is an important problem, particularly in dynamic, scalable data centers. This project resulted in efficient and effective concurrency control mechanisms for real-world cloud-based applications. It formulated suitable models for resource utilization, throughput, and makespan, developed methods for optimizing these measures, realized these methods in efficient software, incorporated the software into commonly used open-source frameworks, and demonstrated the software in the context of an important application -- manipulation of large graph databases.
The project resulted in a number of technical insights: (i) it resulted in methods for scheduling concurrent operations on key-value stores (such as those used at Facebook) to maximize their performance; (ii) it resulted in runtime systems for scheduling stream processing applications (such as those built on Spark) that mitigate the impact of stragglers on overall system efficiency; (iii) it resulted in scheduling techniques and resource managers capable of opportunistically utilizing allocated-but-unused resources to significantly enhance overall cloud resource utilization; and (iv) it resulted in a complete graph database system, TruenoDB, which demonstrates various elements of the project in a single important application.
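As an illustration of the general idea behind item (ii), straggler mitigation by speculative re-execution (the same idea behind Spark's speculative execution) re-launches tasks that run far longer than their peers. The sketch below is not the runtime developed in the project; the thresholds are assumptions.

    from statistics import median

    SLOW_FACTOR = 1.5    # a running task is a straggler at 1.5x the median duration (assumed)
    MIN_FINISHED = 0.75  # only speculate once 75% of the stage's tasks have finished (assumed)

    def pick_speculative_tasks(running, finished):
        """running: task id -> elapsed seconds; finished: task id -> total seconds.
        Returns task ids worth re-launching speculatively on another node."""
        total = len(running) + len(finished)
        if total == 0 or len(finished) / total < MIN_FINISHED:
            return set()
        typical = median(finished.values())
        return {tid for tid, elapsed in running.items() if elapsed > SLOW_FACTOR * typical}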
The project outcomes have broad and significant impact. Cloud platforms are the dominant cost and energy drivers in current compute deployments. Our work has shown that the resource utilization of current data centers can be improved by over 30% in real-world environments. This is a very significant improvement in terms of energy utilization, cost, and application turn-around time. As a result, the project has direct impact on a large application base. The demonstrator application, the TruenoDB graph database, can itself be used in different domains. We have demonstrated its use in systems biology (for modeling biochemical pathways) and in path-planning in reaction networks for manufacturing pharmaceutical compounds.
Finally, the project directly funded the graduate education of a Hispanic student, and enabled the PI to develop significant online material in Parallel Computing.
Last Modified: 12/10/2020
Modified by: Ananth Grama