
NSF Org: | CCF Division of Computing and Communication Foundations |
Recipient: | |
Initial Amendment Date: | August 14, 2015 |
Latest Amendment Date: | August 14, 2015 |
Award Number: | 1533795 |
Award Instrument: | Standard Grant |
Program Manager: | Marilyn McClure, mmcclure@nsf.gov, (703) 292-5197, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2015 |
End Date: | August 31, 2020 (Estimated) |
Total Intended Award Amount: | $299,990.00 |
Total Awarded Amount to Date: | $299,990.00 |
Funds Obligated to Date: | |
History of Investigator: | |
Recipient Sponsored Research Office: | 2550 NORTHWESTERN AVE # 1100, WEST LAFAYETTE, IN, US 47906-1332, (765) 494-1055 |
Sponsor Congressional District: | |
Primary Place of Performance: | 305 N. University Street, West Lafayette, IN, US 47907-2107 |
Primary Place of Performance Congressional District: | |
Unique Entity Identifier (UEI): | |
Parent UEI: | |
NSF Program(s): | Exploiting Parallelism & Scalability |
Primary Program Source: | |
Program Reference Code(s): | |
Program Element Code(s): | |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Virtually all distributed computing applications, from transactions on databases to updates on social media platforms, involve concurrent operations on data objects. For these applications, concurrency control mechanisms represent significant performance overheads. These applications typically exhibit strong and persistent patterns in data access. Motivated by the importance of the problem, this project investigates the use of dynamic data- and lock-access patterns in distributed computations to significantly improve the performance of concurrency control mechanisms for scalable systems, specifically, in conventional cloud environments and key-value stores such as BigTable, HBase, and Cassandra. In contrast to conventional techniques that collocate locks with corresponding data items, this project relies on a modular lock service that decouples lock locations from corresponding data objects, and maintains lock state of all data items in a small set of storage nodes.
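To make the decoupled design concrete, the following is a minimal, single-process sketch (not taken from the award or any associated software) of a lock service whose lock table lives apart from the data store. The class and method names (LockService, DataStore, acquire, release) are illustrative assumptions; a real deployment would place the lock table on a small set of dedicated storage nodes reached over RPC.

    import threading
    from collections import defaultdict

    class LockService:
        """Holds lock state for data items, decoupled from the store that holds
        the items. Names and structure are illustrative assumptions only."""

        def __init__(self):
            self._table_guard = threading.Lock()        # protects the lock table itself
            self._locks = defaultdict(threading.Lock)   # one lock record per data-item key

        def acquire(self, item_key, timeout=1.0):
            with self._table_guard:
                lock = self._locks[item_key]
            return lock.acquire(timeout=timeout)

        def release(self, item_key):
            self._locks[item_key].release()

    class DataStore:
        """A toy key-value store (stand-in for BigTable/HBase/Cassandra-style
        systems) that delegates concurrency control to an external LockService
        rather than colocating locks with the data items."""

        def __init__(self, lock_service):
            self._data = {}
            self._locks = lock_service

        def update(self, key, value):
            if not self._locks.acquire(key):
                return False          # lock not obtained within the timeout
            try:
                self._data[key] = value
                return True
            finally:
                self._locks.release(key)

In this toy, Python threads stand in for distributed clients; the point of the structure is only that the lock table and the data reside in separate components, so the lock table can be sized, placed, and scaled independently of the data store.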
This design choice motivates a number of questions for this research: (i) where and when should lock states be migrated into the lock service? (ii) when should lock state be repatriated to the data store? (iii) how should the lock service be scaled out? (iv) what are fault-tolerant, low-overhead, deadlock- and livelock-free protocols for these operations? and (v) how can long-lived data access patterns be leveraged in such systems? Building on preliminary results that demonstrate the feasibility and considerable promise of the approach, the project develops algorithms, protocols, analyses, and open-source software, along with comprehensive validation in the context of a diverse set of applications.
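As a purely illustrative way to ground questions (i) and (ii), the sketch below tracks an exponentially weighted access rate per item, flagging items for migration into the lock service when contention stays high and for repatriation to the data store when it cools off. This is not the protocol developed by the project; the smoothing factor and thresholds are assumptions chosen only for the example.

    ALPHA = 0.2             # EWMA smoothing factor (assumed)
    MIGRATE_ABOVE = 5.0     # smoothed accesses/interval that trigger migration (assumed)
    REPATRIATE_BELOW = 1.0  # smoothed accesses/interval that trigger repatriation (assumed)

    class AccessTracker:
        def __init__(self):
            self.rate = {}         # data-item key -> smoothed access rate
            self.migrated = set()  # items whose lock state lives in the lock service

        def record_interval(self, counts):
            """Update per-item rates with this interval's access counts; return
            (items to migrate into the lock service, items to repatriate)."""
            to_migrate, to_repatriate = set(), set()
            for key, count in counts.items():
                rate = ALPHA * count + (1 - ALPHA) * self.rate.get(key, 0.0)
                self.rate[key] = rate
                if key not in self.migrated and rate > MIGRATE_ABOVE:
                    to_migrate.add(key)
                elif key in self.migrated and rate < REPATRIATE_BELOW:
                    to_repatriate.add(key)
            self.migrated |= to_migrate
            self.migrated -= to_repatriate
            return to_migrate, to_repatriate

For example, an item accessed a dozen times per interval for several consecutive intervals would cross the migration threshold, while an item that goes quiet would eventually fall below the repatriation threshold. The project's actual protocols must additionally address fault tolerance, deadlock and livelock freedom, and fairness, which this sketch ignores.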
The project will result in a novel framework for concurrency control in scalable distributed systems. The concurrency control service has a number of desirable features: (i) modularity -- the service can be instantiated at runtime, with minimal change to underlying data storage organization and access mechanisms; (ii) extensibility -- the service adapts dynamically to load and service requirements; and (iii) high performance through the use of efficient algorithms exploiting data and lock access patterns. These features are achieved through a novel mix of algorithms for lock migration and collocation, statistical models for dynamic lock and data access, protocols for lock state management, associated proofs of correctness and fairness, fault tolerance, performance, and scalability. The concurrency control service is fully validated on private as well as public clouds on a mix of applications drawn from Online Transaction Processing and Machine Learning.
The project directly impacts an important class of cloud-based applications by providing a modular and extensible lock service. The service relieves the burden on the application programmer while providing high performance and elastic throughput. Beyond this, the project includes a number of educational initiatives aimed at undergraduate and graduate education, along with outreach efforts aimed at enhancing representation of minority groups. These include development of instructional material, curricula, organization of and presentations at workshops and summer schools, and recruitment initiatives aimed at students from under-represented groups.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
A significant fraction of current high-throughput applications run in cloud environments. Efficient execution of these applications at scale is an important problem, particularly in dynamic, scalable data centers. This project resulted in efficient and effective concurrency control mechanisms for real-world cloud-based applications. It formulated suitable models for resource utilization, throughput, and makespan, developed methods for optimizing these measures, realized these methods in efficient software, incorporated the software into commonly used open-source frameworks, and demonstrated the software in the context of an important application -- manipulation of large graph databases.
The project resulted in a number of technical insights: (i) it resulted in methods for scheduling concurrent operations on key-value stores (such as those used at Facebook) to maximize their performance; (ii) it resulted in runtime systems for scheduling stream processing applications (such as those built on Spark) that mitigate the impact of stragglers on overall system efficiency; (iii) it resulted in scheduling techniques and resource managers capable of opportunistically utilizing allocated-but-unused resources to significantly enhance overall cloud resource utilization; and (iv) it resulted in a complete graph database system, TruenoDB, which demonstrates various elements of the project in a single important application.
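As an illustration of the general idea behind item (ii), straggler mitigation by speculative re-execution (the same idea behind Spark's speculative execution) re-launches tasks that run far longer than their peers. The sketch below is not the runtime developed in the project; the thresholds are assumptions.

    from statistics import median

    SLOW_FACTOR = 1.5    # a running task is a straggler at 1.5x the median duration (assumed)
    MIN_FINISHED = 0.75  # only speculate once 75% of the stage's tasks have finished (assumed)

    def pick_speculative_tasks(running, finished):
        """running: task id -> elapsed seconds; finished: task id -> total seconds.
        Returns task ids worth re-launching speculatively on another node."""
        total = len(running) + len(finished)
        if total == 0 or len(finished) / total < MIN_FINISHED:
            return set()
        typical = median(finished.values())
        return {tid for tid, elapsed in running.items() if elapsed > SLOW_FACTOR * typical}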
The project outcomes have broad and significant impact. Cloud platforms are the dominant cost and energy drivers in current compute deployments. Our work has shown that the resource utilization of current data centers can be improved by over 30% in real-world environments. This is a very significant improvement in terms of energy utilization, cost, and application turn-around time. As a result, the project has direct impact on a large application base. The demonstrator application, the TruenoDB graph database, can itself be used in different domains. We have demonstrated its use in systems biology (for modeling biochemical pathways) and in path-planning in reaction networks for manufacturing pharmaceutical compounds.
Finally, the project directly funded the graduate education of a Hispanic student, and enabled the PI to develop significant online material in Parallel Computing.
Last Modified: 12/10/2020
Modified by: Ananth Grama