Award Abstract # 1629126

XPS: FULL: Collaborative Research: Rethinking Architecture Support for Memory Consistency

NSF Org:	CCF Division of Computing and Communication Foundations
Recipient:	OHIO STATE UNIVERSITY, THE
Initial Amendment Date:	July 28, 2016
Latest Amendment Date:	July 28, 2016
Award Number:	1629126
Award Instrument:	Standard Grant
Program Manager:	Anindya Banerjee, abanerje@nsf.gov, (703)292-7885; CCF Division of Computing and Communication Foundations; CSE Directorate for Computer and Information Science and Engineering
Start Date:	September 1, 2016
End Date:	August 31, 2021 (Estimated)
Total Intended Award Amount:	$343,904.00
Total Awarded Amount to Date:	$343,904.00
Funds Obligated to Date:	FY 2016 = $343,904.00
History of Investigator:	Michael Bond (Principal Investigator), mikebond@cse.ohio-state.edu
Recipient Sponsored Research Office:	OHIO STATE UNIVERSITY, THE; 1960 KENNY RD, COLUMBUS, OH, US 43210-1016; (614)688-8735
Sponsor Congressional District:	03
Primary Place of Performance:	Ohio State University, 1960 Kenny Road, Columbus, OH, US 43212-1307
Primary Place of Performance Congressional District:	03
Unique Entity Identifier (UEI):	DLWBSLWAJWR1
Parent UEI:	MN4MDDMN8529
NSF Program(s):	Exploiting Parallel&Scalabilty
Primary Program Source:	01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s):	828300
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.070

ABSTRACT

Despite decades of progress, writing correct parallel software to realize the value of modern parallel computer hardware remains extremely difficult. A key problem is that today's computer systems do not give all programs clear behavioral guarantees; "ill-synchronized" code, in which parallel computations are incompletely or incorrectly coordinated, has ill-defined, often destructive behavior. This problem is a key theoretical and practical flaw in nearly all parallel computer systems. This project addresses the challenge by proposing a new class of parallel computer architectures with strong behavioral guarantees, even for ill-synchronized code. The key idea is to make systems safely terminate ill-synchronized program executions before they can cause problems. To avoid degrading availability, the project includes mechanisms to avoid terminating program executions when possible, by falling back to more permissive, yet safe and predictable, behavioral guarantees and by resolving potential errors caused by ill-synchronized code. The intellectual merits of the project are that it provides crucial behavioral guarantees even to ill-synchronized parallel code. The project eliminates outdated hardware models that not only provide inadequate behavioral guarantees but are also complex and power-hungry. The project is the first in this domain to directly address availability and correctness together. The project's broader significance and importance are that it will improve the reliability of all parallel systems, which affects all aspects of life: medicine, energy, transportation, health, defense, and business. The stronger guarantees provided by this project avoid costly, dangerous failures and decrease the cost of application development, even in mature languages. The project will generate results relevant to industry and will influence academia through publication. The project will directly influence secondary and higher education in computing, fostering a diverse future STEM workforce.

To provide strong behavioral guarantees to all code -- even if incorrectly synchronized -- the proposed architectures provide region-atomic memory consistency guarantees for coarse-grained code regions. In these architectures, a program's execution is either a serialization of code regions, or it terminates with an exception that indicates an error could have left memory inconsistent. The architectures provide this strong memory consistency model to all program executions, departing from mainstream approaches to coherence and consistency that favor weaker guarantees without a clear benefit in complexity or performance. In systems executing ill-synchronized code, frequent exceptions may terminate program executions too often, degrading availability. The proposed architectures avoid degrading availability by tolerating consistency violations with well-defined snapshot isolation semantics, which avoid exceptions but do not guarantee serializability of code regions. The architectures further address availability by resolving exceptions: they leverage commutativity of code to avoid unnecessary exceptions for commutative operations, and they use dynamic symbolic analysis to resolve exceptions by combining symbolic memory updates.
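The "serialization of regions, or an exception" guarantee described above can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the project's hardware design: the names `RegionMemory`, `RegionConflictError`, and `run_schedule` are hypothetical, threads are simulated by a hand-written schedule, and a conflict is raised whenever an access touches a location that another thread's still-open region has read or written in a conflicting way. Executions that finish without such a conflict are equivalent to some serial order of regions.

```python
class RegionConflictError(Exception):
    """Raised when an access conflicts with another thread's open region."""


class RegionMemory:
    """Toy shared memory that tracks read/write sets of open regions."""

    def __init__(self):
        self.values = {}
        self.readers = {}  # addr -> thread ids whose open region read addr
        self.writers = {}  # addr -> thread ids whose open region wrote addr

    def read(self, tid, addr):
        # A read conflicts with a write by another thread's open region.
        if self.writers.get(addr, set()) - {tid}:
            raise RegionConflictError(f"thread {tid} read of {addr}")
        self.readers.setdefault(addr, set()).add(tid)
        return self.values.get(addr, 0)

    def write(self, tid, addr, value):
        # A write conflicts with a read or write by another open region.
        others = self.readers.get(addr, set()) | self.writers.get(addr, set())
        if others - {tid}:
            raise RegionConflictError(f"thread {tid} write of {addr}")
        self.writers.setdefault(addr, set()).add(tid)
        self.values[addr] = value

    def end_region(self, tid):
        # Closing a region clears its read and write sets.
        for access_set in self.readers.values():
            access_set.discard(tid)
        for access_set in self.writers.values():
            access_set.discard(tid)


def run_schedule(schedule):
    """Run a list of ('read'|'write'|'end', tid, ...) steps in order."""
    mem = RegionMemory()
    try:
        for op, tid, *args in schedule:
            if op == "read":
                mem.read(tid, *args)
            elif op == "write":
                mem.write(tid, *args)
            elif op == "end":
                mem.end_region(tid)
            else:
                raise ValueError(f"unknown operation: {op}")
    except RegionConflictError:
        return "region conflict", dict(mem.values)
    return "ok", dict(mem.values)


# Regions execute one after another: no conflict, serializable result.
serial = [("read", 1, "x"), ("write", 1, "x", 1), ("end", 1),
          ("read", 2, "x"), ("write", 2, "x", 2), ("end", 2)]

# Regions overlap on x (a lost-update pattern): the execution is stopped.
racy = [("read", 1, "x"), ("read", 2, "x"), ("write", 1, "x", 1),
        ("write", 2, "x", 1), ("end", 1), ("end", 2)]

print(run_schedule(serial))
print(run_schedule(racy))
```

In this model the overlapping schedule is stopped at thread 1's write, before the lost update can become visible. The availability mechanisms mentioned in the abstract (snapshot isolation, commutativity, symbolic resolution) are not modeled here.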

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 14)

Jake Roemer, Kaan Genç, and Michael D. Bond "SmartTrack: Efficient Predictive Race Detection" ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2020

Jake Roemer, Kaan Genç, and Michael D. Bond "High-Coverage, Unbounded Sound Predictive Race Detection" ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2018

Aritra Sengupta, Man Cao, Michael D. Bond, and Milind Kulkarni "Legato: End-to-End Bounded Region Serializability Using Commodity Hardware Transactional Memory" International Symposium on Code Generation and Optimization (CGO), 2017

Chenxi Wang, Haoran Ma, Shi Liu, Yuanqi Li, Zhenyuan Ruan, Khanh Nguyen, Michael D. Bond, Ravi Netravali, Miryung Kim, and Guoqing Harry Xu "Semeru: A Memory-Disaggregated Managed Runtime" USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2020

Benjamin P. Wood, Man Cao, Michael D. Bond, and Dan Grossman "Instrumentation Bias for Dynamic Data Race Detection" ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2017

Swarnendu Biswas, Rui Zhang, Michael D. Bond, and Brandon Lucia "Rethinking Support for Region Conflict Exceptions" IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019

Zixian Cai, Stephen M. Blackburn, and Michael D. Bond "Understanding and Utilizing Hardware Transactional Memory Capacity" ACM SIGPLAN International Symposium on Memory Management (ISMM), 2021

Swarnendu Biswas, Man Cao, Minjia Zhang, Michael D. Bond, and Benjamin P. Wood "Lightweight Data Race Detection for Production Runs" International Conference on Compiler Construction (CC), 2017

Sixiang Ma, Fang Zhou, Michael D. Bond, and Yang Wang "Finding Heterogeneous-Unsafe Configuration Parameters in Cloud Systems" European Conference on Computer Systems (EuroSys), 2020

Rui Zhang, Swarnendu Biswas, Vignesh Balaji, Michael D. Bond, and Brandon Lucia "Peacenik: Architecture Support for Not Failing under Fail-Stop Memory Consistency" ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2020

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

In response to physical limitations, modern computer systems provide increasingly parallel (instead of sequential) resources, but making use of these resources correctly and efficiently is notoriously difficult and error prone. As a result, computer systems are unreliable and sometimes unavailable, and are difficult to program. The project focused specifically on how parallel resources in computer systems communicate with each other. It showed how to relax the requirements for communication -- saving execution time and energy -- while still automatically providing correctness guarantees for programs written to use relaxed communication. The project personnel designed, implemented, and evaluated new computer processor designs that provided these reliability and cost-saving benefits, demonstrating their advantages over the previous state of the art.

The project's intellectual merit lies in its novel solutions to important technical problems. The project introduced a computer chip design that provided strong guarantees for computer software regardless of its communication patterns, but with lower execution time and energy usage than was previously known to be possible. It showed, for the first time, how to automatically and correctly avoid errors that can occur during execution as a result of providing stronger guarantees. The project demonstrated a novel approach for relaxing communication in computer systems while preserving system correctness.

The project's broader impacts include societal benefits, publications, implementations, education, mentoring, and outreach aimed at broadening participation in computer science. Computer systems affect virtually all aspects of society; improving systems' reliability and performance reduces labor and energy costs, and improves human well-being and safety by improving safety-critical systems and impacting areas such as medical technology. The project's results were published in widely read, peer-reviewed proceedings. The project personnel made all of their software and hardware implementations publicly available for other researchers to inspect and build upon. The project helped train undergraduate and graduate students in the project's technical areas, through undergraduate and graduate course material developed by the principal investigator (PI) and PhD students advised by the PI. The project enabled the PI to start and lead an organization aimed at introducing undergraduates -- especially undergraduates who are members of underrepresented groups -- to computer science research.

Last Modified: 12/21/2021
Modified by: Michael Bond

