
NSF Org: | CCF Division of Computing and Communication Foundations |
Recipient: |
|
Initial Amendment Date: | July 28, 2016 |
Latest Amendment Date: | July 28, 2016 |
Award Number: | 1629126 |
Award Instrument: | Standard Grant |
Program Manager: | Anindya Banerjee, abanerje@nsf.gov, (703)292-7885, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2016 |
End Date: | August 31, 2021 (Estimated) |
Total Intended Award Amount: | $343,904.00 |
Total Awarded Amount to Date: | $343,904.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
1960 KENNY RD COLUMBUS OH US 43210-1016 (614)688-8735 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
1960 Kenny Road Columbus OH US 43212-1307 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Exploiting Parallel&Scalabilty |
Primary Program Source: |
|
Program Reference Code(s): | |
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Despite decades of progress, writing correct parallel software to realize the value of modern parallel computer hardware remains extremely difficult. A key problem is that today's computer systems do not give all programs clear behavioral guarantees; "ill-synchronized" code, in which parallel computations are incompletely or incorrectly coordinated, has ill-defined, often destructive behavior. This problem is a key theoretical and practical flaw in nearly all parallel computer systems. This project addresses the challenge by proposing a new class of parallel computer architectures with strong behavioral guarantees, even for ill-synchronized code. The key idea is to make systems safely terminate ill-synchronized program executions before they can cause problems. To avoid degrading availability, the project includes mechanisms to avoid terminating program executions when possible, by falling back to more permissive, yet safe and predictable behavioral guarantees, and by resolving potential errors caused by ill-synchronized code. The intellectual merit of the project is that it provides crucial behavioral guarantees even to ill-synchronized parallel code. The project eliminates outdated hardware models that not only provide inadequate behavioral guarantees, but are also complex and power-hungry. The project is the first in this domain to directly address availability and correctness together. The project's broader significance and importance are that it will improve the reliability of all parallel systems, which affects all aspects of life: medicine, energy, transportation, health, defense, and business. The stronger guarantees provided by this project avoid costly, dangerous failures and decrease the cost of application development, even in mature languages. The project will generate results relevant to industry and will influence academia through publication.
The project will directly influence secondary and higher education in computing, fostering a diverse, future STEM workforce.
To provide strong behavioral guarantees to all code -- even if incorrectly synchronized -- the proposed architectures provide region-atomic memory consistency guarantees for coarse-grained code regions. In these architectures, a program's execution is either a serialization of code regions, or it terminates with an exception that indicates an error could have left memory inconsistent. The architectures provide this strong memory consistency model to all program executions, departing from mainstream approaches to coherence and consistency that favor weaker guarantees without a clear benefit in complexity or performance. In systems executing ill-synchronized code, exceptions may terminate program executions too often, degrading availability. The proposed architectures avoid degrading availability by tolerating consistency violations with a well-defined snapshot isolation semantics that avoids exceptions, but does not guarantee serializability of code regions. The architectures further address availability by resolving exceptions, leveraging commutativity of code to avoid unnecessary exceptions for commutative operations, as well as using dynamic symbolic analysis to resolve exceptions by combining symbolic memory updates.
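The difference between the two guarantees described above can be illustrated with a small software model. The sketch below is only a toy, not the project's hardware design: the names `Region`, `RegionConflictError`, and `run_concurrently` are hypothetical, and conflict detection is deliberately conservative. It assumes each code region is summarized by the set of locations it reads and writes. In the serializable mode, any overlap involving a write between two concurrent regions raises an exception; in the snapshot isolation mode, only write-write overlaps do, so some non-serializable executions are tolerated.

```python
from dataclasses import dataclass, field


class RegionConflictError(Exception):
    """Raised when two concurrent regions conflict (toy model only)."""


@dataclass
class Region:
    # Hypothetical summary of one coarse-grained code region.
    name: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def serializability_conflict(a, b):
    # Conservative check: any location written by one region and
    # read or written by the other counts as a conflict.
    return bool(a.writes & (b.reads | b.writes) or b.writes & a.reads)


def snapshot_conflict(a, b):
    # Snapshot isolation only rejects overlapping writes.
    return bool(a.writes & b.writes)


def run_concurrently(a, b, mode="serializable"):
    """Return "ok" if the two regions may overlap in time, else raise."""
    if mode == "serializable":
        check = serializability_conflict
    elif mode == "snapshot":
        check = snapshot_conflict
    else:
        raise ValueError(f"unknown mode: {mode}")
    if check(a, b):
        raise RegionConflictError(f"{a.name} conflicts with {b.name}")
    return "ok"
```

For example, with `r1 = Region("r1", reads={"x"}, writes={"y"})` and `r2 = Region("r2", reads={"y"}, writes={"x"})`, the serializable mode raises `RegionConflictError`, while the snapshot mode lets both regions proceed because their write sets are disjoint. That is the kind of trade between exceptions and serializability that the paragraph above describes.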
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
In response to physical limitations, modern computer systems provide increasingly parallel (instead of sequential) resources, but making use of these resources correctly and efficiently is notoriously difficult and error prone. As a result, computer systems are unreliable and sometimes unavailable, and are difficult to program. The project focused specifically on how parallel resources in computer systems communicate with each other. It showed both how to relax the requirements for communication -- saving execution time and energy -- while also automatically providing correctness guarantees for programs written to use relaxed communication. The project personnel designed, implemented, and evaluated new computer processor designs that provided these reliability and cost-saving benefits, demonstrating their benefits over the previous state of the art.
The project's intellectual merit lies in its novel solutions to important technical problems. The project introduced a computer chip design that provided strong guarantees for computer software regardless of its communication patterns, but with lower execution time and energy usage than was previously known to be possible. It showed, for the first time, how to automatically and correctly avoid errors that can occur during execution as a result of providing stronger guarantees. The project demonstrated a novel approach for relaxing communication in computer systems while preserving system correctness.
The project's broader impacts include societal benefits, publications, implementations, education, mentoring, and outreach aimed at broadening participation in computer science. Computer systems affect virtually all aspects of society; improving systems' reliability and performance reduces labor and energy costs, and improves human well-being and safety by improving safety-critical systems and impacting areas such as medical technology. The project's results were published in widely read, peer-reviewed proceedings. The project personnel made all of their software and hardware implementations publicly available for other researchers to inspect and build upon. The project helped train undergraduate and graduate students in the project's technical areas, through undergraduate and graduate course material developed by the principal investigator (PI) and PhD students advised by the PI. The project enabled the PI to start and lead an organization aimed at introducing undergraduates -- especially undergraduates who are members of underrepresented groups -- to computer science research.
Last Modified: 12/21/2021
Modified by: Michael Bond