
NSF Org: |
CCF Division of Computing and Communication Foundations |
Initial Amendment Date: | March 8, 2010 |
Latest Amendment Date: | July 22, 2013 |
Award Number: | 0953478 |
Award Instrument: | Continuing Grant |
Program Manager: |
Sol Greenspan
sgreensp@nsf.gov, (703) 292-7841, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | March 15, 2010 |
End Date: | February 29, 2016 (Estimated) |
Total Intended Award Amount: | $499,990.00 |
Total Awarded Amount to Date: | $499,990.00 |
Funds Obligated to Date: |
FY 2011 = $106,347.00; FY 2012 = $112,690.00; FY 2013 = $205,652.00 |
Recipient Sponsored Research Office: |
21 N PARK ST STE 6301 MADISON WI US 53715-1218 (608)262-3822 |
Primary Place of Performance: |
21 N PARK ST STE 6301 MADISON WI US 53715-1218 |
NSF Program(s): |
Software & Hardware Foundation, SOFTWARE ENG & FORMAL METHODS |
Primary Program Source: |
01001112DB NSF RESEARCH & RELATED ACTIVIT; 01001213DB NSF RESEARCH & RELATED ACTIVIT; 01001314DB NSF RESEARCH & RELATED ACTIVIT |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Computer technology is rapidly permeating all spheres of society. A computer system that affects the lives of thousands or millions of people creates a massive community of users who have an interest in the correct behavior of that system. Widespread interconnectivity means that we now have the ability to tap this potential.
This work confronts the challenge of diagnosing and mitigating concurrency bugs. A suite of novel instrumentation schemes will be developed for monitoring thread interleaving patterns. Coupled with previously developed statistical debugging models, these schemes let developers identify the bad thread interleavings that are root causes of program failure. A new approach to coordinated cross-thread random sampling keeps overheads low while still providing ample data for diagnosis. Static analysis will further reduce the instrumentation load. Prior statistical debugging work stopped at diagnosis, but this project will also develop a speculative locking strategy, guided by the statistical models, to avoid a variety of concurrency bugs and thereby mitigate their effects.
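The abstract describes monitoring thread interleavings with coordinated cross-thread sampling but does not spell out the scheme. The Python sketch below is one minimal, hypothetical version: the class name `CoordinatedSampler`, the "previous access came from a different thread" predicate, and the parameters `start_prob` and `period_len` are all assumptions for illustration, not the project's actual design. What it tries to show is that the decision to sample is shared by all threads, so both sides of an interleaving are observed inside the same sampling window.

```python
import random
import threading


class CoordinatedSampler:
    """Hypothetical sketch of interleaving monitoring with a global sampling window.

    For each sampled access, it records whether the previous access to the same
    memory location came from a different thread (a "remote" predecessor).
    """

    def __init__(self, start_prob, period_len, seed=None):
        self.start_prob = start_prob   # chance that an access opens a new sampling window
        self.period_len = period_len   # number of accesses tracked per window
        self._rng = random.Random(seed)
        self._lock = threading.Lock()  # a real tool would avoid a global lock on the fast path
        self._remaining = 0            # accesses left in the current window (0 = not sampling)
        self._last_thread = {}         # location -> ident of the thread that last accessed it
        self.counts = {}               # (site, remote_predecessor) -> observation count

    def access(self, site, location):
        me = threading.get_ident()
        with self._lock:
            if self._remaining == 0:
                # Not sampling: decide once, for all threads, whether to open a window.
                if self._rng.random() >= self.start_prob:
                    return
                self._remaining = self.period_len
                # History recorded before this window is stale, so discard it.
                self._last_thread.clear()
            self._remaining -= 1
            prev = self._last_thread.get(location)
            self._last_thread[location] = me
            if prev is not None:
                key = (site, prev != me)
                self.counts[key] = self.counts.get(key, 0) + 1
```

With `start_prob=1.0` every access is tracked, which is convenient for testing; a deployment would presumably use a small value so that most accesses return immediately without recording anything.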
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Computer technology is rapidly permeating all spheres of society. A computer system that affects the lives of thousands or millions of people creates a massive community of users who have an interest in the correct behavior of that system. Widespread network connectivity means that we now have the ability to tap this potential. The work supported by this grant explored how to harness the power of large user communities to diagnose a particularly pernicious class of computer defects: software concurrency bugs. These elusive bugs arise when a computer system performs multiple tasks simultaneously and some of those tasks interfere with each other in destructive, unpredictable ways.
To address these problems, we developed Cooperative Crug Isolation (CCI), a low-overhead instrumentation framework to diagnose production-run failures caused by concurrency bugs (crugs). CCI tracks specific thread interleavings at run time and uses statistical models to identify strong failure predictors among them. We offer a varied suite of predicates that represent different trade-offs between complexity and fault-isolation capability. We also developed several random sampling strategies that suit different types of predicates and help keep the run-time overhead low. Experiments show that these schemes span a wide spectrum of performance and diagnosis capabilities, each suitable for different usage scenarios.
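The report says CCI feeds its recorded predicates to statistical models but does not give the formulas. As an assumption, the sketch below ranks predicates with the Increase and Importance scores from the earlier Cooperative Bug Isolation statistical debugging work: Increase measures how much more likely failure becomes when a predicate is observed true than when its site is merely reached, and Importance is the harmonic mean of Increase and a log-scaled count of failing runs in which the predicate was true. The names `PredicateCounts`, `increase`, `importance`, and `rank` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PredicateCounts:
    """Per-predicate run counts aggregated from many user runs (hypothetical layout)."""
    f_true: int  # failing runs in which the predicate was observed true
    s_true: int  # successful runs in which the predicate was observed true
    f_obs: int   # failing runs in which the predicate's site was observed at all
    s_obs: int   # successful runs in which the predicate's site was observed at all


def increase(c: PredicateCounts) -> float:
    """Failure(P) - Context(P): how much being true raises the failure rate."""
    failure = c.f_true / (c.f_true + c.s_true)
    context = c.f_obs / (c.f_obs + c.s_obs)
    return failure - context


def importance(c: PredicateCounts, num_failing_runs: int) -> float:
    """Harmonic mean of Increase and log(F(P)) / log(NumF); 0.0 when undefined or unhelpful."""
    if c.f_true + c.s_true == 0:
        return 0.0  # never observed true: nothing to score
    inc = increase(c)
    if inc <= 0 or c.f_true <= 1 or num_failing_runs <= 1:
        return 0.0  # not predictive, or the log-scaled term would be zero or undefined
    sensitivity = math.log(c.f_true) / math.log(num_failing_runs)
    return 2.0 / (1.0 / inc + 1.0 / sensitivity)


def rank(preds: dict, num_failing_runs: int) -> list:
    """Predicate names ordered from strongest to weakest failure predictor."""
    return sorted(preds, key=lambda name: importance(preds[name], num_failing_runs), reverse=True)
```

For example, a predicate observed true in 9 failing runs and 1 successful run, at a site reached in 10 failing and 40 successful runs, gets Increase = 0.9 - 0.2 = 0.7 and ranks above a predicate whose Increase is negative. The harmonic mean keeps a predicate from ranking highly on either score alone.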
However, recognizing that a bug exists is only half of the battle. Someone still needs to fix the problem’s root cause. This debugging task is difficult in general, and it is especially slow and error-prone for concurrency bugs. With NSF’s generous support, we created CFix, a system that automates the repair of concurrency bugs. CFix works with a wide variety of concurrency-bug detectors, including (but not limited to) our own CCI system. For each failure-inducing interleaving reported by a bug detector, CFix first determines a combination of mutual-exclusion and order relationships that, once enforced, can prevent the buggy interleaving. CFix then uses static analysis and testing to decide which synchronization operations to insert and where, so as to force the desired mutual-exclusion and order relationships while making a best effort to avoid deadlocks and excessive performance losses. CFix also simplifies its own patches by merging fixes for related bugs. An evaluation using four different types of bug detectors and thirteen real-world concurrency-bug cases shows that CFix can successfully patch these cases without causing deadlocks or excessive performance degradation. Patches automatically generated by CFix are of similar quality to those written manually by developers. An award nomination for part of this work cited it as “one of the first papers to attack the problem of automated bug fixing” of any kind. This work therefore represents a major step forward not only for concurrent software but also for reliable computing in general.
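CFix patches C/C++ programs, and the report does not show the code it generates. The Python sketch below only illustrates the two kinds of relationship the report says CFix enforces; all classes and functions are hypothetical. A `threading.Event` forces a use to wait until an initialization has happened (an order relationship), and a `threading.Lock` makes a read-modify-write atomic (a mutual-exclusion relationship).

```python
import threading


class SharedConfig:
    """Hypothetical shared state: one thread initializes it, another reads it."""

    def __init__(self):
        self.value = None
        self.ready = threading.Event()  # added synchronization: "init happens before use"


def initializer(cfg):
    cfg.value = 42     # the write that must come first
    cfg.ready.set()    # signal that the order relationship is satisfied


def reader(cfg, out):
    cfg.ready.wait()   # without this wait, the read could observe None
    out.append(cfg.value)


def run_order_example():
    cfg, out = SharedConfig(), []
    t_read = threading.Thread(target=reader, args=(cfg, out))
    t_init = threading.Thread(target=initializer, args=(cfg,))
    t_read.start()     # start the reader first to provoke the bad order
    t_init.start()
    t_read.join()
    t_init.join()
    return out[0]


class Counter:
    """Hypothetical counter whose increment was an unprotected read-modify-write."""

    def __init__(self):
        self.n = 0
        self.lock = threading.Lock()  # added synchronization: mutual exclusion

    def bump(self):
        with self.lock:
            tmp = self.n      # read
            self.n = tmp + 1  # write; no other bump can interleave in between


def run_mutex_example(threads=4, bumps=1000):
    counter = Counter()

    def work():
        for _ in range(bumps):
            counter.bump()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.n
```

The plain `wait()` blocks forever if the initializer never runs, which is the kind of hang the report says CFix makes a best effort to avoid. A more defensive variant might pass a timeout to `Event.wait`, which returns `False` if the timeout expires before the event is set.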
Concurrency is the future: of this there is no doubt. Our ability to maintain software quality in that concurrent future is, however, very much in doubt. The work sponsored by this research grant represents several major steps forward toward safeguarding that future.
Last Modified: 05/26/2016
Modified by: Benjamin R Liblit
Please report errors in award information by writing to: awardsearch@nsf.gov.