Award Abstract # 2120955
CISE Core: CCF: SHF: Small: Future-Proof Test Corpus Synthesis for Evolving Software

NSF Org: CCF (Division of Computing and Communication Foundations)
Recipient: CARNEGIE MELLON UNIVERSITY
Initial Amendment Date: July 14, 2021
Latest Amendment Date: July 14, 2021
Award Number: 2120955
Award Instrument: Standard Grant
Program Manager: Andrian Marcus
  amarcus@nsf.gov
  (703) 292-0000
  CCF (Division of Computing and Communication Foundations)
  CSE (Directorate for Computer and Information Science and Engineering)
Start Date: October 1, 2021
End Date: September 30, 2025 (Estimated)
Total Intended Award Amount: $546,091.00
Total Awarded Amount to Date: $546,091.00
Funds Obligated to Date: FY 2021 = $546,091.00
History of Investigator:
  • Rohan Padhye (Principal Investigator)
    rohanpadhye@cmu.edu
Recipient Sponsored Research Office: Carnegie-Mellon University
  5000 Forbes Ave, Pittsburgh, PA 15213-3815, US
  (412) 268-8746
Sponsor Congressional District: 12
Primary Place of Performance: Carnegie Mellon University
  5000 Forbes Avenue, Pittsburgh, PA 15213-3815, US
Primary Place of Performance Congressional District: 12
Unique Entity Identifier (UEI): U3NKNFLNQ613
Parent UEI: U3NKNFLNQ613
NSF Program(s): Software & Hardware Foundation
Primary Program Source: 01002122DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 7923, 7944, 9251
Program Element Code(s): 779800
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Modern software is complex and continuously evolving. Every small change to the code risks introducing unintended consequences that can affect the software's correctness, security, and performance. To guard against such issues, known as regression bugs, developers must test their software on a diverse suite of program inputs after every code change. However, hand-crafting such test inputs risks missing important corner cases. This research is developing techniques for automatically generating test inputs that guard against future regressions. The research will focus on automatically synthesizing test inputs that are easy to maintain, quick to execute, and robust at detecting faults introduced by small code changes. The technology developed in this project is intended to help improve the reliability of critical software systems, cut down energy usage during development, and reduce technical debt. This research also contributes to the investigator's ongoing efforts to develop reusable course material for undergraduate computer science education, in particular by incorporating automatic test-input generation technology into classroom programming assignments. The project activities themselves will provide research-experience opportunities for a diverse cohort of undergraduate students.

Randomized test-input generation techniques such as grey-box fuzzing have been successful at uncovering critical bugs and security issues in widely used software. However, conventional fuzz testing must generate billions of test inputs over hundreds of CPU-hours to be effective, which is impractical for continuously validating code changes. This project shifts the focus of fuzz-testing research towards generating a reusable corpus of regression test inputs that supports software evolution. The research is focusing on optimizing the quality of the generated test inputs along three dimensions. First, an iterative ensemble fuzzing technique is being developed for synthesizing test inputs that are concise by construction. Second, mutation analysis is being used to guide fuzzing towards synthesizing test inputs that are robust at detecting faults due to small code changes. Third, language modeling techniques are being used to learn common patterns in human-authored test inputs; the resulting models drive novel fuzzing algorithms that can synthesize natural-looking test inputs that are easier to maintain as the software evolves. The results of this research are being disseminated in the form of open-source tools and publications that are intended to help software developers reduce maintenance costs and ultimately deploy more reliable software.
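To illustrate the second dimension, below is a minimal, self-contained Python sketch of mutation-guided fuzzing: a greybox-style loop that adds a generated input to the regression corpus only if it "kills" a previously unkilled mutant, i.e., makes a program variant with a small seeded fault behave differently from the original. The target program, the two hand-written mutants, and the byte-level mutation operator are all hypothetical illustrations invented for this sketch, not the project's actual tooling; in practice, a mutation-testing framework would generate the mutants systematically.

    import random

    # Hypothetical program under test: parses a comma-separated list of integers.
    def program(data: bytes) -> list:
        return [int(tok) for tok in data.decode("ascii", errors="replace").split(",") if tok]

    # Hypothetical hand-written "mutants": variants of the program with small
    # seeded faults. A mutation-testing tool would generate these systematically.
    def mutant_drop_last(data: bytes) -> list:
        return program(data)[:-1]               # fault: silently drops the last element

    def mutant_abs(data: bytes) -> list:
        return [abs(x) for x in program(data)]  # fault: loses the sign of each element

    MUTANTS = [mutant_drop_last, mutant_abs]

    def killed_mutants(data: bytes) -> set:
        """Indices of mutants whose behavior on `data` differs from the original."""
        try:
            expected = program(data)
        except ValueError:
            return set()                        # invalid input: original rejects it too
        killed = set()
        for i, mutant in enumerate(MUTANTS):
            try:
                if mutant(data) != expected:
                    killed.add(i)
            except ValueError:
                killed.add(i)
        return killed

    def mutate(data: bytes) -> bytes:
        """Conventional byte-level havoc mutation: flip, insert, or delete a byte."""
        buf = bytearray(data or b"0")
        i = random.randrange(len(buf))
        op = random.randrange(3)
        if op == 0:
            buf[i] = random.randrange(256)
        elif op == 1:
            buf.insert(i, random.randrange(256))
        elif len(buf) > 1:
            del buf[i]
        return bytes(buf)

    def fuzz(seed: bytes, iterations: int = 100_000) -> list:
        """Keep only inputs that kill a mutant no earlier corpus entry kills."""
        corpus, covered = [seed], killed_mutants(seed)
        for _ in range(iterations):
            candidate = mutate(random.choice(corpus))
            new_kills = killed_mutants(candidate) - covered
            if new_kills:
                corpus.append(candidate)
                covered |= new_kills
            if len(covered) == len(MUTANTS):
                break                           # every seeded fault is detected
        return corpus

    if __name__ == "__main__":
        for entry in fuzz(b"1,2,3"):
            print(entry)

Running the sketch typically yields a two-entry corpus: the seed (which already kills the drop-last mutant) plus one discovered input containing a negative number (which kills the absolute-value mutant). The key design choice, retaining an input only when it adds mutant kills rather than only coverage, is what keeps the corpus small while making it sensitive to small behavioral changes.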

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Li, Ao and Huang, Madonna and Lemieux, Caroline and Padhye, Rohan. "The Havoc Paradox in Generator-Based Fuzzing (Registered Report)." 2024. https://doi.org/10.1145/3678722.3685529
Kambhamettu, Rajeswari Hita and Billos, John and Oluwaseun-Apo, Tomi and Gafford, Benjamin and Padhye, Rohan and Hellendoorn, Vincent J. "On the Naturalness of Fuzzer-Generated Code." 19th International Conference on Mining Software Repositories, 2022. https://doi.org/10.1145/3524842.3527972
Vikram, Vasudev and Laybourn, Isabella and Li, Ao and Nair, Nicole and O'Brien, Kelton and Sanna, Rafaello and Padhye, Rohan. "Guiding Greybox Fuzzing with Mutation Testing." ISSTA 2023: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2023. https://doi.org/10.1145/3597926.3598107
