NSF Award Search: Award # 1840934

Award Abstract # 1840934

SHF: Small: Collaborative Research: Static Analysis Infrastructure for Variability-Aware Bug Detection and Translation of Highly-Configurable Software Systems

NSF Org:	CCF Division of Computing and Communication Foundations
Recipient:	THE UNIVERSITY OF CENTRAL FLORIDA BOARD OF TRUSTEES
Initial Amendment Date:	July 5, 2018
Latest Amendment Date:	June 2, 2020
Award Number:	1840934
Award Instrument:	Standard Grant
Program Manager:	Sol Greenspan CCF Division of Computing and Communication Foundations CSE Directorate for Computer and Information Science and Engineering
Start Date:	October 1, 2018
End Date:	September 30, 2022 (Estimated)
Total Intended Award Amount:	$229,087.00
Total Awarded Amount to Date:	$253,087.00
Funds Obligated to Date:	FY 2018 = $229,087.00; FY 2019 = $16,000.00; FY 2020 = $8,000.00
History of Investigator:	Paul Gazzillo (Principal Investigator) paul.gazzillo@ucf.edu
Recipient Sponsored Research Office:	The University of Central Florida Board of Trustees 4000 CENTRAL FLORIDA BLVD ORLANDO FL US 32816-8005 (407)823-0387
Sponsor Congressional District:	10
Primary Place of Performance:	University of Central Florida Orlando FL US 32816-8005
Primary Place of Performance Congressional District:	10
Unique Entity Identifier (UEI):	RD7MXJV7DKT9
Parent UEI:
NSF Program(s):	Software & Hardware Foundation
Primary Program Source:	01001819DB NSF RESEARCH & RELATED ACTIVIT; 01001920DB NSF RESEARCH & RELATED ACTIVIT; 01002021DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	7923, 7944, 9251
Program Element Code(s):	779800
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.070

ABSTRACT

Highly-configurable systems, e.g., the Linux kernel, form our most critical infrastructure, underpinning everything from high-performance computing clusters to IoT devices. Keeping these systems secure and reliable with automated tools is essential. However, tool support is lacking for such systems because of the complexity and scale of their configurability. This leaves some of the most critical software with some of the least tool support. The problem is that most software tools are not variability-aware; that is, they do not account for the many configurations of the software. Serious defects, including null pointer errors and buffer overflows, can and do appear in specific configurations, making them hard to find without accounting for variability. The goal of this project is to advance the state of the art for systems development and debugging, resulting in more secure and less error-prone systems, benefiting the millions who rely on highly-configurable software infrastructure.

To solve these challenges, this project aims to develop the infrastructure, analysis techniques, and language support that software developers currently lack for debugging and maintaining configurable software systems written in C-family languages. The first part of the project is to develop a front-end infrastructure that captures the sources of variability in a new intermediate representation. Such reusable infrastructure is crucial to the development of state-of-the-art analyses. The second part seeks to create variability-aware versions of static analyses and propose new inter-procedural analyses that enable tradeoffs between scalability and precision. While static analysis has proven useful for detecting bugs, accounting for configurations increases the complexity of analysis. Systematic extensions to bug detection algorithms based on these new analyses can target previously obscured bugs. Since the C preprocessor has long been recognized as a source of problems, the third part of this project is to develop new language extensions to C, supplanting preprocessor usage and enabling compiler support for variability specifications. Translators to the new language, based on our front-end analysis infrastructure, will enable existing software to benefit from the new language. The PIs on this project will mentor graduate students and are committed to promoting female and under-represented minority participation. Artifacts developed in this project will be used in courses to introduce students to state-of-the-art software tool development.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Gazzillo, Paul and Wei, Shiyi "Conditional compilation is dead, long live conditional compilation!" Proceedings - International Conference on Software Engineering, 2019 https://doi.org/10.1109/ICSE-NIER.2019.00035

Mordahl, Austin "Toward detection and characterization of variability bugs in configurable C software: an empirical study" Proceedings - International Conference on Software Engineering, v.Compani, 2019 https://doi.org/10.1109/ICSE-Companion.2019.00064

Mordahl, Austin and Oh, Jeho and Koc, Ugur and Wei, Shiyi and Gazzillo, Paul "An Empirical Study of Real-World Variability Bugs Detected by Variability-Oblivious Tools" ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2019 https://doi.org/10.1145/3338906.3338967

Oh, Jeho and Gazzillo, Paul and Batory, Don "t-wise Coverage by Uniform Sampling" SPLC '19: Proceedings of the 23rd International Systems and Software Product Line Conference, v.A, 2019 https://doi.org/10.1145/3336294.3342359

Oh, Jeho and Yıldıran, Necip Fazıl and Braha, Julian and Gazzillo, Paul "Finding broken Linux configuration specifications by statically analyzing the Kconfig language" ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021 https://doi.org/10.1145/3468264.3468578

Patterson, Zachary and Zhang, Zenong and Pappas, Brent and Wei, Shiyi and Gazzillo, Paul "SugarC: scalable desugaring of real-world preprocessor usage into pure C" ICSE '22: Proceedings of the 44th International Conference on Software Engineering, 2022 https://doi.org/10.1145/3510003.3512763

Schubert, Philipp Dominik and Gazzillo, Paul and Patterson, Zach and Braha, Julian and Schiebel, Fabian and Hermann, Ben and Wei, Shiyi and Bodden, Eric "Static data-flow analysis for software product lines in C: Revoking the preprocessor's special role" Automated Software Engineering, v.29, 2022 https://doi.org/10.1007/s10515-022-00333-1

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Tool support is lacking for highly-configurable systems, such as the Linux kernel, because of the complexity and scale of their configurability. The problem is that most software tools are variability-oblivious; that is, they do not account for the many configurations of the software. However, bugs such as null pointer errors and buffer overflows can appear in arbitrary configurations, making them hard to find with existing tools. The main goal of this project is to advance the state of the art in program analysis for highly-configurable C systems software. The project includes the development of an analysis infrastructure that enables variability bug-finding. Achieving this requires overcoming the precision and scalability challenges of the underlying analysis algorithms.

We introduced a new framework for generating benchmarks of variability bugs. This framework simulates variability-aware analysis using off-the-shelf bug detectors by running them on samples of configurations. Unlike prior techniques for developing variability bug datasets, our approach finds bugs known to be discoverable by state-of-the-art bug-finding tools. Therefore, the results are applicable to evaluating the variability-aware analysis that we developed under this project.
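
The sampling idea above can be illustrated with a minimal sketch. Everything in it is hypothetical: the option names, the `toy_detector` stand-in for an off-the-shelf bug detector, and the planted bug are assumptions made for illustration, not the project's framework or data.

```python
import random

# Hypothetical configuration options (assumed for illustration only).
OPTIONS = ["CONFIG_A", "CONFIG_B", "CONFIG_C"]


def toy_detector(config):
    """Stand-in for an off-the-shelf, variability-oblivious bug detector.

    It looks at one configuration at a time. The planted bug appears only
    when CONFIG_A is enabled and CONFIG_B is disabled.
    """
    if config["CONFIG_A"] and not config["CONFIG_B"]:
        return ["null pointer dereference"]
    return []


def sample_configurations(options, count, seed=0):
    """Draw `count` random configurations (option name -> bool)."""
    rng = random.Random(seed)
    return [{opt: rng.random() < 0.5 for opt in options} for _ in range(count)]


def find_variability_bugs(options, count, seed=0):
    """Run the detector on each sample and record which configurations warn."""
    findings = {}
    for config in sample_configurations(options, count, seed):
        for warning in toy_detector(config):
            enabled = tuple(sorted(o for o, on in config.items() if on))
            findings.setdefault(warning, set()).add(enabled)
    return findings
```

Calling `find_variability_bugs(OPTIONS, 50)` maps each warning to the sampled configurations that trigger it. A warning reported in some samples but not others is a candidate variability bug.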

For the front-end of our variability-aware analysis development, we designed and implemented a scalable desugaring transformation, SugarC, that translates unpreprocessed C to pure C. This closes the gap between existing variability-oblivious and variability-aware analyses by converting configurable C code into pure C. The variability remains encoded in C, which can be analyzed by existing analyses. To evaluate support for desugaring C constructs, we created a new benchmark called DesugarBench, showing that SugarC supports many more constructs than prior work, especially the kinds of challenging cases found in real-world C.
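
To make the notion of desugaring concrete, here is a toy sketch that rewrites a statement-level `#ifdef` block into an ordinary C `if` statement guarded by a configuration variable. It is not SugarC: the function `desugar_ifdef`, the lower-cased variable naming, and the restriction to non-nested blocks around plain statements are all assumptions chosen to keep the example small. Real-world preprocessor usage (conditional declarations, types, macros) needs far more than this.

```python
def desugar_ifdef(lines):
    """Rewrite non-nested `#ifdef NAME` ... `#endif` blocks into C `if` statements.

    Assumes each block wraps only plain statements. The configuration macro
    NAME becomes a hypothetical runtime variable named `name` in lower case.
    """
    out = []
    inside = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("#ifdef "):
            name = stripped.split()[1]
            out.append(f"if ({name.lower()}) {{")
            inside = True
        elif stripped == "#endif":
            out.append("}")
            inside = False
        else:
            # Indent statements that now live inside the generated `if`.
            out.append("    " + stripped if inside else line)
    return out


# Unpreprocessed C fragment (as text) with one configurable statement.
SOURCE = [
    "int x = 0;",
    "#ifdef CONFIG_A",
    "x = x + 1;",
    "#endif",
    "return x;",
]

if __name__ == "__main__":
    print("\n".join(desugar_ifdef(SOURCE)))
```

The output is pure C in which both variants of the code are present at once, so an existing analysis can reason about the enabled and disabled cases through the `config_a` condition.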

We developed two parallel efforts for exploring variability-aware analysis on top of the front-end. First, VarAlyzer is an end-to-end variability-aware dataflow analysis. VarAlyzer was evaluated by conducting a typestate analysis that checks for correct API usage. Second, Sugarlyzer is an extensible framework that enables the integration of many existing variability-oblivious tools. To demonstrate the extensibility of Sugarlyzer, we integrated three popular static analyzers (Clang, Infer, and Phasar) into it. Each integration requires only dozens of lines of code. We ran all three integrated tools on a variability bug dataset, VBDb, to assess Sugarlyzer's correctness and effectiveness. The results show that Sugarlyzer is able to detect the vast majority of variability bugs present in the dataset (78/105), compared to a baseline that exhaustively tests all configurations.

Our analysis of macro usage has yielded formal properties describing the transformability of macro usage. Specifically, it categorizes macros by their semantic equivalence to C functions. We used these properties to determine which macros are transformable without any change to the interface of the macro. Furthermore, we implemented these properties in a lightweight static analysis that informs our transformer, which rewrites equivalent macros to C functions, thereby removing the preprocessor usage.

We used our build system constraint extraction algorithms to generate valid build configurations for the Linux kernel. This process found build errors, which we patched and reported to the Linux developers. We have publicly released a new version of the constraint analysis tool and had one build system patch accepted into the Linux kernel source.

We used ConfigFuzz to transform six common fuzzing targets and carried out the evaluation using the AFL and AFL++ fuzzers. ConfigFuzz shows better performance than two baseline setups on four targets, while on the other two targets, ConfigFuzz does not always outperform the baselines. We analyzed the target programs' source code and the options fuzzed by ConfigFuzz to reason about the fuzzing performance. We also show that parameterizing ConfigFuzz to fuzz configurations with up to 2 options often leads to higher code coverage than up to 1 option, while fuzzing many more options with ConfigFuzz may decrease the performance.
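
As a rough illustration of what an "up to k options" bound means, the sketch below enumerates every configuration that sets at most `k` options from a list. The helper `configs_up_to` and the option names are hypothetical, and the reading of the bound as "at most k options set" is an assumption. The sketch only shows how the number of candidate configurations grows with the bound and does not model ConfigFuzz itself.

```python
from itertools import combinations


def configs_up_to(options, k):
    """Return every set of at most `k` options drawn from `options`."""
    return [
        frozenset(combo)
        for size in range(k + 1)
        for combo in combinations(options, size)
    ]


if __name__ == "__main__":
    # Hypothetical command-line options of a fuzzing target.
    opts = ["-a", "-b", "-c", "-d"]
    print(len(configs_up_to(opts, 1)))  # empty set + 4 single options = 5
    print(len(configs_up_to(opts, 2)))  # 5 + 6 pairs = 11
```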

The research results from this grant were published in several competitive conferences and journals including, but not limited to, the International Conference on Software Engineering (ICSE), Transactions on Software Engineering and Methodology (TOSEM), and the Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE). Nine peer-reviewed publications funded in part by this grant were produced. Artifacts produced during this grant have been released publicly in repositories containing the software source code, experimental scripts, and resulting data.

Ten graduate and six undergraduate students were funded in part from this research, including five from groups underrepresented in computing. Two of the graduate students were doctoral students who graduated with dissertation work funded in part by the grant, and one master's student graduated during the grant; all three now work in the software industry. Five graduate courses received content based on grant research, including a graduate course on configurable software, an independent study on configurable software, and courses on operating systems and compilers that incorporate material related to the grant.

Last Modified: 01/15/2023
Modified by: Paul Gazzillo