Award Abstract # 2006947
SHF: SMALL: Automated Discovery of Cross-Language Program Behavior Inconsistency

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: NORTH CAROLINA STATE UNIVERSITY
Initial Amendment Date: July 23, 2020
Latest Amendment Date: December 22, 2022
Award Number: 2006947
Award Instrument: Standard Grant
Program Manager: Sol Greenspan
sgreensp@nsf.gov
(703)292-7841
CCF Division of Computing and Communication Foundations
CSE Directorate for Computer and Information Science and Engineering
Start Date: August 1, 2020
End Date: July 31, 2025 (Estimated)
Total Intended Award Amount: $499,994.00
Total Awarded Amount to Date: $499,994.00
Funds Obligated to Date: FY 2020 = $499,994.00
History of Investigator:
  • Kathryn Stolee (Principal Investigator)
    ktstolee@ncsu.edu
  • John-Paul Ore (Co-Principal Investigator)
  • Christopher Parnin (Former Principal Investigator)
  • Kathryn Stolee (Former Co-Principal Investigator)
Recipient Sponsored Research Office: North Carolina State University
2601 WOLF VILLAGE WAY
RALEIGH
NC  US  27695-0001
(919)515-2444
Sponsor Congressional District: 02
Primary Place of Performance: North Carolina State University
Department of Computer Science,
Raleigh
NC  US  27695-8206
Primary Place of Performance Congressional District: 02
Unique Entity Identifier (UEI): U3NVH931QJJ3
Parent UEI: U3NVH931QJJ3
NSF Program(s): Software & Hardware Foundation
Primary Program Source: 01002021DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7923, 7944
Program Element Code(s): 779800
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

In the software industry, hundreds of programming languages exist, many of which programmers are expected to be proficient in. The common assumption has been that once a programmer knows one language, they can leverage concepts and knowledge already learned and easily pick up another programming language. Unfortunately, empirical studies find this process to be error-prone and ineffective because of mismatches between concepts and expressions across programming languages. This project develops techniques to ease the acquisition of knowledge for new programming languages by identifying and explaining how the behaviors of code in different languages relate. The anticipated result is that programmers will learn new languages faster and write code with fewer bugs. Beyond the general benefit of better-educated programmers, techniques for teaching computer programming are important in particular because programming is a crucial skill for a digitally literate society.

This project will develop techniques to automatically identify incompatibilities and potential misconceptions between two programming languages. Two main research tasks will be investigated. The first task is to develop an approach for automatically identifying clusters of similar code based on dynamic behavior, likely invariants, observed side effects, and performance. Behavioral clusters are formed from snippets in multiple languages that produce the same outputs on the same inputs. Likely invariants from observed behavior are used to describe similarities and differences. The second task is to develop a technique to identify misconceptions that emerge when a programmer assumes code should behave the same but it does not. To identify misconceptions, the technique leverages the behavioral clusters and characterizations from the code similarity analysis; snippets that look similar but behave differently in overt (behavior) or insidious (performance, side effects) ways are candidates. The technique will rank misconceptions based on probability of appearing and likely impact. It will then use invariants, behavior, side effects, and performance to form automated explanations of behavioral similarities and differences. Finally, these techniques and explanations will be applied for the benefit of two groups of real programmers: transfer students who know one language and need to learn a new one, and data scientists who work with many programming languages to complete their tasks. For programmers learning a new language, in student, professional, or hobby capacities, this work aims to increase the speed and reliability with which they acquire knowledge of the programming language.
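As a rough illustration of the behavioral clustering idea described above (grouping snippets that produce the same outputs on the same inputs), the following Python sketch may help. It is not the project's implementation: the function names `behavior_signature` and `cluster_by_behavior` are hypothetical, and the three "snippets" are Python stand-ins that only mimic how integer division behaves in Python, Java, and C, rather than code actually executed in those languages.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[int, int]


def behavior_signature(fn: Callable, inputs: Sequence) -> tuple:
    """Run fn on every input and record each output (or the exception type)."""
    signature = []
    for value in inputs:
        try:
            signature.append(("ok", repr(fn(value))))
        except Exception as exc:  # a raised error is part of observed behavior
            signature.append(("error", type(exc).__name__))
    return tuple(signature)


def cluster_by_behavior(snippets: Dict[str, Callable], inputs: Sequence) -> List[List[str]]:
    """Group snippet names whose observed outputs agree on all inputs."""
    groups = defaultdict(list)
    for name, fn in snippets.items():
        groups[behavior_signature(fn, inputs)].append(name)
    return sorted(sorted(names) for names in groups.values())


# Hypothetical stand-ins (assumption): each lambda models one language's
# integer division. Python floors the quotient; Java and C truncate toward zero.
snippets: Dict[str, Callable[[Pair], int]] = {
    "python_floordiv": lambda p: p[0] // p[1],
    "java_int_div": lambda p: int(p[0] / p[1]),
    "c_int_div": lambda p: math.trunc(p[0] / p[1]),
}

# Shared inputs; the negative dividend is what exposes the difference.
inputs: List[Pair] = [(7, 2), (-7, 2), (6, 3)]

clusters = cluster_by_behavior(snippets, inputs)
print(clusters)
```

In this sketch, the three snippets look nearly identical, yet the Python-style division lands in its own cluster because it rounds toward negative infinity on `(-7, 2)`. A pair that looks alike but falls into different clusters is the kind of candidate misconception the abstract describes; ranking and explanation are left out here.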

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Mathew, George and Stolee, Kathryn T. "Cross-language code search using static and dynamic analyses" Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021 https://doi.org/10.1145/3468264.3468538
Middleton, Justin and Ore, John-Paul and Stolee, Kathryn T. "Barriers for Students During Code Change Comprehension", 2024 https://doi.org/10.1145/3597503.3639227
Middleton, Justin and Stolee, Kathryn T. "Understanding Similar Code through Comparative Comprehension" 2022 IEEE Symposium on Visual Languages and Human-Centric Computing, 2022 https://doi.org/10.1109/VL/HCC53370.2022.9833117

Please report errors in award information by writing to: awardsearch@nsf.gov.