Award Abstract # 2016465
CCRI: ENS: Collaborative Research: Enabling Automated Language Support for the srcML Infrastructure

NSF Org: CNS
Division Of Computer and Network Systems
Recipient: KENT STATE UNIVERSITY
Initial Amendment Date: July 9, 2020
Latest Amendment Date: April 18, 2025
Award Number: 2016465
Award Instrument: Standard Grant
Program Manager: Almadena Chtchelkanova
achtchel@nsf.gov
(703)292-7498
CNS Division Of Computer and Network Systems
CSE Directorate for Computer and Information Science and Engineering
Start Date: July 15, 2020
End Date: June 30, 2026 (Estimated)
Total Intended Award Amount: $397,849.00
Total Awarded Amount to Date: $417,849.00
Funds Obligated to Date: FY 2020 = $397,849.00
FY 2021 = $10,000.00
FY 2022 = $10,000.00
History of Investigator:
  • Jonathan Maletic (Principal Investigator)
    jmaletic@kent.edu
Recipient Sponsored Research Office: Kent State University
1500 HORNING RD
KENT
OH  US  44242-0001
(330)672-2070
Sponsor Congressional District: 14
Primary Place of Performance: Kent State University
Department of Computer Science,
Kent
OH  US  44242-0001
Primary Place of Performance Congressional District: 14
Unique Entity Identifier (UEI): KXNVA7JCC5K6
Parent UEI:
NSF Program(s): Special Projects - CNS,
CCRI-CISE Cmnty Rsrch Infrstrc
Primary Program Source: 01002021DB NSF RESEARCH & RELATED ACTIVIT
01002122DB NSF RESEARCH & RELATED ACTIVIT
01002223DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 9251, 7359, 9150
Program Element Code(s): 171400, 735900
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

srcML is an infrastructure for the exploration, analysis, and manipulation of source code. The infrastructure currently supports the translation of C, C++, C#, and Java source code to the srcML format. The srcML format contains all of the original source code plus grammatical information from the specific programming language used. The self-contained parsing technology is robust and scales well in both time and memory. Researchers and practitioners can readily construct source code analysis tools by using the infrastructure. srcML has been leveraged to construct tools for software quality assessment, error detection, and security risk assessment of software systems. The freely available srcML parser is used by a wide variety of researchers and practitioners in the fields of software engineering and programming languages, as well as in computer science education. srcML has been used in the dissertation/thesis research of dozens (and counting) of computer science graduate students at a number of institutions across the country. The proposed enhancements to the srcML infrastructure will extend the parsing and markup to a broad variety of widely used programming languages, such as Python, JavaScript, Go, and Ruby. These enhancements will drastically reduce the entry cost for individuals to conduct research by enabling them to explore, analyze, and manipulate software in an easy and flexible manner, thus allowing them more time to pursue novel and transformative research on software, software engineering, and programming languages. Furthermore, the enhanced infrastructure provides practical tools for engineers to improve the quality and lower the cost of the software applications we all use daily.

The proposed enhancements to the srcML infrastructure extend it to a wider variety of popular programming languages. These extensions will be accomplished by developing a parser generator for the srcML format. The input is a programming language grammar, and the output is a parser that takes source code in that programming language and inserts the srcML markup into the code. This basic approach is similar to that taken by parser generators such as yacc or ANTLR. This grammar-based approach will significantly broaden the audience for the srcML infrastructure. It will not only allow new languages to be added easily but also make it possible to support dialects, legacy languages, and domain-specific languages. Many current research tools and techniques do not work on mixed/multi-language systems, or are not validated on such real-world systems, due to the lack of tools that can be applied. The enhanced srcML will be one of the only mixed-language source code analysis tools that are open source and freely available.
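As a rough illustration of the idea described above (a grammar goes in, and what comes out is a parser that inserts markup while keeping every character of the original code), the following Python sketch may help. It is not srcML's implementation or its actual element set: the names `TOY_GRAMMAR`, `generate_marker`, and `strip_markup`, and the tags `name`, `operator`, and `literal`, are hypothetical, and a short list of regular-expression token rules stands in for a real programming language grammar.

```python
import re
from xml.sax.saxutils import escape, unescape

# Hypothetical toy "grammar": ordered (tag, regex) token rules.
# A real grammar would be context-free and far richer; this only
# illustrates the "grammar in, markup-inserting parser out" idea.
TOY_GRAMMAR = [
    ("literal", r"\d+"),
    ("name", r"[A-Za-z_]\w*"),
    ("operator", r"[=+\-*/]"),
]


def generate_marker(grammar):
    """Build a function that wraps recognized tokens in XML tags.

    Text matched by no rule (whitespace, punctuation) is copied through
    unchanged, so the original source stays recoverable from the output.
    """
    pattern = re.compile("|".join(f"(?P<{tag}>{rx})" for tag, rx in grammar))

    def mark_up(source):
        out = []
        pos = 0
        for m in pattern.finditer(source):
            out.append(escape(source[pos:m.start()]))  # unmatched text
            tag = m.lastgroup  # name of the rule that matched
            out.append(f"<{tag}>{escape(m.group())}</{tag}>")
            pos = m.end()
        out.append(escape(source[pos:]))  # trailing unmatched text
        return "".join(out)

    return mark_up


def strip_markup(marked):
    """Remove the tags and undo XML escaping to recover the source."""
    return unescape(re.sub(r"</?\w+>", "", marked))


mark_up = generate_marker(TOY_GRAMMAR)
marked = mark_up("x = y + 42;")
print(marked)
print(strip_markup(marked) == "x = y + 42;")
```

Swapping in a different rule list yields a marker for a different toy language without changing `generate_marker`, which is the property the grammar-based approach relies on. The round trip through `strip_markup` mirrors the abstract's statement that the marked-up format contains all of the original source code.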

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Al-Ramadan, A and Behler, JAC and Decker, M and Dragan, N and Collard, ML and Maletic, JM "Stereocode: A Tool for Automatic Identification of Method and Class Stereotypes for Software Systems", 2024
Behler, JAC and Al-Ramadan, AF and Baheri, B and Guan, Q and Maletic, JI "Supporting Program Analysis and Transformation of Quantum-Based Languages", 2024
