Award Abstract # 1510329
CI-EN: Collaborative Research: TraceLab Community Infrastructure for Replication, Collaboration, and Innovation

NSF Org: CNS (Division of Computer and Network Systems)
Recipient: UNIVERSITY OF NOTRE DAME DU LAC
Initial Amendment Date: June 2, 2015
Latest Amendment Date: June 2, 2015
Award Number: 1510329
Award Instrument: Standard Grant
Program Manager: Sol Greenspan
CNS Division of Computer and Network Systems
CSE Directorate for Computer and Information Science and Engineering
Start Date: June 1, 2015
End Date: December 31, 2018 (Estimated)
Total Intended Award Amount: $100,000.00
Total Awarded Amount to Date: $100,000.00
Funds Obligated to Date: FY 2015 = $100,000.00
History of Investigator:
  • Collin McMillan (Principal Investigator)
    collin.mcmillan@nd.edu
Recipient Sponsored Research Office: University of Notre Dame
940 GRACE HALL
NOTRE DAME
IN  US  46556-5708
(574)631-7432
Sponsor Congressional District: 02
Primary Place of Performance: University of Notre Dame
IN  US  46556-5708
Primary Place of Performance Congressional District: 02
Unique Entity Identifier (UEI): FPU6XGFXMBE9
Parent UEI: FPU6XGFXMBE9
NSF Program(s): CCRI-CISE Cmnty Rsrch Infrstrc
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7359, 9102
Program Element Code(s): 735900
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

This is the second phase of the TraceLab project, which was initially funded under an NSF Major Research Instrumentation grant. The goal of this project is to deliver an instrument that facilitates reproducibility of software engineering experiments, fosters comparative evaluation, and provides an environment in which components can be easily shared across research groups. The challenge of experimental reproducibility extends across almost every science and engineering discipline. A widely reported study conducted by the biotechnology firm Amgen revealed that of 53 previously published landmark papers, only six were reproducible. Recent studies have unearthed similar problems across a diverse set of software engineering domains, including, but not limited to, software traceability, feature location, and compiler optimization. Reproducibility is often undermined by a lack of publicly available datasets, obsolete or unavailable tools, insufficient detail about the experiments, and undocumented decisions about how various metrics are computed. TraceLab addresses these problems by providing a plug-and-play experimental environment, together with libraries of shareable components and seminal experiments.

TraceLab introduces a radically different way of approaching empirical software engineering research and paves the way for greater community collaboration, more rigorous evaluation of results, and an easier entry point for new researchers. It is expected to lay the foundation for future advances in empirical software engineering, accelerate and shape future research directions, support industrial pilot studies, and significantly reduce the cost and effort that often discourage new researchers from entering the field. In addition, the project will provide opportunities to a diverse group of undergraduate and graduate students, and it will be used for educational purposes in various software engineering courses.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Breno Cruz, Paul W. McBurney, Collin McMillan. "TraceLab Components for Reproducing Source Code Summarization Experiments." Proc. of the 32nd IEEE International Conference on Software Maintenance and Evolution, Artifacts Track, 2016.
Rrezarta Krasniqi, Collin McMillan. "TraceLab Components for Generating Speech Act Types in Developer Question/Answer Conversations." Proc. of the 34th IEEE International Conference on Software Maintenance and Evolution, Artifacts Track, 2018.
Rrezarta Krasniqi, Siyuan Jiang, Collin McMillan. "TraceLab Components for Generating Extractive Summaries of User Stories." Proc. of the 33rd IEEE International Conference on Software Maintenance and Evolution, Artifacts Track, 2017.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Reproducibility in science has become a major concern for both the scientific community and the general public.  The concern is well placed: if scientific findings cannot be replicated, then progress built on those findings is difficult to trust.  This project targets the problem of reproducibility in software engineering experiments.  The TraceLab infrastructure, of which this project is a part, is an experimental framework for designing, saving, and transferring knowledge about software engineering experiments among different research groups.  The infrastructure began with experiments in traceability, but its scope has since widened to cover many other areas of software research.  The infrastructure breaks experiments into components that can be reconfigured and shared to encourage improvements, as illustrated by the sketch after this paragraph.
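
To make the component idea concrete, here is a minimal, hypothetical sketch in Python of an experiment decomposed into swappable steps over a shared workspace.  It is only an illustration of the decomposition; TraceLab itself is a C#/.NET framework, and the component names and run() interface below are invented rather than TraceLab's actual API.

# Illustrative sketch of a component-based experiment pipeline.
# Each step reads from and writes to a shared workspace, so steps
# contributed by different research groups can be swapped in and out.

class Component:
    """One reconfigurable experiment step (hypothetical interface)."""
    def run(self, workspace):
        raise NotImplementedError

class LoadDataset(Component):
    def run(self, workspace):
        # A real component would load a shared benchmark dataset.
        workspace["functions"] = ["int max(int a, int b) { return a > b ? a : b; }"]

class CountFunctions(Component):
    def run(self, workspace):
        # Placeholder analysis so the pipeline runs end to end.
        workspace["results"] = {"n_functions": len(workspace["functions"])}

def run_experiment(components):
    """Run the components in order over a shared workspace."""
    workspace = {}
    for component in components:
        component.run(workspace)
    return workspace

print(run_experiment([LoadDataset(), CountFunctions()]))

Because each step communicates only through the workspace, a research group can replace any single component (the dataset loader, for instance) without touching the rest of the experiment.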

This project created TraceLab components for replicating experiments in source code summarization.  Code summarization is the problem of automatically writing natural language descriptions of software source code, for example, writing an English description of the behavior of a function written in C.  Code summarization has been a rapidly growing research area since around 2009, but progress has been slowed by a lack of easily reproducible experiments.  Essentially, the problem is that each research group has had to build its own experimental infrastructure, processing programs, and datasets.  This project envisions a much more efficient arrangement in which different groups share experimental data through a common infrastructure (TraceLab).  The sketch after this paragraph illustrates the summarization task itself.
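
As a purely illustrative example of the task's input and output (the C function, the summarize() stub, and the summary sentence below are invented and are not drawn from the project's components or datasets):

# Illustrative input/output of code summarization: a tool receives
# source code and produces a short English description of it.

C_FUNCTION = """
int max(int a, int b) {
    return (a > b) ? a : b;
}
"""

def summarize(source_code):
    # A real summarizer would analyze the code; this stub simply returns
    # the kind of one-sentence summary such tools aim to produce.
    return "Returns the larger of two integer arguments."

print(summarize(C_FUNCTION))

A real summarization component would replace the stub with an actual analysis of the source code, but its input and output would have the same shape.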

The specific deliverables of this project are summarized in three packages associated with published articles.  One package was released in each year of the project.  Each package includes all TraceLab components, reproducibility instructions, datasets, and dependencies for the key experiments produced that year.  The public may view these packages and follow clear, step-by-step instructions for replicating the experiments.  The aim is for the availability of these materials to accelerate scientific progress.


Last Modified: 01/22/2019
Modified by: Collin McMillan
