
NSF Org: CCF Division of Computing and Communication Foundations
Initial Amendment Date: April 19, 2013
Latest Amendment Date: June 1, 2016
Award Number: 1302169
Award Instrument: Continuing Grant
Program Manager: Sol Greenspan, sgreensp@nsf.gov, (703) 292-7841, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering
Start Date: July 1, 2013
End Date: June 30, 2017 (Estimated)
Total Intended Award Amount: $482,852.00
Total Awarded Amount to Date: $482,852.00
Funds Obligated to Date: FY 2015 = $118,390.00; FY 2016 = $120,313.00
Recipient Sponsored Research Office: 5700 Rivertech Ct Ste 210, Riverdale, MD 20737-1250, US; (301) 314-6070
Primary Place of Performance: 5825 University Research Ct., College Park, MD 20740-3823, US
NSF Program(s): Software & Hardware Foundation; SOFTWARE ENG & FORMAL METHODS
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVIT; 01001617DB NSF RESEARCH & RELATED ACTIVIT
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
The goal of the research is to enable software engineers to find software development best practices in past empirical data. The increasing availability of software development project data, together with new machine learning techniques, makes it possible for researchers to study the generalizability of results across projects using the concept of transfer learning. Using data from real software projects, the project will determine and validate best practices in three areas: predicting software development effort, isolating software defects, and identifying effective code inspection practices.
This research will deliver new data mining technologies in the form of transfer learning techniques and tools that overcome current limitations in the state of the art to provide accurate learning within and across projects. It will design new empirical studies that apply transfer learning to empirical data collected from industrial software projects. It will also build an on-line model analysis service, making the techniques and tools available to other researchers who are investigating the validity of best-practice principles.
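To make the transfer-learning idea concrete, here is a minimal, hypothetical sketch (not the project's actual tooling) of one common cross-project strategy: filter a source project's data down to the rows most similar to the target project before training a defect predictor. All metric names and data below are invented for illustration.

```python
# Minimal sketch of cross-project ("transfer") defect prediction via
# relevancy filtering: keep only the source-project rows that resemble
# the target project, then train on that subset. All data and metric
# columns here are synthetic and hypothetical.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Hypothetical static-code metrics (e.g., LOC, complexity, coupling).
source_X = rng.normal(size=(200, 3))
source_y = rng.integers(0, 2, size=200)        # defect labels for the source project
target_X = rng.normal(loc=0.5, size=(50, 3))   # unlabeled target project

# Relevancy filter: for each target row, keep its 10 nearest source rows.
nn = NearestNeighbors(n_neighbors=10).fit(source_X)
keep = np.unique(nn.kneighbors(target_X, return_distance=False))

# Train only on the relevant subset, then predict for the target project.
model = DecisionTreeClassifier(max_depth=4, random_state=0)
model.fit(source_X[keep], source_y[keep])
print(model.predict(target_X)[:10])
```

The filtering step is what distinguishes transfer from naively training on all foreign data, which often performs poorly when projects differ.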
The broader impacts of the research will be to make empirical software engineering research results more transferable to practice, and to improve the research processes for the empirical software engineering community. By providing a means to test principles about software development, this work stands to transform empirical software engineering research and enable software managers to rely on scientifically obtained facts and conclusions rather than anecdotal evidence and one-off studies. Given the immense importance and cost of software in commercial and critical systems, the research has long-term economic impacts.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Software development is often unpredictable and imprecise for many reasons – new technologies emerge, the needs of customers change, and we are constantly building larger and more complex software systems. Unlike traditional engineering disciplines, there are no blueprints for software. Best practices and lessons learned on how best to build software vary from person to person, organization to organization, and project to project. These “rules” for building better software are often based on personal experience or anecdotes and, while useful, are rarely backed by scientific evidence, or they apply only in a very limited number of scenarios. For decades, researchers have attempted to derive rules for controlling the cost and quality of software, but with mixed success at best.
The NSF Transfer Learning in Software Engineering project sought to address one of the major reasons that scientific predictability of software quality and effort has eluded us: that it is difficult to identify relevant data from past software projects from which to draw rules for engineering the software system at hand. To tackle this problem, we engaged in a number of activities.
We developed the XTREE algorithm, which evaluates the impact of proposed changes to software structure (i.e., refactoring) on software quality. For example, will reducing the complexity of a particular piece of code actually reduce the number of bugs in the system? These decisions are evaluated using data from the project’s history of changes, as sketched below. The XTREE algorithm enables software project managers to make decisions on how to allocate effort based on past performance, rather than guesswork and anecdote.
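The sketch below is not the published XTREE algorithm; it only illustrates the underlying what-if question on invented data: train a model on a project's historical metrics and defect labels, then ask whether a proposed complexity reduction is predicted to lower defect risk.

```python
# Illustrative "what-if" query in the spirit of XTREE (not the published
# algorithm): learn from a project's change history, then test whether a
# proposed refactoring that lowers complexity is predicted to lower
# defect-proneness. Metric names and data are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
# Columns: [lines_of_code, cyclomatic_complexity, coupling]
history_X = rng.normal(loc=[300, 12, 5], scale=[80, 4, 2], size=(500, 3))
# Invented ground truth: complex, tightly coupled modules are buggier.
history_y = (history_X[:, 1] + history_X[:, 2] + rng.normal(size=500) > 16).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(history_X, history_y)

module = np.array([[320.0, 18.0, 6.0]])            # current module metrics
refactored = module.copy()
refactored[0, 1] = 9.0                             # proposed complexity reduction

before = model.predict_proba(module)[0, 1]
after = model.predict_proba(refactored)[0, 1]
print(f"predicted defect risk before={before:.2f}, after={after:.2f}")
```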
We developed the LACE2 privacy algorithm, which enables private companies and other organizations to contribute highly detailed software development data (for example, the size of source code files and the number of defects in them) without revealing the origin of the data. By growing the amount of data available, we make it more likely that we can identify a past project from which to transfer rules to the current project.
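As a heavily simplified, hypothetical sketch of the kind of instance obfuscation this line of work builds on (not the LACE2 algorithm itself): perturb each numeric row away from its nearest neighbor of the opposite class, so exact metric values are never shared while the class structure is roughly preserved.

```python
# Simplified privatization step: shift each row a random fraction away
# from its nearest neighbor with a different label, so the shared rows no
# longer match the originals but stay on their side of the class boundary.
# Data and labels below are synthetic; this is not LACE2 itself.
import numpy as np

def obfuscate(X, y, rng, lo=0.15, hi=0.35):
    X_priv = X.copy()
    for i, row in enumerate(X):
        unlike = X[y != y[i]]                              # rows with a different label
        nearest = unlike[np.argmin(np.linalg.norm(unlike - row, axis=1))]
        r = rng.uniform(lo, hi)
        X_priv[i] = row + r * (row - nearest)              # move away from the boundary
    return X_priv

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))                              # e.g., size/defect metrics
y = (X[:, 0] > 0).astype(int)
X_shared = obfuscate(X, y, rng)
print(np.abs(X_shared - X).mean())                         # rows differ from originals
```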
We applied a tool from psychology, the repertory grid, to extract lessons learned from professional software engineers and compare them in a scientifically valid way. This technique shows promise for documenting, comparing, and quantifying lessons learned rather than relying on simple anecdotes.
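For readers unfamiliar with repertory grids, the toy example below shows the data structure involved: elements (here, past projects) rated on bipolar constructs, which lets two engineers' lessons learned be compared numerically. All names and ratings are invented.

```python
# Toy repertory grid: each engineer rates "elements" (past projects) on
# bipolar "constructs" (e.g., 1 = informal reviews ... 5 = formal
# inspections). Grids from different engineers can then be compared
# quantitatively instead of anecdotally.
import numpy as np

elements = ["proj_A", "proj_B", "proj_C", "proj_D"]
constructs = ["informal..formal reviews",
              "stable..volatile requirements",
              "low..high test automation"]

engineer_1 = np.array([[1, 4, 2], [5, 2, 4], [3, 3, 3], [4, 1, 5]])
engineer_2 = np.array([[2, 4, 1], [5, 1, 4], [3, 3, 2], [4, 2, 5]])

# Agreement per construct: correlation of the two engineers' ratings.
for j, name in enumerate(constructs):
    r = np.corrcoef(engineer_1[:, j], engineer_2[:, j])[0, 1]
    print(f"{name}: agreement r={r:.2f}")
```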
We applied natural language processing techniques to pool together data from multiple NASA missions on how software bugs occur in operation. These techniques show promise for generating new sources of lessons learned from unstructured data (e.g., reports, emails) that can be used to make quantitative decisions.
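One minimal, hypothetical way to pool such unstructured reports (not the project's actual pipeline) is to vectorize the free text and cluster it to surface recurring failure themes; the report texts below are invented examples.

```python
# Sketch of pooling unstructured problem reports from several missions:
# vectorize the free text with TF-IDF, then cluster to surface recurring
# failure themes across missions. Report texts are invented.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

reports = [
    "telemetry buffer overflow after uplink command sequence",
    "stale sensor reading used in attitude control loop",
    "command sequence rejected due to timing constraint violation",
    "buffer overflow in downlink telemetry handler",
    "attitude estimate diverged on stale gyro data",
]

X = TfidfVectorizer(stop_words="english").fit_transform(reports)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for text, label in zip(reports, labels):
    print(label, text)
```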
Finally, we examined whether one of the most commonly held laws of software development is, in fact, a law in the scientific sense. In a survey of several dozen software engineers, the assertion that “the longer a problem remains in the system, the more expensive it is to fix” was the most commonly believed “law” of software engineering. Yet, in a study of several hundred software projects that employed the Team Software Process, we found that this assertion did not hold.
In total, this research produced five publications authored by Fraunhofer staff and supported the research effort of three undergraduate students. This work brought together software engineering data and lessons learned from commercial companies, government organizations, and open source repositories.
This research made significant headway toward addressing the problem of quantitatively transferring rules for developing better software from project to project. There is still much work to be done to tame the complexities of software development introduced by variations in individuals, organizations, and their processes.
Last Modified: 06/30/2017
Modified by: Lucas Layman