
NSF Org: CCF Division of Computing and Communication Foundations
Initial Amendment Date: April 19, 2013
Latest Amendment Date: June 1, 2016
Award Number: 1302169
Award Instrument: Continuing Grant
Program Manager: Sol Greenspan, sgreensp@nsf.gov, (703) 292-7841, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering
Start Date: July 1, 2013
End Date: June 30, 2017 (Estimated)
Total Intended Award Amount: $482,852.00
Total Awarded Amount to Date: $482,852.00
Funds Obligated to Date: FY 2015 = $118,390.00; FY 2016 = $120,313.00
Recipient Sponsored Research Office: 5700 Rivertech Ct Ste 210, Riverdale, MD 20737-1250, US; (301) 314-6070
Primary Place of Performance: 5825 University Research Ct., College Park, MD 20740-3823, US
NSF Program(s): Software & Hardware Foundation; SOFTWARE ENG & FORMAL METHODS
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVIT; 01001617DB NSF RESEARCH & RELATED ACTIVIT
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
The goal of the research is to enable software engineers to find software development best practices in past empirical data. The increasing availability of software development project data, together with new machine learning techniques, makes it possible for researchers to study the generalizability of results across projects using the concept of transfer learning. Using data from real software projects, the project will determine and validate best practices in three areas: predicting software development effort, isolating software defects, and identifying effective code inspection practices.
This research will deliver new data mining technologies in the form of transfer learning techniques and tools that overcome current limitations in the state of the art to provide accurate learning within and across projects. It will design new empirical studies that apply transfer learning to empirical data collected from industrial software projects. It will also build an on-line model analysis service, making the techniques and tools available to other researchers who are investigating the validity of best-practice principles.
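To make the transfer-learning idea concrete, here is a minimal, hypothetical sketch (not the project's actual tooling) of one common cross-project strategy: filter a source project's data down to the rows most similar to the target project before training a defect predictor. All metric names and data below are invented for illustration.

```python
# Minimal sketch of cross-project ("transfer") defect prediction via
# relevancy filtering: keep only the source-project rows that resemble
# the target project, then train on that subset. All data and metric
# columns here are synthetic and hypothetical.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Hypothetical static-code metrics (e.g., LOC, complexity, coupling).
source_X = rng.normal(size=(200, 3))
source_y = rng.integers(0, 2, size=200)        # defect labels for the source project
target_X = rng.normal(loc=0.5, size=(50, 3))   # unlabeled target project

# Relevancy filter: for each target row, keep its 10 nearest source rows.
nn = NearestNeighbors(n_neighbors=10).fit(source_X)
keep = np.unique(nn.kneighbors(target_X, return_distance=False))

# Train only on the relevant subset, then predict for the target project.
model = DecisionTreeClassifier(max_depth=4, random_state=0)
model.fit(source_X[keep], source_y[keep])
print(model.predict(target_X)[:10])
```

The filtering step is what distinguishes transfer from naively training on all foreign data, which often performs poorly when projects differ.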
The broader impacts of the research will be to make empirical software engineering research results more transferable to practice, and to improve the research processes for the empirical software engineering community. By providing a means to test principles about software development, this work stands to transform empirical software engineering research and enable software managers to rely on scientifically obtained facts and conclusions rather than anecdotal evidence and one-off studies. Given the immense importance and cost of software in commercial and critical systems, the research has long-term economic impacts.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Software development is often unpredictable and imprecise for many reasons – new technologies emerge, the needs of customers change, and we are constantly building larger and more complex software systems. Unlike traditional engineering disciplines, there are no blueprints for software. Best practices and lessons learned on how best to build software vary from person to person, organization to organization, and project to project. These “rules” for building better software are often based on personal experience or anecdotes and, while useful, are rarely backed by scientific evidence, or they apply only in a very limited number of scenarios. For decades, researchers have attempted to derive rules for controlling the cost and quality of software, but with mixed success at best.
The NSF Transfer Learning in Software Engineering project sought to address one of the major reasons that scientific predictability of software quality and effort has eluded us: that it is difficult to identify relevant data from past software projects from which to draw rules for engineering the software system at hand. To tackle this problem, we engaged in a number of activities.
We developed the XTREE algorithm, which evaluates the impact of proposed changes to software structure (i.e., refactoring) on software quality. For example, will reducing the complexity of a particular piece of code actually reduce the number of bugs in the system? These decisions are evaluated using data from the project’s history of changes, as sketched below. The XTREE algorithm enables software project managers to make decisions on how to allocate effort based on past performance, rather than guesswork and anecdote.
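The sketch below is not the published XTREE algorithm; it only illustrates the underlying what-if question on invented data: train a model on a project's historical metrics and defect labels, then ask whether a proposed complexity reduction is predicted to lower defect risk.

```python
# Illustrative "what-if" query in the spirit of XTREE (not the published
# algorithm): learn from a project's change history, then test whether a
# proposed refactoring that lowers complexity is predicted to lower
# defect-proneness. Metric names and data are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
# Columns: [lines_of_code, cyclomatic_complexity, coupling]
history_X = rng.normal(loc=[300, 12, 5], scale=[80, 4, 2], size=(500, 3))
# Invented ground truth: complex, tightly coupled modules are buggier.
history_y = (history_X[:, 1] + history_X[:, 2] + rng.normal(size=500) > 16).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(history_X, history_y)

module = np.array([[320.0, 18.0, 6.0]])            # current module metrics
refactored = module.copy()
refactored[0, 1] = 9.0                             # proposed complexity reduction

before = model.predict_proba(module)[0, 1]
after = model.predict_proba(refactored)[0, 1]
print(f"predicted defect risk before={before:.2f}, after={after:.2f}")
```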
We developed the LACE2 privacy algorithm, which enables private companies and other organizations to contribute highly detailed software development data (for example, the size of source code files and the number of defects in them) without revealing the origin of the data. By growing the amount of data available, we make it more likely that we can identify a past project from which to transfer rules to the current project.
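As a heavily simplified, hypothetical sketch of the kind of instance obfuscation this line of work builds on (not the LACE2 algorithm itself): perturb each numeric row away from its nearest neighbor of the opposite class, so exact metric values are never shared while the class structure is roughly preserved.

```python
# Simplified privatization step: shift each row a random fraction away
# from its nearest neighbor with a different label, so the shared rows no
# longer match the originals but stay on their side of the class boundary.
# Data and labels below are synthetic; this is not LACE2 itself.
import numpy as np

def obfuscate(X, y, rng, lo=0.15, hi=0.35):
    X_priv = X.copy()
    for i, row in enumerate(X):
        unlike = X[y != y[i]]                              # rows with a different label
        nearest = unlike[np.argmin(np.linalg.norm(unlike - row, axis=1))]
        r = rng.uniform(lo, hi)
        X_priv[i] = row + r * (row - nearest)              # move away from the boundary
    return X_priv

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))                              # e.g., size/defect metrics
y = (X[:, 0] > 0).astype(int)
X_shared = obfuscate(X, y, rng)
print(np.abs(X_shared - X).mean())                         # rows differ from originals
```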
We applied a tool from psychology, the repertory grid, to extract lessons learned from professional software engineers and compare them in a scientifically valid way. This technique shows promise for documenting, comparing, and quantifying lessons learned rather than relying on simple anecdotes.
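For readers unfamiliar with repertory grids, the toy example below shows the data structure involved: elements (here, past projects) rated on bipolar constructs, which lets two engineers' lessons learned be compared numerically. All names and ratings are invented.

```python
# Toy repertory grid: each engineer rates "elements" (past projects) on
# bipolar "constructs" (e.g., 1 = informal reviews ... 5 = formal
# inspections). Grids from different engineers can then be compared
# quantitatively instead of anecdotally.
import numpy as np

elements = ["proj_A", "proj_B", "proj_C", "proj_D"]
constructs = ["informal..formal reviews",
              "stable..volatile requirements",
              "low..high test automation"]

engineer_1 = np.array([[1, 4, 2], [5, 2, 4], [3, 3, 3], [4, 1, 5]])
engineer_2 = np.array([[2, 4, 1], [5, 1, 4], [3, 3, 2], [4, 2, 5]])

# Agreement per construct: correlation of the two engineers' ratings.
for j, name in enumerate(constructs):
    r = np.corrcoef(engineer_1[:, j], engineer_2[:, j])[0, 1]
    print(f"{name}: agreement r={r:.2f}")
```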
We applied natural language processing techniques to pool together data from multiple NASA missions on how software bugs occur in operation. These techniques show promise for generating new sources of lessons learned from unstructured data (e.g., reports, emails) that can be used to make quantitative decisions.
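One minimal, hypothetical way to pool such unstructured reports (not the project's actual pipeline) is to vectorize the free text and cluster it to surface recurring failure themes; the report texts below are invented examples.

```python
# Sketch of pooling unstructured problem reports from several missions:
# vectorize the free text with TF-IDF, then cluster to surface recurring
# failure themes across missions. Report texts are invented.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

reports = [
    "telemetry buffer overflow after uplink command sequence",
    "stale sensor reading used in attitude control loop",
    "command sequence rejected due to timing constraint violation",
    "buffer overflow in downlink telemetry handler",
    "attitude estimate diverged on stale gyro data",
]

X = TfidfVectorizer(stop_words="english").fit_transform(reports)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for text, label in zip(reports, labels):
    print(label, text)
```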
Finally, we examined whether one of the most commonly held laws of software development is, in fact, a law in the scientific sense. In a survey of several dozen software engineers, the assertion that “the longer a problem remains in the system, the more expensive it is to fix” was the most commonly believed “law” of software engineering. Yet, in a study of several hundred software projects that employed the Team Software Process, we found that this assertion did not hold.
In total, this research produced five publications authored by Fraunhofer staff and supported the research effort of three undergraduate students. This work brought together software engineering data and lessons learned from commercial companies, government organizations, and open source repositories.
This research made significant headway toward addressing the problem of quantitatively transferring rules for developing better software from project to project. There is still much work to be done to tame the complexities of software development introduced by variations in individuals, organizations, and their processes.
Last Modified: 06/30/2017
Modified by: Lucas Layman