
NSF Org: |
CCF Division of Computing and Communication Foundations |
Recipient: |
|
Initial Amendment Date: | January 28, 2013 |
Latest Amendment Date: | December 13, 2017 |
Award Number: | 1253837 |
Award Instrument: | Continuing Grant |
Program Manager: |
Sol Greenspan
CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2013 |
End Date: | August 31, 2018 (Estimated) |
Total Intended Award Amount: | $446,010.00 |
Total Awarded Amount to Date: | $478,010.00 |
Funds Obligated to Date: |
FY 2014 = $12,000.00; FY 2015 = $88,804.00; FY 2016 = $186,177.00; FY 2017 = $12,000.00; FY 2018 = $8,000.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
1314 S MOUNT VERNON AVE WILLIAMSBURG VA US 23185 (757)221-3965 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
VA US 23187-8795 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): |
Software & Hardware Foundation, SOFTWARE ENG & FORMAL METHODS |
Primary Program Source: |
01001415DB NSF RESEARCH & RELATED ACTIVIT; 01001516DB NSF RESEARCH & RELATED ACTIVIT; 01001617DB NSF RESEARCH & RELATED ACTIVIT; 01001718DB NSF RESEARCH & RELATED ACTIVIT; 01001819DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
This project proposes a novel unified model to help software developers license software and (re)use components in compliance with legal requirements. The solution will investigate novel combinations of information retrieval, internet-scale source code search, repository mining, and static analysis approaches to detect the origins of software components. The research will also rely on a feedback-driven hybrid of information retrieval and machine learning techniques to identify components' licenses with high accuracy. In addition, the proposed model will unify these building blocks for license compliance analysis and verification, reasoning about the given software, its components, dependencies, and licenses, as well as their trustworthiness, constraints, and existing or potential legal compliance issues.
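To make the information-retrieval step of license identification concrete, the following is a minimal sketch of one way to match a file's license header against reference license texts using bag-of-words cosine similarity. It is not the project's method: the excerpts in `REFERENCE_LICENSES`, the `identify_license` function, and its `threshold` parameter are illustrative assumptions. A realistic system would index full license texts (for example, from the SPDX License List) and, as described above, blend retrieval with machine learning and developer feedback.

```python
import math
import re
from collections import Counter
from typing import Dict, Optional, Tuple

# Abbreviated excerpts of well-known license notices (an illustrative
# assumption; a real index would hold the full license texts).
REFERENCE_LICENSES: Dict[str, str] = {
    "MIT": (
        "Permission is hereby granted, free of charge, to any person "
        "obtaining a copy of this software"
    ),
    "Apache-2.0": (
        'Licensed under the Apache License, Version 2.0 (the "License"); '
        "you may not use this file except in compliance with the License."
    ),
    "GPL-3.0-or-later": (
        "This program is free software: you can redistribute it and/or modify "
        "it under the terms of the GNU General Public License as published by "
        "the Free Software Foundation, either version 3 of the License, or "
        "(at your option) any later version."
    ),
}


def term_frequencies(text: str) -> Counter:
    """Lower-cased bag-of-words term frequencies (punctuation is dropped)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_license(header: str, threshold: float = 0.5) -> Tuple[Optional[str], float]:
    """Return (best-matching license id, score), or (None, score) below threshold."""
    query = term_frequencies(header)
    scores = {
        lid: cosine(query, term_frequencies(text))
        for lid, text in REFERENCE_LICENSES.items()
    }
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores[best]


if __name__ == "__main__":
    header = (
        "# Licensed under the Apache License, Version 2.0; you may not use "
        "this file except in compliance with the License."
    )
    print(identify_license(header)[0])  # Apache-2.0
```

Returning `None` below the threshold, rather than always returning the closest match, leaves room for the feedback loop: unmatched or low-confidence files can be routed to a human reviewer instead of being silently mislabeled.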
The proposed research will lead to both theoretical foundations and practical solutions for the comprehensive analysis of complex legal compliance concerns, enabling lawful software development and evolution. Among the broader impacts, the project will develop educational course content, involve underrepresented student groups, produce software tools under open source licenses, collaborate with industry to transfer technology and empirically evaluate the proposed research, and conduct K-12 outreach activities.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Reusing open source code is common nowadays: developers frequently copy code fragments, files, or components from one system to another, for example to add features from existing systems or to fix bugs using known and tested implementations. Some recent research has focused on creating solutions to support these tasks; however, these solutions overlook one important legal dimension: license compliance, which, if not satisfied, can lead to severe legal consequences. In fact, copying code without the approval of the copyright holder is generally prohibited. For example, a US court ruled that copying even 25 lines out of 500 KLOC could be considered copyright infringement. The problem is exacerbated by the fact that software developers are frequently unaware of the associated legal risks, and their companies may lack specific policies and guidelines on the matter.
Open source software (comprising all artifacts, not only the source code) is software licensed under an open source license, which makes its code or components available for creating derivative works. Open source licensing is a vehicle for the licensor to grant the licensee certain rights that would otherwise be prohibited, such as the right to make, edit, and distribute copies. In exchange for these rights, the licensee must abide by the requirements and constraints that the license entails. Numerous licenses have been approved by the Open Source Initiative (OSI) to date, and they vary considerably in the constraints they impose. Even more challenging, these licenses, their constraints and requirements, the components they govern, and the software systems that use them are all continually evolving. Yet at any given time, a software system must satisfy all the requirements of the license of each component it reuses. Another growing challenge is ensuring that the license of the overall system is compatible with the licenses of each of its components, libraries, and even the tools it uses. This requires a sophisticated analysis of the overall system's architecture, environment, and licenses to detect whether the system's license contradicts any of its constituents' licenses. Moreover, this analysis must be performed continually, because software, dependencies, licenses, and their requirements evolve asynchronously.
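The compatibility analysis described above can be sketched as a graph traversal: starting from the system, visit every direct and transitive dependency and flag any component whose license is not accepted under the system's own license. This is a minimal sketch under stated assumptions, not the project's tooling: `ALLOWED_INBOUND` is a hypothetical, deliberately simplified table covering only four SPDX identifiers, real compatibility also depends on how components are linked and distributed, and the component names in the example are invented.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, deliberately simplified table: for each license a system is
# distributed under, the component licenses assumed acceptable inside it.
ALLOWED_INBOUND: Dict[str, Set[str]] = {
    "MIT": {"MIT"},
    "Apache-2.0": {"MIT", "Apache-2.0"},
    "GPL-2.0-only": {"MIT", "GPL-2.0-only"},
    "GPL-3.0-only": {"MIT", "Apache-2.0", "GPL-3.0-only"},
}


def find_conflicts(
    system_license: str,
    licenses: Dict[str, str],
    deps: Dict[str, List[str]],
    root: str,
) -> List[Tuple[str, str]]:
    """Return sorted (component, license) pairs that conflict with system_license.

    Visits root and all of its direct and transitive dependencies exactly once
    (an iterative depth-first walk, so dependency cycles do not loop forever).
    Components with no recorded license are reported as "UNKNOWN".
    """
    allowed = ALLOWED_INBOUND.get(system_license, set())
    conflicts: List[Tuple[str, str]] = []
    seen = {root}
    stack = [root]
    while stack:
        component = stack.pop()
        lic = licenses.get(component, "UNKNOWN")
        if lic not in allowed:
            conflicts.append((component, lic))
        for dep in deps.get(component, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return sorted(conflicts)


if __name__ == "__main__":
    # Hypothetical system "app" and its (transitive) dependencies.
    licenses = {
        "app": "GPL-2.0-only",
        "libjson": "MIT",
        "libhttp": "Apache-2.0",
        "libzip": "GPL-3.0-only",
    }
    deps = {"app": ["libjson", "libhttp"], "libhttp": ["libzip"]}
    print(find_conflicts("GPL-2.0-only", licenses, deps, "app"))
    # [('libhttp', 'Apache-2.0'), ('libzip', 'GPL-3.0-only')]
```

In this sketch the traversal stays fixed while the table and the dependency graph carry everything that evolves, so re-running the check whenever either changes is one way to approximate the continual analysis the paragraph calls for.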
To address this issue, the project (1) defined and evaluated a unified model for establishing the provenance of components, detecting their licenses, and verifying legal licensing compliance, using novel combinations of Information Retrieval, Mining Software Repositories, internet-scale source code search, and static analysis approaches; (2) defined and instantiated new methodologies to support specific development and maintenance tasks; (3) performed empirical case studies to evaluate these techniques and methodologies; and (4) created and maintained a repository of software artifacts and analysis data to support rigorously controlled experimentation and benchmarking in the research community. Some of the broader impacts of this project include (1) improving the state of the practice in software development, which struggles to ensure legal licensing compliance; (2) demonstrating improved licensing practices; (3) developing educational course content and piloting it in our courses as part of this research; and (4) actively involving students from underrepresented groups in this research program.
The resulting work has been published in several high-quality software engineering conferences and journals, with some papers gaining best-result recognition. A number of undergraduate and graduate students, including a minority doctoral student, were trained and became contributing members of this project. Several of these students co-authored and presented papers at international conferences. Multiple graduate-level theses were derived from this project. The students graduating from this program have secured full-time employment in academia and the software industry. The scientific knowledge gained was integrated into multiple undergraduate and graduate classes at the host institutions, broadening STEM education; this matters because the ACM/IEEE Software Engineering Curriculum Guidelines identified software evolution as one of the ten key areas of software engineering education. A number of open-source software tools were developed and have been made publicly available. The data repositories resulting from this project are accessible to the scientific community and the general public through the PI’s web site. The project enhanced and strengthened a long-term professional collaboration not only between the PI and his students but also among the multiple collaborators involved. The computing infrastructure established during the project allows its resources to be sustained beyond it.
Last Modified: 01/15/2019
Modified by: Denys Poshyvanyk