Award Abstract # 1253837
CAREER: Enabling License Compliance Analysis and Verification for Evolving Software

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: COLLEGE OF WILLIAM AND MARY
Initial Amendment Date: January 28, 2013
Latest Amendment Date: December 13, 2017
Award Number: 1253837
Award Instrument: Continuing Grant
Program Manager: Sol Greenspan
CCF
 Division of Computing and Communication Foundations
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2013
End Date: August 31, 2018 (Estimated)
Total Intended Award Amount: $446,010.00
Total Awarded Amount to Date: $478,010.00
Funds Obligated to Date: FY 2013 = $171,029.00
FY 2014 = $12,000.00
FY 2015 = $88,804.00
FY 2016 = $186,177.00
FY 2017 = $12,000.00
FY 2018 = $8,000.00
History of Investigator:
  • Denys Poshyvanyk (Principal Investigator)
    dposhyvanyk@wm.edu
Recipient Sponsored Research Office: College of William and Mary
1314 S MOUNT VERNON AVE
WILLIAMSBURG
VA  US  23185
(757)221-3965
Sponsor Congressional District: 01
Primary Place of Performance: College of William and Mary
VA  US  23187-8795
Primary Place of Performance Congressional District: 01
Unique Entity Identifier (UEI): EVWJPCY6AD97
Parent UEI: EVWJPCY6AD97
NSF Program(s): Software & Hardware Foundation,
SOFTWARE ENG & FORMAL METHODS
Primary Program Source: 01001314DB NSF RESEARCH & RELATED ACTIVIT
01001415DB NSF RESEARCH & RELATED ACTIVIT
01001516DB NSF RESEARCH & RELATED ACTIVIT
01001617DB NSF RESEARCH & RELATED ACTIVIT
01001718DB NSF RESEARCH & RELATED ACTIVIT
01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1045, 7924, 7944, 9251
Program Element Code(s): 779800, 794400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

This project proposes a novel unified model to help software developers license software and (re)use components complying with legal requirements. The solution will investigate novel combinations of information retrieval, internet-scale source code search, repository mining, and static analysis approaches to detect origins of software components. The research will also rely on a feedback-driven hybrid blending of information retrieval and machine learning techniques for identifying components' licenses with high accuracy. In addition, the proposed model will unify these building blocks for license compliance analysis and verification to reason about the given software, components, dependencies, and licenses, as well as their trustworthiness, constraints, and existing or potential legal compliance issues.
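As an illustration of the information retrieval building block mentioned above, the following is a minimal sketch that matches a file header against short reference excerpts of license notices using bag-of-words cosine similarity. It is not the project's actual technique: the names `REFERENCE_SNIPPETS`, `identify_license`, and the `threshold` value are assumptions made for this example, and the machine learning and feedback-driven parts of the proposed approach are not modeled.

```python
import math
import re
from collections import Counter

# Short excerpts of well-known license notices, used here as a tiny
# reference corpus. A real detector would use full texts and many more licenses.
REFERENCE_SNIPPETS = {
    "MIT": "Permission is hereby granted, free of charge, to any person "
           "obtaining a copy of this software",
    "Apache-2.0": "Licensed under the Apache License, Version 2.0 (the License); "
                  "you may not use this file except in compliance with the License",
    "GPL-3.0": "This program is free software: you can redistribute it and/or "
               "modify it under the terms of the GNU General Public License",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def identify_license(header, threshold=0.3):
    """Return the best-matching license name, or None below the (assumed) threshold."""
    query = Counter(tokenize(header))
    scored = [(name, cosine(query, Counter(tokenize(text))))
              for name, text in REFERENCE_SNIPPETS.items()]
    best, score = max(scored, key=lambda pair: pair[1])
    return best if score >= threshold else None
```

For example, `identify_license("// Licensed under the Apache License, Version 2.0")` selects the Apache-2.0 excerpt, while a header with no license wording falls below the threshold and yields `None`.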

The proposed research will lead to both theoretical foundations and practical solutions for the comprehensive analysis of complex legal compliance concerns to enable lawful software development and evolution. Among the broader impacts, the project will develop educational course content, involve underrepresented student groups, produce software tools under open source licenses, collaborate with industry to transfer technology and empirically evaluate the proposed research, and conduct K-12 outreach activities.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Bavota, G., Linares-Vásquez, M., Bernal-Cárdenas, C., Di Penta, M., Oliveto, R., and Poshyvanyk, D. "The Impact of API Change- and Fault-Proneness on the User Ratings of Android Apps" IEEE Transactions on Software Engineering (TSE) , 2015
F. Palomba and G. Bavota and M. D. Penta and R. Oliveto and D. Poshyvanyk and A. De Lucia "Mining Version Histories for Detecting Code Smells" IEEE Transactions on Software Engineering , v.41 , 2015 , p.462-489 10.1109/TSE.2014.2372760
G. Bavota and M. Linares-Vasquez and C. E. Bernal-Cardenas and M. D. Penta and R. Oliveto and D. Poshyvanyk "The Impact of API Change- and Fault-Proneness on the User Ratings of Android Apps" IEEE Transactions on Software Engineering , v.41 , 2015 , p.384-407 10.1109/TSE.2014.2367027
M. Tufano and F. Palomba and G. Bavota and R. Oliveto and M. Di Penta and A. De Lucia and D. Poshyvanyk "When and Why Your Code Starts to Smell Bad (and Whether the Smells Go Away)" IEEE Transactions on Software Engineering , v.PP , 2017 , p.1-1 10.1109/TSE.2017.2653105
Palomba, Fabio and Linares-Vásquez, Mario and Bavota, Gabriele and Oliveto, Rocco and Di Penta, Massimiliano and Poshyvanyk, Denys and De Lucia, Andrea "Crowdsourcing User Reviews to Support the Evolution of Mobile Apps" Journal of Systems and Software , v.137 , 2017 10.1016/j.jss.2017.11.043
Palomba, F., Bavota, G., Di Penta, M., Oliveto, R., Poshyvanyk, D., and De Lucia, A. "Mining Version Histories for Detecting Code Smells" IEEE Transactions on Software Engineering (TSE) , 2015
Vendome, Christopher and Linares-Vásquez, Mario and Bavota, Gabriele and Di Penta, Massimiliano and German, Daniel and Poshyvanyk, Denys "License Usage and Changes: A Large-Scale Study on GitHub" Empirical Software Engineering (EMSE) , 2017

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

 

Reusing open source code is common nowadays: developers frequently copy code fragments, files, or components from one system to another, for example to add features from existing systems or to fix bugs using known and tested implementations.  Recent research has focused on solutions that support these tasks; however, it overlooks one important legal dimension: license compliance, which, if not satisfied, can have severe legal consequences.  In fact, copying code without the approval of the copyright holder is generally prohibited.  For example, a US court ruled that even copying 25 out of 500KLOC could be considered a copyright infringement.  The problem is exacerbated by the fact that software developers are frequently unaware of the associated legal risks, and their companies may not have specific policies and guidelines on this matter.

 

Open source software (comprising all the artifacts, not only the source code) is a system that is licensed under some open source license, which makes code or components available for creating derivative works based on it.  The open source licensing process is a vehicle for the licensor to grant certain rights to the licensee that would otherwise be prohibited, such as the right to make, edit, and distribute copies.  In exchange for these rights, the licensee must abide by the requirements and constraints that such licenses entail.  There are numerous Open Source Initiative approved licenses as of today, and they vary considerably in the constraints that they impose.  What is even more challenging is that these licenses, their constraints and requirements, the components governed by them, and the software systems are all continually evolving.  However, at any given time, any software must satisfy all the requirements of each license of the components it reuses.  Another growing challenge is to ensure license compatibility of the overall system with the licenses of each of its components, libraries, and even the tools it uses.  This requires a sophisticated analysis of the overall system's architecture, environment, and licenses to detect whether the overall license contradicts any of its constituents’ licenses.  Moreover, this analysis needs to be done continually as software, dependencies, licenses, and their requirements evolve asynchronously.
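The compatibility check described above can be sketched as a walk over a system's components and their transitive dependencies. This is a minimal sketch under stated assumptions, not the project's model: the `CAN_INCLUDE` table is a deliberately simplified, hypothetical stand-in for real compatibility rules (which depend on license versions, linking, and distribution), and the `Component` and `find_conflicts` names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, heavily simplified table: for each system-level license,
# the component licenses it is assumed to be able to incorporate.
CAN_INCLUDE = {
    "MIT": {"MIT"},
    "Apache-2.0": {"MIT", "Apache-2.0"},
    "GPL-3.0": {"MIT", "Apache-2.0", "GPL-3.0"},
}

@dataclass
class Component:
    name: str
    license: str
    dependencies: List["Component"] = field(default_factory=list)

def find_conflicts(system_license, components):
    """Return sorted (name, license) pairs for every direct or transitive
    component whose license the system license is assumed unable to include."""
    allowed = CAN_INCLUDE.get(system_license, set())
    conflicts = []
    seen = set()
    stack = list(components)
    while stack:
        comp = stack.pop()
        if comp.name in seen:
            continue  # visit each component once, even in shared dependencies
        seen.add(comp.name)
        if comp.license not in allowed:
            conflicts.append((comp.name, comp.license))
        stack.extend(comp.dependencies)
    return sorted(conflicts)
```

Because software, dependencies, and licenses evolve asynchronously, a check like this would need to be rerun whenever the dependency graph or any component's license changes, for instance on every build.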

 

To address this issue, the project (1) defined and evaluated a unified model for establishing the provenance of components, detecting their licenses, and verifying legal licensing compliance concerns using novel combinations of Information Retrieval, Mining Software Repositories, internet-scale source code search, and static analysis approaches to detect origins of software components; (2) defined and instantiated new methodologies to support specific development and maintenance tasks; (3) performed empirical case studies to evaluate techniques and methodologies; and (4) created and maintained a repository of software artifacts and analysis data to support rigorously controlled experimentation and benchmarking in the research community. Some of the broader impacts from this project include (1) improving the state of the practice in software development, which faces difficulties in ensuring legal licensing compliance; (2) demonstrating improved licensing practices; (3) developing educational course content and piloting it in our courses as part of this research proposal; and (4) actively involving underrepresented categories of students in this research program.

 

The resulting work has been published in several high-quality software engineering conferences and journals (some gaining best result recognition).  A number of undergraduate and graduate students, including a minority doctoral student, were trained and became contributing members of this project.  Several of these students co-authored and presented papers at international conferences.  Multiple graduate-level theses were derived from this project. The students graduating from this program have secured full-time employment in academia and the software industry. The scientific knowledge gained was integrated into multiple undergraduate and graduate classes at the host institutions, which broadens STEM education.  The ACM/IEEE Software Engineering Curriculum Guidelines identify software evolution among the ten key areas of SE education. A number of open-source software tools were developed and made publicly available.  The data repositories resulting from this project are accessible to the scientific community and the general public through the PI’s web site. The project enhanced and strengthened long-term professional collaboration not only between the PI and his students but also among the multiple collaborators involved.  The computing infrastructure established during the course of the project permits the sustainability of its resources.


Last Modified: 01/15/2019
Modified by: Denys Poshyvanyk

Please report errors in award information by writing to: awardsearch@nsf.gov.
