
NSF Org: |
OAC Office of Advanced Cyberinfrastructure (OAC) |
Recipient: |
|
Initial Amendment Date: | August 26, 2016 |
Latest Amendment Date: | January 4, 2021 |
Award Number: | 1636766 |
Award Instrument: | Standard Grant |
Program Manager: |
Martin Halbert
OAC Office of Advanced Cyberinfrastructure (OAC) CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2016 |
End Date: | December 31, 2021 (Estimated) |
Total Intended Award Amount: | $444,000.00 |
Total Awarded Amount to Date: | $816,440.00 |
Funds Obligated to Date: |
FY 2019 = $372,440.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
77 MASSACHUSETTS AVE CAMBRIDGE MA US 02139-4301 (617)253-1000 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
77 Massachusetts Ave. Cambridge MA US 02139-4307 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | BD Spokes -Big Data Regional I |
Primary Program Source: |
01001920RB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Sharing of data sets can provide tremendous mutual benefits for industry, researchers and nonprofit organizations. For example, companies can profit when university researchers explore their data sets and make discoveries that help the company improve its business. At the same time, researchers are always searching for real-world data sets to show that their newly developed techniques work in practice. Unfortunately, many attempts to share relevant data sets between different stakeholders in industry and academia fail or require a large investment to make data sharing possible. A major obstacle is that data often comes with prohibitive restrictions on how it can be used (e.g., requiring the enforcement of legal terms or other policies, or handling data privacy issues). To enforce these requirements today, lawyers are usually involved in negotiating the terms of each contract. It is not unusual for this process of creating an individual contract for data sharing to end in protracted negotiations, as both sides struggle with the implications and possibilities of modern security, privacy, and data sharing techniques. Worse, fears of missing a loophole in how the data might be (mis)used often prevent data sharing efforts from even getting started. To address these challenges, our new data sharing spoke will enable data providers to easily share data while enforcing constraints on the use of the data. This effort has two key components: (1) creating a licensing model for data that facilitates sharing data that is not necessarily open or free between different organizations and (2) developing a prototype data sharing software platform, ShareDB, which enforces the terms and restrictions of the developed licenses. We believe these efforts will have a transformative impact on how data sharing takes place.
By moving data out of the silos of individuals and single organizations and into the hands of broader society, we can tackle many societally significant problems.
This new data sharing spoke will enable data providers to easily share data while enforcing constraints on the use of the data. Many services and platforms that provide access to data sets already exist today. However, these platforms generally promote completely open access and do not address the aforementioned issues that arise when dealing with proprietary data. Thus, the effort has three key components: (1) creating a licensing model for data that facilitates sharing data that is not necessarily open or free between different organizations, (2) developing a prototype data sharing software platform, ShareDB, which enforces the terms and restrictions of the developed licenses, and (3) developing and integrating relevant metadata that will accompany the datasets shared under the different licenses, making them easily searchable and interpretable. To ensure that the developed tools and licenses are useful, the project will form the Northeast Data Sharing Group, comprising many different stakeholders, to make the licensing model widely accepted and usable in many application domains (e.g., health and finance). The intellectual merit of this proposal lies in designing a licensing model and a data sharing platform that are widely accepted and usable as a template in many different domains. While other efforts to enable data sharing exist (e.g., Creative Commons), they focus on the case where the data owner is willing to openly share the data on the Internet. This licensing model and its ecosystem differ in that they allow data owners to enforce certain requirements stated in a data sharing agreement (e.g., on who is allowed to access the data) and also provide tools to make sharing of sensitive information safe.
The licenses and software we propose to investigate will make it easier for organizations to open up their data to the appropriate organizations, while retaining the ability to ensure that the data is protected, that access is revocable, and that access controls and audit logs are maintained.
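The three properties named above (access restricted to approved organizations, revocable access, and an audit log) can be illustrated with a small sketch. This is not ShareDB's implementation, which the abstract does not describe; the names `License`, `AccessGate`, and the dataset and organization labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class License:
    """Hypothetical license record: which organizations may read a dataset."""
    dataset: str
    allowed_orgs: set
    revoked: bool = False


@dataclass
class AccessGate:
    """Hypothetical gatekeeper that checks licenses and logs every request."""
    licenses: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def grant(self, lic: License) -> None:
        self.licenses[lic.dataset] = lic

    def revoke(self, dataset: str) -> None:
        # Raises KeyError if no license was ever granted for this dataset.
        self.licenses[dataset].revoked = True

    def request(self, org: str, dataset: str) -> bool:
        lic = self.licenses.get(dataset)
        allowed = (
            lic is not None
            and not lic.revoked
            and org in lic.allowed_orgs
        )
        # Every request is recorded, whether or not it was allowed.
        timestamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((timestamp, org, dataset, allowed))
        return allowed
```

In this sketch, revoking a license takes effect on the next request, and denied requests are logged alongside granted ones so that the data owner can review attempted access as well as actual access.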
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The Northeast Big Data SPOKE project “A Licensing Model and Ecosystem for Data Sharing” was a collaborative project involving researchers in computer science at the Massachusetts Institute of Technology (MIT) and the Metadata Research Center in the College of Computing and Informatics at Drexel University. The overall aim was to develop a data sharing system and an approach that address the legal matters, policies, privacy concerns, and technical challenges that too frequently hold up collaboration through data. Specific results included:
1) Creating a licensing model for data that facilitates sharing data that is not necessarily open or free between different organizations.
We collected a large number of data sharing agreements and conducted a survey of the types of licenses that are used in them. This allowed us to create a metadata taxonomy to classify and simplify sharing agreements.
2) Developing a prototype data sharing software platform, ShareDB, that enforces agreement terms and restrictions for the licenses developed, and that includes features for building processing pipelines over the shared data sets and for finding errors and anomalies in that data.
Our prototype sharing system ShareDB included several different anonymization features, including differential privacy.
3) Developing and integrating relevant metadata that accompany the datasets shared under the different licenses, making them easily searchable and interpretable.
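The report mentions differential privacy as one of ShareDB's anonymization features but does not say how it was implemented. As a general illustration of the idea, the following is a minimal sketch of the standard Laplace mechanism applied to a counting query. The function names, the `epsilon` parameter, and the sample records are assumptions made for this example and are not taken from ShareDB.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng=None) -> float:
    """Return a noisy count of the records matching `predicate`.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for this single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical usage: count records with age >= 50 under epsilon = 1.0.
records = [{"age": age} for age in range(100)]
noisy = private_count(records, lambda r: r["age"] >= 50, epsilon=1.0)
```

A smaller `epsilon` adds more noise and gives stronger privacy. A production system would also need to track the total privacy budget spent across queries and guard against known floating-point weaknesses of naive Laplace sampling; this sketch does neither.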
Last Modified: 05/02/2022
Modified by: Samuel Madden