Award Abstract # 1636788
BD Spokes: SPOKE: NORTHEAST: Collaborative: A Licensing Model and Ecosystem for Data Sharing

NSF Org: Office of Advanced Cyberinfrastructure (OAC)
Recipient: DREXEL UNIVERSITY
Initial Amendment Date: August 26, 2016
Latest Amendment Date: December 9, 2020
Award Number: 1636788
Award Instrument: Standard Grant
Program Manager: Martin Halbert, Office of Advanced Cyberinfrastructure (OAC), Directorate for Computer and Information Science and Engineering (CSE)
Start Date: September 1, 2016
End Date: December 31, 2021 (Estimated)
Total Intended Award Amount: $209,793.00
Total Awarded Amount to Date: $209,793.00
Funds Obligated to Date: FY 2016 = $209,793.00
History of Investigator:
  • Jane Greenberg (Principal Investigator)
    janeg@drexel.edu
Recipient Sponsored Research Office: Drexel University
3141 CHESTNUT ST
PHILADELPHIA
PA  US  19104-2875
(215)895-6342
Sponsor Congressional District: 03
Primary Place of Performance: Drexel University
1505 Race Street
Philadelphia
PA  US  19102-1119
Primary Place of Performance Congressional District: 03
Unique Entity Identifier (UEI): XF3XM9642N96
Parent UEI:
NSF Program(s): BD Spokes -Big Data Regional I
Primary Program Source: 01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 028Z, 7433, 8083
Program Element Code(s): 024Y00
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Sharing of data sets can provide tremendous mutual benefits for industry, researchers and nonprofit organizations. For example, companies can profit when university researchers explore their data sets and make discoveries that help the company improve its business. At the same time, researchers are always looking for real-world data sets to show that their newly developed techniques work in practice. Unfortunately, many attempts to share relevant data sets between stakeholders in industry and academia fail or require a large investment to make data sharing possible. A major obstacle is that data often comes with prohibitive restrictions on how it can be used (requiring, e.g., the enforcement of legal terms or other policies, or the handling of data privacy issues). To enforce these requirements today, lawyers are usually involved in negotiating the terms of each contract. It is not atypical for this process of creating an individual data sharing contract to end in protracted negotiations, which are both disconnected from what the actual stakeholders aim to do and fraught, as both sides struggle with the implications and possibilities of modern security, privacy, and data sharing techniques. Worse, fear of missing a loophole in how the data might be (mis)used often prevents data sharing efforts from even getting off the ground. To address these challenges, our new data sharing spoke will enable data providers to easily share data while enforcing constraints on its use. This effort has two key components: (1) creating a licensing model for data that facilitates sharing data that is not necessarily open or free between different organizations, and (2) developing a prototype data sharing software platform, ShareDB, which enforces the terms and restrictions of the developed licenses. We believe these efforts will have a transformative impact on how data sharing takes place. 
By moving data out of the silos of individuals and single organizations and into the hands of broader society, we can tackle many societally significant problems.

This new data sharing spoke will enable data providers to easily share data while enforcing constraints on its use. Many services and platforms that provide access to data sets already exist today. However, these platforms generally promote completely open access and do not address the aforementioned issues that arise when dealing with proprietary data. Thus, the effort has three key components: (1) creating a licensing model for data that facilitates sharing data that is not necessarily open or free between different organizations, (2) developing a prototype data sharing software platform, ShareDB, which enforces the terms and restrictions of the developed licenses, and (3) developing and integrating relevant metadata that will accompany the datasets shared under the different licenses, making them easily searchable and interpretable. To ensure that the developed tools and licenses are useful, the project will form the Northeast Data Sharing Group, comprising many different stakeholders, to make the licensing model widely accepted and usable in many application domains (e.g., health and finance). The intellectual merit of this proposal is to design a licensing model and a data sharing platform that are widely accepted and usable as a template in many different domains. While other efforts to enable data sharing exist (e.g., Creative Commons), they focus on the case where the data owner is willing to openly share the data on the Internet. This licensing model and ecosystem are different because they allow data owners to enforce specific requirements stated in a data sharing agreement (e.g., on who is allowed to access the data) and also provide tools to make sharing sensitive information safe. 
The licenses and software we propose to investigate will make it easier for organizations to open up their data to the appropriate organizations, while maintaining the ability to ensure it is protected, that access is revocable, and that access controls and audit logs are maintained.
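The enforcement properties listed above (license terms checked on every access, revocable access, and a maintained audit log) can be illustrated with a minimal Python sketch. It is not ShareDB's design or API, which this page does not describe; the names `DataLicense`, `SharingGateway`, `register`, `revoke`, and `request_access`, and the choice of permitted users and permitted purposes as the license terms, are hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataLicense:
    """Hypothetical machine-readable license terms for one shared dataset."""
    dataset: str
    allowed_users: set[str]      # who may access the data
    allowed_purposes: set[str]   # what the data may be used for
    revoked: bool = False


@dataclass
class SharingGateway:
    """Hypothetical gatekeeper that checks every request against its license."""
    licenses: dict[str, DataLicense] = field(default_factory=dict)
    # Each entry: (UTC timestamp, user, dataset, purpose, granted?)
    audit_log: list[tuple[str, str, str, str, bool]] = field(default_factory=list)

    def register(self, lic: DataLicense) -> None:
        self.licenses[lic.dataset] = lic

    def revoke(self, dataset: str) -> None:
        # Revocation takes effect on the very next request.
        self.licenses[dataset].revoked = True

    def request_access(self, user: str, dataset: str, purpose: str) -> bool:
        lic = self.licenses.get(dataset)
        granted = (
            lic is not None
            and not lic.revoked
            and user in lic.allowed_users
            and purpose in lic.allowed_purposes
        )
        # Every decision, granted or denied, is recorded for later auditing.
        timestamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((timestamp, user, dataset, purpose, granted))
        return granted


if __name__ == "__main__":
    gw = SharingGateway()
    gw.register(DataLicense("claims", {"alice"}, {"research"}))
    print(gw.request_access("alice", "claims", "research"))   # True
    print(gw.request_access("alice", "claims", "marketing"))  # False
    gw.revoke("claims")
    print(gw.request_access("alice", "claims", "research"))   # False
    print(len(gw.audit_log))                                  # 3
```

Routing every request through one gateway is what makes revocation immediate and guarantees that denials, not just grants, leave an audit record.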

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Grabus, Sam and Greenberg, Jane. "The Landscape of Rights and Licensing Initiatives for Data Sharing." Data Science Journal, v.18, 2019. https://doi.org/10.5334/dsj-2019-029
Grabus, Sam and Greenberg, Jane. "Toward a Metadata Framework for Sharing Sensitive and Closed Data: An Analysis of Data Sharing Agreement Attributes." MTSR 2017, Communications in Computer and Information Science, v.755, 2017.
Greene, Mica and Grabus, Sam. "DARSI: An Ontology for Facilitating the Development of Data Sharing and Use Agreements." Proceedings from North American Symposium on Knowledge Organization, v.8, 2021.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

  1. DARSI: Data Sharing Agreements for Restricted and Sensitive Information: The ontology developed for Data Sharing Agreements for Restricted and Sensitive Information was named DARSI, and an OWL (Web Ontology Language) rendering was produced. The Protégé instance was made accessible to team members who wish to contribute to the ontology. The ontology includes four top-level classes with 20 subclasses, and over 200 individuals (terms representing concepts). DARSI has been shared with the QDR (Qualitative Data Repository) group as they aim to develop a pathway for enabling the deposit and sharing of sensitive and restricted research data.
  2. Data sharing directory (DSD): The DSD was developed using YAML, and four editors are engaged. The new framework on GitHub allows other users to join. We developed a Users’ Guide for researchers, library science students, and front-line information professionals who wish to contribute to the DSD, even if they are new to GitHub. As an informational resource, the DSD has been important to the development of a data sharing agreement with OCLC and the LEADING program, and has recently served as a resource for the IEEE Big Data Governance and Metadata Management working group.
  3. ATLANTIC prototype: ATLANTIC sits between the user and the database. Given a user query, ATLANTIC uses query rewriting to modify the query so that it collects the statistics required as input for the differential privacy algorithms. After the query results come back, it runs the differential privacy algorithms on the results. In this way, the underlying database remains unchanged, and any differential privacy mechanism can be seamlessly plugged into ATLANTIC. We then use a set of sample queries and a database sample to learn a machine learning model for differential privacy that is customized to the given data and query workload. The result is more precise than the general-purpose statistical methods used in previous differential privacy work. Specifically, our model, which takes the sampling rate into consideration, gives ATLANTIC a way to automatically choose an optimal sampling rate for each category of queries, offering strong privacy and accuracy guarantees while minimizing query execution time. A demo of ATLANTIC was published in VLDB 2021.
  4. A research collection of approximately 100 data sharing agreements and templates addressing aspects of sharing sensitive and restricted research data.
  5. Publications, presentations, and other scholarly and scientific outputs, including visualizations, slides, the ontology, code, and the DSD user guide.
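The middleware pattern described for ATLANTIC in item 3 (rewrite the query so it collects the needed statistics, then run a differential privacy algorithm on the results while leaving the database unchanged) can be sketched in Python. This is an illustrative assumption, not ATLANTIC's implementation: it swaps in the standard Laplace mechanism applied to a rewritten AVG query in SQLite, the function names `laplace_noise` and `private_avg` and the clamping bounds are hypothetical, and it omits the learned model and sampling-rate selection described in item 3.

```python
import random
import sqlite3


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale)."""
    # The difference of two i.i.d. exponential draws with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_avg(conn: sqlite3.Connection, table: str, column: str,
                lower: float, upper: float, epsilon: float,
                rng: random.Random) -> float:
    """Hypothetical middleware step answering AVG(column) under epsilon-DP.

    The user's AVG request is rewritten into the statistics the mechanism
    needs (a clamped SUM and a COUNT); the stored data is never modified.
    `table` and `column` are assumed to be trusted identifiers.
    """
    # Query rewriting: clamp each value to [lower, upper] so that adding or
    # removing one row changes the SUM by at most max(|lower|, |upper|).
    rewritten = f"SELECT SUM(MIN(MAX({column}, ?), ?)), COUNT(*) FROM {table}"
    total, count = conn.execute(rewritten, (lower, upper)).fetchone()
    total = total or 0.0  # SUM over an empty table returns NULL

    # Split the privacy budget evenly between the two statistics.
    eps_each = epsilon / 2
    sum_sensitivity = max(abs(lower), abs(upper))
    noisy_sum = total + laplace_noise(sum_sensitivity / eps_each, rng)
    noisy_count = count + laplace_noise(1.0 / eps_each, rng)

    # Post-processing the noisy statistics does not weaken the guarantee.
    return noisy_sum / max(noisy_count, 1.0)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (duration REAL)")
    conn.executemany("INSERT INTO visits VALUES (?)", [(10,), (20,), (1000,)])
    # Noisy estimate of the clamped average (10 + 20 + 30) / 3.
    print(private_avg(conn, "visits", "duration", 0, 30, 1.0, random.Random(0)))
```

Because the noise is added only to query outputs, any other mechanism with the same inputs could replace the Laplace step without touching the database, which mirrors the plug-in property the item describes.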

 


Last Modified: 03/31/2022
Modified by: Jane Greenberg
