Award Abstract # 2029651
RAPID: Collaborative: A Privacy Risk Assessment Framework for Person-Level Data Sharing During Pandemics

NSF Org: CNS (Division Of Computer and Network Systems)
Recipient: VANDERBILT UNIVERSITY MEDICAL CENTER
Initial Amendment Date: May 14, 2020
Latest Amendment Date: May 14, 2020
Award Number: 2029651
Award Instrument: Standard Grant
Program Manager: Jeremy Epstein
CNS: Division Of Computer and Network Systems
CSE: Directorate for Computer and Information Science and Engineering
Start Date: May 15, 2020
End Date: April 30, 2021 (Estimated)
Total Intended Award Amount: $99,999.00
Total Awarded Amount to Date: $99,999.00
Funds Obligated to Date: FY 2020 = $99,999.00
History of Investigator:
  • Bradley Malin (Principal Investigator)
    bradley.malin@vanderbilt.edu
Recipient Sponsored Research Office: Vanderbilt University Medical Center
1161 21ST AVE S STE D3300 MCN
NASHVILLE
TN  US  37232-0001
(615)322-2450
Sponsor Congressional District: 07
Primary Place of Performance: Vanderbilt University Medical Center
3319 West End Avenue, Suite 970
Nashville
TN  US  37203-6856
Primary Place of Performance Congressional District: 05
Unique Entity Identifier (UEI): GYLUH9UXHDX5
Parent UEI:
NSF Program(s): COVID-19 Research
Primary Program Source: 010N2021DB R&RA CARES Act DEFC N
Program Reference Code(s): 025Z, 065Z, 096Z, 7914
Program Element Code(s): 158Y00
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
Note: This Award includes Coronavirus Aid, Relief, and Economic Security (CARES) Act funding.

ABSTRACT

The COVID-19 pandemic has demonstrated that sharing data is critical to building better statistical epidemiological models, enabling policy decisions (in the public and private sectors), and assuring the health of the public. Moreover, the situation has evolved quickly, indicating that data sharing needs to take place repeatedly and in a timely manner. To date, much of the data sharing that has taken place has focused on aggregate statistics (e.g., counts of events), yet some of the most important data is at the person level. Such data is critical to understanding how comorbidities influence health outcomes and to modeling the trajectory of the disease over time and space. This data is captured by a large number of service providers who wish to support these endeavors but are concerned that doing so will infringe upon the privacy rights of the corresponding individuals, particularly their anonymity. To enable timely, useful, and privacy-preserving releases of patient-specific COVID-19 data, this project aims to develop and disseminate novel privacy-risk assessment techniques, implemented in working software, that help data managers and public health officials reason about the tradeoffs between privacy risks (with a focus on re-identification, according to current law) and public data utility. The project will provide the best practices and tools needed for sharing patient-specific data about individuals diagnosed with, or suspected of having, COVID-19.

This project will develop novel, dynamic privacy risk assessment models for disclosing data in support of epidemiological investigations (particularly pandemics) by considering evolving privacy risks and data utility. The proposed models will be tailored to enable the disclosure of geographically, demographically, and clinically relevant phenomena (e.g., health indications based on pharmaceutical prescriptions or purchases) by modeling a much richer data attribute space, specifically one that is important for modeling epidemiologic risk factors associated with biological agents, such as COVID-19. To model evolving privacy risks, the project will develop privacy risk estimation models that consider multiple types of potential re-identification attacks, as well as the data redactions used to release multiple versions of the same data. Furthermore, the proposed models will be oriented to support utility functions that are specific to bio-surveillance efforts, including those that have emerged for COVID-19 modeling and response. Finally, to ensure that the proposed approach is widely accessible and reusable, the project will release an open-source software tool that enables data custodians, particularly public health authorities, to make informed decisions that appropriately balance public health goals with personal privacy when sharing data.
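
As a rough illustration of the privacy-versus-utility tradeoff described above, the following Python sketch estimates re-identification risk for a small table of person-level records. It is not the project's method or software. It assumes a common simplification in which a record's risk is 1 divided by the number of records that share its quasi-identifier values. The records, the field names (age, zip3, sex), and the helper functions are all hypothetical and synthetic.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Return (max_risk, avg_risk) over all records.

    Assumption for illustration only: a record's risk is 1 / (number of
    records sharing the same quasi-identifier values).
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    group_sizes = Counter(keys)
    risks = [1.0 / group_sizes[k] for k in keys]
    return max(risks), sum(risks) / len(risks)


def generalize_age(record, width):
    """Replace an exact age with a range of the given width (less detail, less risk)."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Synthetic records; no real patient data.
records = [
    {"age": 34, "zip3": "372", "sex": "F"},
    {"age": 36, "zip3": "372", "sex": "F"},
    {"age": 35, "zip3": "372", "sex": "F"},
    {"age": 71, "zip3": "372", "sex": "M"},
]
quasi_identifiers = ["age", "zip3", "sex"]

# Exact ages: every record is unique on its quasi-identifiers.
raw = reidentification_risk(records, quasi_identifiers)

# Ten-year age ranges: three records now share one group.
coarse = reidentification_risk(
    [generalize_age(r, 10) for r in records], quasi_identifiers
)

print("exact ages:     max=%.2f avg=%.2f" % raw)
print("10-year ranges: max=%.2f avg=%.2f" % coarse)
```

In this toy example, coarsening age lowers the average risk but leaves the maximum risk unchanged, because one record remains unique. A data custodian would weigh that kind of result against the loss of detail for epidemiological modeling.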

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The COVID-19 pandemic has demonstrated that sharing data is critical to building better statistical epidemiological models, enabling policy decisions (in the public and private sectors), and assuring the health of the public. Moreover, the situation has evolved quickly, indicating that data sharing needs to take place repeatedly and in a timely manner. To date, much of the data sharing that has taken place has focused on aggregate statistics (e.g., counts of events), yet some of the most important data is at the person level. Such data is critical to understanding how comorbidities influence health outcomes and to modeling the trajectory of the disease over time and space. This data is captured by a large number of service providers who wish to support these endeavors but are concerned that doing so will infringe upon the privacy rights of the corresponding individuals, particularly their anonymity. To enable timely, useful, and privacy-preserving releases of patient-specific COVID-19 data, this project developed and disseminated novel privacy-risk assessment techniques, implemented in working software, that help data managers and public health officials reason about the tradeoffs between privacy risks (with a focus on re-identification, according to current law) and public data utility.

This project further developed novel, dynamic privacy risk assessment models for disclosing data in support of epidemiological investigations (particularly pandemics) by considering evolving privacy risks and data utility. The models are tailored to enable the disclosure of geographically, demographically, and clinically relevant phenomena (e.g., health indications based on pharmaceutical prescriptions or purchases). This is accomplished by modeling a much richer data attribute space, specifically one that is important for modeling epidemiologic risk factors associated with biological agents, such as COVID-19. Moreover, the data sharing models are oriented to support utility functions that are specific to bio-surveillance efforts, including those that have emerged for COVID-19 modeling and response.

This project has led to the development of open-source software and one technical report, which is under consideration for publication in a peer-reviewed journal.

Last Modified: 09/06/2021
Modified by: Bradley A Malin

Please report errors in award information by writing to: awardsearch@nsf.gov.