
NSF Org: IIS Division of Information & Intelligent Systems
Recipient:
Initial Amendment Date: September 13, 2013
Latest Amendment Date: September 8, 2017
Award Number: 1343976
Award Instrument: Standard Grant
Program Manager: Sylvia Spengler, sspengle@nsf.gov, (703)292-7347, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering
Start Date: January 1, 2014
End Date: December 31, 2018 (Estimated)
Total Intended Award Amount: $269,505.00
Total Awarded Amount to Date: $269,505.00
Funds Obligated to Date:
History of Investigator:
Recipient Sponsored Research Office: 1918 F ST NW, Washington, DC, US 20052-0042, (202)994-0728
Sponsor Congressional District:
Primary Place of Performance: 801 22nd St NW, Washington, DC, US 20052-0001
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): Smart and Connected Health
Primary Program Source:
Program Reference Code(s):
Program Element Code(s):
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
This project builds a novel privacy-preserving framework with both new algorithms and software tools to: 1) evaluate the effectiveness of current identifier-suppression techniques for Electronic Healthcare Record (EHR) data; and 2) de-identify and anonymize EHR data to protect personal information without significantly reducing the utility of the data for secondary analysis. The proposed techniques eliminate privacy violations caused by re-identification, and facilitate the secondary use, sharing, publishing, and exchange of healthcare data without the risk of breaching protected health information (PHI). The framework injects ICD-9-CM-aware, constraint-based privacy-preserving techniques into EHRs to eliminate the threat of identifying an individual in the secondary use of research data. The proposed techniques and tools can be readily adapted to other types of healthcare databases in order to ensure privacy and prevent re-identification of published data. The project produces groundbreaking algorithms and tools for identifying privacy leakages and protecting personal privacy information in EHRs, thereby improving healthcare data publishing. The new privacy-preserving techniques developed in this project lead toward a new type of healthcare science for EHRs. The project also delivers fundamental advances in engineering by showing how to integrate biomedical domain knowledge with a computationally advanced quantitative framework for preserving the privacy of published EHRs.
HIPAA has established protocols and industry standards to protect the confidentiality of PHI. However, our results demonstrate that even for health data that meet HIPAA requirements, the risk of re-identification is not completely eliminated. By identifying the security vulnerabilities inherent in the HIPAA standards, our research develops a more rigorous security standard that greatly improves privacy protection by applying state-of-the-art algorithms.
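The identifier-suppression and re-identification ideas above can be illustrated with a toy k-anonymity check. This is a minimal sketch with made-up records and quasi-identifiers, not the project's ICD-9-CM-aware, constraint-based algorithms:

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the smallest equivalence-class size over the quasi-identifiers.

    A value of 1 means some record is unique on those attributes and is
    therefore at risk of re-identification by linkage.
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

def generalize(record):
    """Suppress the direct identifier and coarsen age into a 10-year band."""
    out = dict(record)
    out.pop("name", None)                        # drop the direct identifier
    out["age"] = f"{record['age'] // 10 * 10}s"  # e.g. 37 -> "30s"
    return out

# Hypothetical toy records; "dx" stands in for an ICD-9-CM diagnosis code.
raw = [
    {"name": "A", "age": 37, "zip": "20052", "dx": "250.0"},
    {"name": "B", "age": 33, "zip": "20052", "dx": "401.9"},
    {"name": "C", "age": 38, "zip": "20052", "dx": "272.4"},
]

qi = ("age", "zip")
print(k_anonymity(raw, qi))                           # every exact age is unique -> 1
print(k_anonymity([generalize(r) for r in raw], qi))  # all share the "30s" band -> 3
```

The generalization step trades a little data utility (exact ages) for a larger anonymity set, which is the same tension the project's framework manages at scale.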
The developed privacy-preserving framework has significant implications for the future of US healthcare data publishing and related applications. The transition from paper records to EHRs has accelerated significantly since the passage of the HITECH Act of 2009, which provides monetary incentives for the "meaningful use" of EHRs. As a result, the quality and quantity of healthcare databases have risen sharply, which has renewed public fears about breaches of the privacy of medical information. This research is innovative and crucial not only for facilitating EHR data publishing, but also for enhancing the development and promotion of EHRs.
On the educational front, this project facilitates the development of novel educational tools for constructing entirely new courses and laboratory classes in healthcare, data privacy, data mining, and a wide range of applications. It thus enhances current instructional methods for teaching data privacy and data mining, and has compelling biomedical and healthcare applications that can facilitate the learning of computational algorithms. The project involves both undergraduate and graduate students at the three participating institutions, and the PIs make a strong effort to engage minority graduate and undergraduate students in research activities in order to increase their exposure to cutting-edge research.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Our research findings are highly relevant to the current debate over health information privacy. Our research calls into question the efficacy of the state-of-the-practice anonymization framework. For example, we demonstrated how the publication of public health data, even after applying state-of-the-practice data anonymization techniques, could lead to breaches of patient privacy. Based on these findings, in an article published in the journal Anesthesia & Analgesia, we proposed an editorial policy for anesthesia journals to significantly reduce the likelihood of a privacy breach while supporting the goal of transparency of the research process.
Another interesting finding from our research is related to the linking of healthcare data with publicly available information, such as data from social media. Contrary to the conventional wisdom that privacy leakage in this case is mostly caused by a user's own posts, we found that a significant amount of privacy disclosure on social media, especially on attributes that are often linkable to healthcare data, is caused by activities of a user's social ties. Our research not only demonstrated the existence of such "peer-disclosure", but also identified the intriguing differences between the identity elements revealed through self- and peer-disclosure.
Our research also examined the inherent tradeoff between privacy protection and data utility. For example, we studied how the application of a state-of-the-practice data anonymization technique and a state-of-the-art differential privacy technique affects researchers' ability to detect evidence of health disparity from the privacy-preserved data. Our results demonstrated the complex challenges facing privacy protection in health disparity research: the data elements essential for enabling health disparity research might lead to privacy disparity and cause further harm to underserved populations if not handled carefully.
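The privacy-utility tradeoff described above can be sketched with the standard Laplace mechanism for a differentially private count query. The epsilon values and the count below are illustrative assumptions, not the study's actual technique or data:

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value plus Laplace(0, sensitivity/epsilon) noise.

    Noise is drawn via inverse-CDF sampling; smaller epsilon means
    stronger privacy and therefore larger expected noise.
    """
    scale = sensitivity / epsilon
    u = rng.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise

rng = random.Random(0)
true_count = 1000  # hypothetical subgroup size in a disparity analysis

# Stronger privacy (smaller epsilon) -> larger noise -> lower utility.
for eps in (0.01, 0.1, 1.0):
    errors = [abs(laplace_mechanism(true_count, 1, eps, rng) - true_count)
              for _ in range(2000)]
    print(f"epsilon={eps:<5} mean abs error ~ {sum(errors) / len(errors):.1f}")
```

For a count query (sensitivity 1), the expected absolute error is 1/epsilon, so a subgroup count that is small relative to that error, as is common for underserved populations, can become statistically invisible, which is one concrete way privacy protection can turn into privacy disparity.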
Last Modified: 03/22/2019
Modified by: Nan Zhang