
NSF Org: | CNS Division Of Computer and Network Systems |
Recipient: |
|
Initial Amendment Date: | July 23, 2011 |
Latest Amendment Date: | July 23, 2011 |
Award Number: | 1115234 |
Award Instrument: | Standard Grant |
Program Manager: | Nan Zhang, CNS Division Of Computer and Network Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | August 1, 2011 |
End Date: | July 31, 2016 (Estimated) |
Total Intended Award Amount: | $495,935.00 |
Total Awarded Amount to Date: | $495,935.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: | 809 S MARSHFIELD AVE M/C 551 CHICAGO IL US 60612-4305 (312)996-2862 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 809 S MARSHFIELD AVE M/C 551 CHICAGO IL US 60612-4305 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | TRUSTWORTHY COMPUTING |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Massive graphs arise in many social media applications, such as social networks, e-commerce recommendation systems, e-mail communication patterns, and other collaborative applications. Such data is often sensitive from a privacy point of view. Many privacy-preserving schemes have recently been proposed to protect released network data. The question is how effective these schemes are at preventing the re-identification of nodes, i.e., at preserving the anonymity of the network nodes.
This project will raise the issue of the inadequacy of current network anonymization schemes for massive and sparse graphs. It is important to understand the theoretical properties that make such graphs susceptible to re-identification attacks. Through a systematic study of the re-identification risks of existing approaches and the development of new principles for anonymizing network data, we will deepen our understanding of the problems and be better able to protect data privacy. By designing new types of attack algorithms and raising the issue of the privacy exposure of current network anonymization schemes, the work can lead to fundamentally different thinking on how to perform privacy-preserving data publishing on network data. It provides new insights into how to devise anonymization schemes that protect the privacy of social network data.
One of the biggest obstacles to sharing information is concern about privacy. This project has the potential to make fundamental, disruptive advances in protecting the privacy of network data. It provides new insights into the inadequacy of current anonymization schemes. Many researchers need access to sensitive data, e.g., social network data and e-mail and communication patterns. By advancing knowledge of privacy-preserving data publishing, the project will lower the barriers to sharing data and thereby facilitate scientific research activities.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Many applications, such as social networks, email communication patterns, and other collaborative applications, are built on top of graph/network structures. These network data offer tremendous opportunities for mining useful information for advanced services, such as recommendation, personalized medicine, and location-based services. However, there is also a threat to privacy because data in raw form often contain sensitive information about individuals. Privacy-preserving data publishing (PPDP) studies how to transform raw data into a version that is immunized against privacy attacks but still supports effective data analysis. A straightforward solution is to remove identifying information from the nodes and perturb the graph structure so that re-identification becomes more difficult, but this is generally insufficient.
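To illustrate why removing identifying information alone tends to fall short, here is a minimal Python sketch. It is not taken from the project itself: the toy graph, the function names, and the assumption that an adversary knows only a target's number of friends are all hypothetical. It relabels nodes with random IDs and then shows that a node with a unique degree can still be singled out.

```python
import random
from collections import defaultdict


def degrees(edges):
    """Count how many edges touch each node (simple undirected graph assumed)."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def naive_anonymize(edges, seed=0):
    """Replace node names with shuffled integer IDs; the structure is untouched."""
    nodes = sorted({n for e in edges for n in e})
    ids = list(range(len(nodes)))
    random.Random(seed).shuffle(ids)
    mapping = dict(zip(nodes, ids))
    return [(mapping[u], mapping[v]) for u, v in edges], mapping


def candidates_by_degree(anon_edges, known_degree):
    """Anonymized IDs consistent with the adversary's knowledge of a degree."""
    return sorted(n for n, d in degrees(anon_edges).items() if d == known_degree)


# Hypothetical toy graph: "alice" is the only person with three friends.
edges = [("alice", "bob"), ("alice", "carol"), ("alice", "dave"), ("dave", "erin")]
anon_edges, secret_mapping = naive_anonymize(edges)

# An adversary who knows only that the target has 3 friends finds one candidate.
matches = candidates_by_degree(anon_edges, known_degree=3)
print(matches == [secret_mapping["alice"]])
```

In this toy case the three nodes of degree 1 remain indistinguishable from one another, which is the kind of grouping that anonymization schemes try to guarantee for every node.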
Group-based anonymization is the most widely studied approach to privacy-preserving data publishing. This project is the first to identify its exposures in network anonymization. Because typical graphs encountered in real applications are sparse, this work shows that sparse graphs have certain theoretical properties that make them susceptible to re-identification attacks, and it provides effective solutions to address these potential attacks. Various cases of network anonymization attacks have been identified and addressed, including sequential releases of dynamic networks and friendship-type attacks based on the number of mutual friends between two persons or on the degrees (i.e., numbers of friends) of two vertices (persons) connected by a friendship link. Furthermore, in the era of big data, many data sources can be integrated to help de-anonymize the data. For example, the potential exposure to alignment of multiple anonymized social network datasets is studied. ε-differential privacy is an alternative approach designed for an interactive querying model. A novel data publishing approach based on ε-differential privacy is proposed for the non-interactive setting.
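The degree-based friendship attack mentioned above can be sketched roughly as follows. This is an illustrative Python example under stated assumptions, not the project's actual algorithm: the function name, the toy edge list, and the assumption of a simple undirected graph without duplicate edges are hypothetical. It lists the friendship links whose pair of endpoint degrees is unique in the graph, since an adversary who knows both degrees could pinpoint such a link.

```python
from collections import Counter, defaultdict


def degree_pair_exposure(edges):
    """Return the edges whose (degree, degree) pair occurs exactly once."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    def pair(u, v):
        # Order the two degrees so that (u, v) and (v, u) give the same key.
        return tuple(sorted((degree[u], degree[v])))

    counts = Counter(pair(u, v) for u, v in edges)
    return [(u, v) for u, v in edges if counts[pair(u, v)] == 1]


# Hypothetical sparse toy graph with five friendship links.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e"), ("e", "f")]
exposed = degree_pair_exposure(edges)
print(exposed)
```

Here only the links (a, b) and (a, c) share a degree pair, so the other three links are each identifiable from the degrees of their endpoints alone.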
The work creates an awareness of the weakness of the current privacy preserving data publishing schemes on network data and proposes approaches to alleviate these issues. This will facilitate the sharing of data to advance data-driven research.
The PI is the recipient of the ACM SIGKDD 2016 Innovation Award for his influential research and scientific contributions on mining, fusion and anonymization of big data, and of the IEEE Computer Society’s 2013 Technical Achievement Award for “pioneering and fundamentally innovative contributions to the scalable indexing, querying, searching, mining and anonymization of big data”.
Last Modified: 08/06/2016
Modified by: Philip S Yu