
NSF Org: |
CNS Division Of Computer and Network Systems |
Recipient: |
|
Initial Amendment Date: | August 17, 2015 |
Latest Amendment Date: | May 10, 2016 |
Award Number: | 1544455 |
Award Instrument: | Standard Grant |
Program Manager: | Sara Kiesler, skiesler@nsf.gov, (703)292-8643, CNS Division Of Computer and Network Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2015 |
End Date: | August 31, 2018 (Estimated) |
Total Intended Award Amount: | $260,991.00 |
Total Awarded Amount to Date: | $276,991.00 |
Funds Obligated to Date: |
FY 2016 = $16,000.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
201 OLD MAIN UNIVERSITY PARK PA US 16802-1503 (814)865-1372 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
PA US 16802-7000 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): |
Special Projects - CNS, Secure & Trustworthy Cyberspace |
Primary Program Source: |
01001617DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Recent improvements in computing capabilities, data collection, and data science have enabled tremendous advances in scientific data analysis. However, the relevant data are often highly sensitive (e.g., Census records, tax records, medical records). This project addresses an emerging and critical scientific problem: privacy concerns limit access to raw data that might reveal information about individuals, while techniques to "sanitize" such data (e.g., anonymization) could have a negative impact on the quality of the scientific results that use the data. How can we provide data that protect the privacy of individuals but also accurately support scientific analyses?
The project addresses challenges regarding analysis of privacy-preserving sanitized data: (1) How can sanitized data be analyzed so that conclusions will stand up to peer review? (2) What workflows and visualizations must be supported by privacy technology? (3) How can scientists assess bias introduced by sanitization without access to the raw data? The project focuses specifically on "social flow analysis," in which data analysis is performed on sensitive social flow data (e.g., commuting patterns, migration trajectories) of individuals or families. The researchers are developing an ecological model of networks of neighborhoods that are linked by social flows and studying how social flows are formed and maintained. The project is cataloging the types of data access and visualization needed to develop such theories, studying alternative analyses that are both scalable and statistically robust, developing preliminary privacy-preserving data protection methods, and evaluating whether the privacy-preserving methods enable the same conclusions as access to the raw data.
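The trade-off between sanitization and analytic quality can be illustrated with a small sketch. The abstract does not name a specific sanitization mechanism, so the example below assumes, purely for illustration, Laplace noise added to each cell of a flow matrix, a common building block in differential privacy. The function names, the toy flow counts, the `epsilon` values, and the sensitivity of 1 (each individual changes one cell by at most one) are all assumptions and are not taken from the project.

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def sanitize_flows(flows, epsilon, rng, sensitivity=1.0):
    """Add independent Laplace noise with scale sensitivity/epsilon to each cell.

    Assumes (hypothetically) that one individual affects one cell by at most
    `sensitivity`. Noisy values may be negative or fractional; clamping or
    rounding them afterwards is allowed but introduces bias.
    """
    scale = sensitivity / epsilon
    return [[count + laplace_noise(scale, rng) for count in row] for row in flows]

def mean_absolute_error(raw, noisy):
    """Average absolute difference between corresponding cells."""
    pairs = [(a, b) for raw_row, noisy_row in zip(raw, noisy)
             for a, b in zip(raw_row, noisy_row)]
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

# Toy origin-destination counts between three hypothetical neighborhoods.
raw_flows = [[0, 120, 40], [95, 0, 60], [30, 75, 0]]

rng = random.Random(0)
for epsilon in (0.1, 1.0, 10.0):
    noisy = sanitize_flows(raw_flows, epsilon, rng)
    print(epsilon, round(mean_absolute_error(raw_flows, noisy), 2))
```

Smaller `epsilon` values give stronger privacy but larger expected error per cell. An analyst who cannot see the raw data would have to reason about this error from the published noise scale alone, which is the situation question (3) describes.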
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The objective of this project is to study privacy-preserving techniques that not only protect data privacy but also preserve data-analytic results. In this project, we focus on crime data analysis using an important feature: social flow. Social flow data (e.g., Longitudinal Employer-Household Dynamics data from the Census, taxi flows) are highly sensitive but could be useful for understanding crime correlations and the spread of crime among neighborhoods.
Toward achieving this objective, our specific goals are to: (1) design and implement a scalable social flow model; (2) assess the predictive capability of various social flow measures on crime and poverty and relate the results to theoretical social science models; and (3) revisit privacy-preserving techniques and study how they can preserve the analytical results.
The key outcomes of this project are as follows. First, we propose a negative binomial model of the correlation between crime count and social flow, and we show that social flow data are useful for crime count inference. Second, we systematically study the importance of all the features, including demographic information, points of interest, geographical impact, and social flow impact, with respect to different categories of crime count. The study is carried out on large-scale real data (e.g., the social flow data are derived from millions of taxi trips). Third, we study a spatially varying correlation model and find that the correlations between features and crime count vary across space. Lastly, we show an important finding: neighborhood crime depends not just on internal or surrounding disadvantage but also on the disadvantage of areas connected to it through commuting. The findings contribute to ecological theories of crime, social isolation, and ecological networks by showing that communities can influence each other from a distance and by suggesting that connectivity to less disadvantaged work hubs may decrease local crime.
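To make the first and last outcomes concrete, the following is a minimal sketch rather than the project's actual implementation. It assumes one possible way to turn commuting flows into a feature, namely averaging a disadvantage score over connected areas weighted by outgoing flow share, and it evaluates the log-likelihood of a standard NB2 negative binomial model with a log link. The function names, the weighting scheme, and all numbers are hypothetical.

```python
import math

def flow_weighted_feature(flows, feature):
    """For each area i, average `feature` over other areas j, weighted by the
    share of i's outgoing flow that goes to j (self-flows are ignored)."""
    n = len(feature)
    result = []
    for i in range(n):
        total = sum(flows[i][j] for j in range(n) if j != i)
        if total == 0:
            result.append(0.0)  # no outgoing flow, so no connected-area exposure
            continue
        weighted = sum(flows[i][j] * feature[j] for j in range(n) if j != i)
        result.append(weighted / total)
    return result

def nb2_log_likelihood(counts, X, beta, alpha):
    """Log-likelihood of an NB2 model with a log link:
    mu_i = exp(x_i . beta) and Var(y_i) = mu_i + alpha * mu_i**2."""
    r = 1.0 / alpha  # inverse dispersion
    ll = 0.0
    for y, x in zip(counts, X):
        mu = math.exp(sum(b * v for b, v in zip(beta, x)))
        ll += (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
               + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return ll

# Toy data for three hypothetical neighborhoods.
flows = [[0, 30, 10], [20, 0, 20], [5, 15, 0]]  # commuters from row to column
disadvantage = [0.2, 0.8, 0.5]                  # local disadvantage score
crime_counts = [12, 7, 9]

connected = flow_weighted_feature(flows, disadvantage)
# Design matrix: intercept, local disadvantage, flow-connected disadvantage.
X = [[1.0, d, c] for d, c in zip(disadvantage, connected)]
print(connected)
print(nb2_log_likelihood(crime_counts, X, beta=[2.0, 0.3, 0.4], alpha=0.5))
```

In practice the coefficients `beta` and the dispersion `alpha` would be estimated by maximizing this log-likelihood, typically with a statistics package. A negative binomial model is a common choice over a Poisson model when counts are overdispersed, meaning their variance exceeds their mean.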
Last Modified: 12/17/2018
Modified by: Zhenhui Li