Award Abstract # 1544455
EAGER: Toward Transparency in Public Policy via Privacy-Enhanced Social Flow Analysis with Applications to Ecological Networks and Crime

NSF Org: CNS
Division Of Computer and Network Systems
Recipient: THE PENNSYLVANIA STATE UNIVERSITY
Initial Amendment Date: August 17, 2015
Latest Amendment Date: May 10, 2016
Award Number: 1544455
Award Instrument: Standard Grant
Program Manager: Sara Kiesler
skiesler@nsf.gov
 (703)292-8643
CNS
 Division Of Computer and Network Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2015
End Date: August 31, 2018 (Estimated)
Total Intended Award Amount: $260,991.00
Total Awarded Amount to Date: $276,991.00
Funds Obligated to Date: FY 2015 = $260,991.00
FY 2016 = $16,000.00
History of Investigator:
  • Zhenhui Li (Principal Investigator)
  • Daniel Kifer (Co-Principal Investigator)
  • Corina Graif (Co-Principal Investigator)
Recipient Sponsored Research Office: Pennsylvania State Univ University Park
201 OLD MAIN
UNIVERSITY PARK
PA  US  16802-1503
(814)865-1372
Sponsor Congressional District: 15
Primary Place of Performance: Pennsylvania State Univ University Park
PA  US  16802-7000
Primary Place of Performance
Congressional District:
Unique Entity Identifier (UEI): NPM2J7MSCF61
Parent UEI:
NSF Program(s): Special Projects - CNS,
Secure & Trustworthy Cyberspace
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVIT
01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 114Z, 7434, 7916, 8225, 9102, 9178, 9251
Program Element Code(s): 171400, 806000
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Recent improvements in computing capabilities, data collection, and data science have enabled tremendous advances in scientific data analysis. However, the relevant data are often highly sensitive (e.g., Census records, tax records, medical records). This project addresses an emerging and critical scientific problem: privacy concerns limit access to raw data that might reveal information about individuals, while techniques to "sanitize" such data (e.g., anonymization) can degrade the quality of the scientific results that rely on the data. How can we provide data that protect the privacy of individuals but also accurately support scientific analyses?

The project addresses challenges regarding analysis of privacy-preserving sanitized data: (1) How can sanitized data be analyzed so that conclusions will stand up to peer review? (2) What workflows and visualizations must be supported by privacy technology? (3) How can scientists assess bias introduced by sanitization without access to the raw data? The project focuses specifically on "social flow analysis," in which data analysis is performed on sensitive social flow data (e.g., commuting patterns, migration trajectories) of individuals or families. The researchers are developing an ecological model of networks of neighborhoods that are linked by social flows and studying how social flows are formed and maintained. The project is cataloging the types of data access and visualization needed to develop such theories, studying alternative analyses that are both scalable and statistically robust, developing preliminary privacy-preserving data protection methods, and evaluating whether the privacy-preserving methods enable the same conclusions as access to the raw data.
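The abstract does not name a specific sanitization method. As a purely illustrative sketch, the following shows one standard technique, the Laplace mechanism from differential privacy, applied to origin-destination flow counts. The function name `sanitize_flows`, the example neighborhoods, and the assumption that each individual contributes to exactly one origin-destination pair (so that the sensitivity of the counts is 1) are hypothetical and are not taken from the project.

```python
import random


def sanitize_flows(flows, epsilon, seed=None):
    """Return a noisy copy of origin-destination counts.

    Assumption (not from the source): each person appears in exactly one
    origin-destination pair, so adding or removing one person changes one
    count by 1. The Laplace noise scale is then 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    noisy = {}
    for od_pair, count in flows.items():
        # The difference of two independent exponential draws with rate
        # epsilon follows a Laplace distribution with scale 1 / epsilon.
        noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
        noisy[od_pair] = count + noise
    return noisy


# Hypothetical commuting counts between made-up neighborhoods.
example_flows = {("A", "B"): 120, ("B", "C"): 45, ("C", "A"): 8}
example_noisy = sanitize_flows(example_flows, epsilon=0.5, seed=1)
```

Smaller values of `epsilon` give stronger privacy but noisier counts. Because the noise has mean zero and a known scale, an analyst can reason about the distortion it introduces without seeing the raw counts, which is one simple instance of question (3) above.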

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 14)
Fei Wu and Zhenhui Li "Where Did You Go: Personalized Annotation of Mobility Records" 2016 International Conference on Information and Knowledge Management (CIKM'16), 2016
Fei Wu, Hongjian Wang, and Zhenhui Li "Interpreting Traffic Dynamics using Ubiquitous Urban Data" 24th ACM International Conference on Advances in Geographical Information Systems (SIGSPATIAL'16), 2016
Graif, Corina, Alina Lungeanu, and Alyssa Yetter "Neighborhood isolation in a Rust Belt city: Violent crime effects on increasing structural isolation and homophily in inter-neighborhood commuting networks" Social Networks, 2017
Hongjian Wang and Zhenhui Li "Uncovering Urban Dynamics via Mobility Flow Representation Learning" Conference on Information and Knowledge Management (CIKM'17), 2017
Hongjian Wang, Daniel Kifer, Corina Graif, Zhenhui Li "Crime Rate Inference using Big Data" Proc. of 2016 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'16), San Francisco, CA, Aug. 2016
Hongjian Wang, Huaxiu Yao, Daniel Kifer, Corina Graif, and Zhenhui Li "Non-Stationary Model for Crime Rate Inference Using Modern Urban Data" IEEE Transactions on Big Data, 2017
Hongjian Wang, Yu-Hsuan Kuo, Daniel Kifer, and Zhenhui Li "A Simple Baseline for Travel Time Estimation using Large-Scale Trip Data" 24th ACM International Conference on Advances in Geographical Information Systems (SIGSPATIAL'16), 2016
Howard-Tripp, Alyssa and Corina Graif "Parental Immigration, Legal Status, and Children's Risky and Delinquent Behaviors" American Society of Criminology Meeting in New Orleans, LA, 2016
Kuo, Yu-Hsuan and Chiu, Cho-Chun and Kifer, Daniel and Hay, Michael and Machanavajjhala, Ashwin "Differentially private hierarchical count-of-counts histograms" Proceedings of the VLDB Endowment, v.11, 2018. DOI: 10.14778/3236187.3236202
Kuo, Yu-Hsuan and Li, Zhenhui and Kifer, Daniel "Detecting Outliers in Data with Correlated Measures" Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018. DOI: 10.1145/3269206.3271798
Kuo, Yu-Hsuan, Zhenhui Li, and Daniel Kifer "Detecting Outliers in Data with Correlated Measures" in Proceedings of the 2018 International Conference on Information and Knowledge Management (CIKM'18), 2018

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The objective of this project is to study privacy-preserving techniques that protect data privacy while also preserving the accuracy of analytic results. In this project, we focus on crime data analysis using an important feature: social flow. Social flow data (e.g., Longitudinal Employer-Household Dynamics data from the Census, taxi flows) are highly sensitive, but can be useful for understanding how crime correlates and spreads among neighborhoods.

Toward this objective, our specific goals are to: (1) design and implement a scalable social flow model; (2) assess the predictive capability of various social flow measures on crime and poverty and relate the results to theoretical social science models; and (3) revisit privacy-preserving techniques and study how well they preserve the analytical results.

The key outcomes of this project are as follows. First, we propose a negative binomial model of the correlation between crime count and social flow, and we show that social flow data are useful for crime count inference. Second, we systematically study the importance of all the features, including demographic information, points of interest, geographical impact, and social flow impact, with respect to different categories of crime count. The study is carried out on large-scale real data (e.g., the social flow data are described by millions of taxi trips). Third, we study a spatially varying correlation model and find that the correlations between features and crime count vary across space. Lastly, we show an important finding: neighborhood crime depends not just on internal or surrounding disadvantage but also on the disadvantage of areas connected to it through commuting. The findings contribute to ecological theories of crime, social isolation, and ecological networks by showing that communities can influence each other from a distance and by suggesting that connectivity to less disadvantaged work hubs may decrease local crime.
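To make the first outcome more concrete, here is a minimal sketch of how a flow-weighted feature could enter a negative binomial count model. It is an illustration under stated assumptions, not the project's implementation: the function names, the inflow-weighted averaging, the log link, the NB2 parameterization (variance = mu + alpha * mu^2), and all numbers are hypothetical.

```python
import math


def flow_weighted_feature(flow, values):
    """For each neighborhood i, average `values` over the other
    neighborhoods j, weighted by the flow from j into i (flow[j][i])."""
    n = len(values)
    result = []
    for i in range(n):
        total = sum(flow[j][i] for j in range(n) if j != i)
        if total == 0:
            result.append(0.0)  # no incoming flow: feature set to zero
            continue
        weighted = sum(flow[j][i] * values[j] for j in range(n) if j != i)
        result.append(weighted / total)
    return result


def expected_count(features, coefs, intercept):
    """Mean crime count under a log link: mu = exp(intercept + coefs . features)."""
    return math.exp(intercept + sum(c * x for c, x in zip(coefs, features)))


def neg_binomial_log_pmf(y, mu, alpha):
    """Log-probability of count y under an NB2 negative binomial with
    mean mu > 0 and dispersion alpha > 0."""
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


# Hypothetical data: 3 neighborhoods, a flow matrix, and a disadvantage index.
flow = [[0, 10, 0], [5, 0, 5], [0, 20, 0]]
disadvantage = [0.2, 0.5, 0.8]
flow_feature = flow_weighted_feature(flow, disadvantage)

# Hypothetical coefficients for (own disadvantage, flow-weighted disadvantage).
mu_1 = expected_count([disadvantage[1], flow_feature[1]], [1.0, 2.0], 0.5)
log_lik_1 = neg_binomial_log_pmf(4, mu_1, alpha=0.5)
```

In a real analysis the coefficients and dispersion would be estimated by maximizing the summed log-likelihood over all neighborhoods. A positive coefficient on the flow-weighted feature would correspond to the finding that disadvantage in areas connected through commuting is associated with higher local crime.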


Last Modified: 12/17/2018
Modified by: Zhenhui Li

Please report errors in award information by writing to: awardsearch@nsf.gov.
