
NSF Org: |
IIS Division of Information & Intelligent Systems |
Recipient: |
Carnegie Mellon University |
Initial Amendment Date: | March 31, 2010 |
Latest Amendment Date: | March 31, 2010 |
Award Number: | 0953330 |
Award Instrument: | Standard Grant |
Program Manager: |
Maria Zemankova
IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | July 1, 2010 |
End Date: | June 30, 2016 (Estimated) |
Total Intended Award Amount: | $529,962.00 |
Total Awarded Amount to Date: | $529,962.00 |
Recipient Sponsored Research Office: |
5000 FORBES AVE PITTSBURGH PA US 15213-3815 (412) 268-8746 |
Primary Place of Performance: |
5000 FORBES AVE PITTSBURGH PA US 15213-3815 |
NSF Program(s): |
Info Integration & Informatics, SciSIP-Sci of Sci Innov Policy |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
The goal of this research is to create and explore novel methods for detecting emerging events in massive, complex real-world datasets. The approach combines new algorithms that efficiently and exactly find the most anomalous subsets of a large, high-dimensional dataset with methodological advances that incorporate incremental model learning from user feedback into event detection, incorporate society-scale data from emerging, transformative technologies such as cellular phones and user-generated web content, and augment event detection with methods and tools for event characterization, explanation, visualization, investigation, and response.
The experimental research is integrated with a multi-pronged educational initiative to incorporate machine learning into the public policy curriculum through the development of courses and seminars, workshops in machine learning and policy research and education, and the establishment of a new Joint Ph.D. Program in Machine Learning and Policy. The results of this project will be incorporated into deployed event surveillance systems and applied to the public health, law enforcement, and health care domains, enabling more timely and accurate detection of emerging disease outbreaks, prediction of emerging hot-spots of violent crime, and identification of anomalous patterns of patient care. Project results, including publications, software, and datasets, will be disseminated via the project web site (http://www.cs.cmu.edu/~neill/CAREER).
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This project developed a variety of novel methods for accurate and computationally efficient detection of emerging events and other patterns in massive, complex datasets. We translated these methodological advances into real-world systems that can benefit public health, safety, and security, in areas including:
* Disease surveillance: we used electronically available public health data such as hospital visits and medication sales to automatically identify and characterize emerging disease outbreaks in their very early stages.
* Law enforcement and urban analytics: we were able to accurately predict geographic hot-spots of violence up to a week in advance using crime offense reports and 911 emergency calls, and to identify emerging citizen needs using 311 non-emergency calls for service.
* Health care: our methods were able to discover anomalous patterns of patient care with significant impacts on outcomes, and to automatically detect prostate cancer in digital pathology slides.
* Social media analytics: we used Twitter data to accurately predict civil unrest, identify emerging patterns of human rights abuses, and detect outbreaks of rare diseases.
* Other applications included detecting intruders in a computer network, customs monitoring of container shipments, and infrastructure monitoring (e.g., detection of contaminants spreading through a water distribution system).
In all of these domains, our methods demonstrated substantial improvements in the timeliness, accuracy, and specificity of pattern detection as compared to the previous state of the art.
We developed the CityScan methodology and software, which were incorporated into the day-to-day policing operations of the Chicago Police Department (CPD) to support crime prevention through targeted deployment of patrols, and which have provided substantial value to the CPD. The CPD has noted that “based upon deployment suggestions indicated in the CityScan intelligence reports, important arrests were effected, weapons were seized, and crimes were prevented.” Working with Chicago city leaders, we also applied CityScan to the prediction and prevention of rodent complaints. By predicting in advance the locations where rodents are likely to occur, CityScan enables cities to target proactive rodent baiting and other prevention measures more precisely.
Key methodological contributions of our work include “fast subset scan” approaches, which can efficiently identify the most interesting, anomalous, or relevant subsets of data records without an exhaustive search. This enables us to solve, in milliseconds, detection problems that would previously have been computationally infeasible, requiring millions of years of computation. Our fast subset scan methods can find optimal subsets subject to constraints on spatial proximity, graph connectivity, group self-similarity, or temporal consistency. They can be applied to univariate, multivariate, or multidimensional data, whether spatial or non-spatial, including complex data such as text, images, and social media, and they can track and source-trace dynamically spreading patterns. They can also be used for learning graph structure, predicting the future spread of events, identifying heterogeneous treatment effects, and validating and refining classifier models. Other methodological advances under this grant include new nonparametric methods for causal inference, prediction, and change detection in space-time data, and a fast Bayesian framework for modeling, detecting, and distinguishing between multiple event types.
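The fast subset scan idea can be illustrated with a minimal Python sketch. It assumes one standard score function, the expectation-based Poisson (EBP) log-likelihood ratio F(C, B) = C log(C/B) + B - C for total count C above total baseline B (and 0 otherwise), which has the linear-time subset scanning (LTSS) property: if records are sorted by the ratio of observed count c_i to expected baseline b_i, the highest-scoring subset is always one of the top-k prefixes of that order, so only N subsets are scored instead of 2^N. A spatial proximity constraint is imposed, as an assumed example, by scanning only subsets of each location's k nearest neighbors. All function names, parameters, and data here are hypothetical; this is an illustration, not the project's released software.

```python
import math


def ebp_score(count, baseline):
    """Expectation-based Poisson log-likelihood ratio F(C, B); 0 when C <= B."""
    if count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def ltss(indices, counts, baselines):
    """Exact best subset of `indices` via linear-time subset scanning (LTSS).

    Records are sorted by priority c_i / b_i (baselines assumed > 0). For the EBP
    score the optimum is one of the len(indices) top-k prefixes of this order,
    so only those prefixes are scored instead of every subset."""
    order = sorted(indices, key=lambda i: counts[i] / baselines[i], reverse=True)
    best_score, best_subset = 0.0, []
    c_sum = b_sum = 0.0
    for k, i in enumerate(order, start=1):
        c_sum += counts[i]
        b_sum += baselines[i]
        score = ebp_score(c_sum, b_sum)
        if score > best_score:
            best_score, best_subset = score, sorted(order[:k])
    return best_score, best_subset


def fast_localized_scan(coords, counts, baselines, k):
    """Spatially constrained scan (hypothetical helper): for each center location,
    run LTSS over its k nearest locations (center first) and keep the overall best."""
    best_score, best_subset = 0.0, []
    for center in range(len(coords)):
        # Sort by distance; the (dist, j != center) key keeps the center first on ties.
        neighborhood = sorted(
            range(len(coords)),
            key=lambda j: (math.dist(coords[center], coords[j]), j != center),
        )[:k]
        score, subset = ltss(neighborhood, counts, baselines)
        if score > best_score:
            best_score, best_subset = score, subset
    return best_score, best_subset


# Hypothetical example: two clusters of three locations each, equal baselines.
coords = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
counts = [11, 9, 10, 24, 22, 12]
baselines = [10, 10, 10, 10, 10, 10]

score, subset = fast_localized_scan(coords, counts, baselines, k=3)
print(f"{score:.3f} {subset}")  # prints "12.314 [3, 4]"
```

Within each neighborhood, LTSS scores only k prefixes yet returns the same optimum as checking all 2^k subsets, so the constrained search costs on the order of N·k score evaluations (plus sorting). Other constraints mentioned above, such as graph connectivity or group self-similarity, would need additional machinery not shown in this sketch.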
Through this project, we created and developed a new graduate-level curriculum in Machine Learning and Policy (MLP) at Carnegie Mellon University. This program facilitates the widespread use of machine learning methods for the public good by incorporating machine learning throughout the public policy curriculum, exposing numerous policy students to machine learning methodologies, encouraging machine learning students to apply their research to real-world problems in the public sector, fostering research collaborations between machine learning and policy students and faculty, and training a new generation of Ph.D. students with deep expertise in both fields. Components of the program currently include:
* The world's first joint Ph.D. program in Machine Learning and Public Policy.
* A master's-level course introducing Large Scale Data Analysis for Public Policy.
* A Ph.D.-level Research Seminar in Machine Learning and Policy.
* A course series, Special Topics in Machine Learning and Policy, with topics such as “Machine Learning for the Developing World”.
* A workshop and speaker series, the “Machine Learning and Social Sciences Seminar”.
* Faculty recruitment, admissions, and student advising.
* Development of a new “Data Analytics” track for our M.S. program in Public Policy and Management, which will help to involve more master's students in applied research at the intersection of machine learning and policy.
This project has supported the work of 19 graduate students in Carnegie Mellon University’s Event and Pattern Detection Laboratory. The work has been widely disseminated to a variety of methodological and applied audiences, ranging from computer scientists and statisticians to public health practitioners, law enforcement agencies, and city leaders, through direct collaborations, publication of 36 journal and conference papers, and presentation of over 50 invited talks. Papers and presentations are available on our Event and Pattern Detection Laboratory website, http://epdlab.heinz.cmu.edu.
Last Modified: 09/28/2016
Modified by: Daniel B Neill