Award Abstract # 0953330
CAREER: Machine Learning and Event Detection for the Public Good

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: CARNEGIE MELLON UNIVERSITY
Initial Amendment Date: March 31, 2010
Latest Amendment Date: March 31, 2010
Award Number: 0953330
Award Instrument: Standard Grant
Program Manager: Maria Zemankova
IIS
 Division of Information & Intelligent Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: July 1, 2010
End Date: June 30, 2016 (Estimated)
Total Intended Award Amount: $529,962.00
Total Awarded Amount to Date: $529,962.00
Funds Obligated to Date: FY 2010 = $529,962.00
History of Investigator:
  • Daniel Neill (Principal Investigator)
    daniel.neill@nyu.edu
Recipient Sponsored Research Office: Carnegie-Mellon University
5000 FORBES AVE
PITTSBURGH
PA  US  15213-3815
(412)268-8746
Sponsor Congressional District: 12
Primary Place of Performance: Carnegie-Mellon University
5000 FORBES AVE
PITTSBURGH
PA  US  15213-3815
Primary Place of Performance Congressional District: 12
Unique Entity Identifier (UEI): U3NKNFLNQ613
Parent UEI: U3NKNFLNQ613
NSF Program(s): Info Integration & Informatics, SciSIP-Sci of Sci Innov Policy
Primary Program Source: 01001011DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 0000, 1045, 1187, 7364, 7626, 9215, HPCC, OTHR
Program Element Code(s): 736400, 762600
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

The goal of this research is to create and explore novel methods for detecting emerging events in massive, complex real-world datasets. The approach consists of new algorithms that efficiently and exactly find the most anomalous subsets of a large, high-dimensional dataset, together with methodological advances that (1) incorporate incremental model learning from user feedback into event detection, (2) incorporate society-scale data from emerging, transformative technologies such as cellular phones and user-generated web content, and (3) augment event detection with methods and tools for event characterization, explanation, visualization, investigation, and response.

The experimental research is integrated with a multi-pronged educational initiative to incorporate machine learning into the public policy curriculum through development of courses and seminars, workshops in machine learning and policy research and education, and establishment of a new Joint Ph.D. Program in Machine Learning and Policy. The results of this project will be incorporated into deployed event surveillance systems and applied to the public health, law enforcement, and health care domains, enabling more timely and accurate detection of emerging disease outbreaks, prediction of emerging hot-spots of violent crime, and identification of anomalous patterns of patient care. Project results, including publications, software, and datasets, will be disseminated via the project web site (http://www.cs.cmu.edu/~neill/CAREER).

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 39)
Brad J. Bushman, Katherine Newman, Sandra L. Calvert, Geraldine Downey, Mark Dredze, Michael Gottfredson, Nina G. Jablonski, Ann S. Masten, Calvin Morrill, Daniel B. Neill, Daniel Romer, and Daniel W. Webster "Youth violence: what we know and what we need to know" American Psychologist, v.71, 2016, p.17
Daniel B. Neill "Using artificial intelligence to improve hospital inpatient care" IEEE Intelligent Systems, v.28, 2013, p.92
Daniel B. Neill and Tarun Kumar "Fast multidimensional subset scan for outbreak detection and characterization" Online Journal of Public Health Informatics, v.5, 2013, p.156
Daniel B. Neill, Edward McFowland III, and Huanian Zheng "Fast subset scan for multivariate event detection" Statistics in Medicine, v.32, 2013, p.2185-2208 10.1002/sim.5675
Daniel Gartner, Rainer Kolisch, Daniel B. Neill, and Rema Padman "Machine learning approaches for early DRG classification and resource allocation" INFORMS Journal on Computing, v.27, 2015, p.718
DB Neill "Fast Bayesian scan statistics for multivariate event detection and visualization" Statistics in Medicine, v.30, 2011, p.455 10.1002/sim.388
DB Neill and Y Liu "Generalized fast subset sums for Bayesian detection and visualization" Emerging Health Threats Journal, v.4, 2011, p.s43
DB Neill, E McFowland III, and H Zheng "Fast subset scan for multivariate spatial biosurveillance" Emerging Health Threats Journal, v.4, 2011, p.s42
D Oliveira, DB Neill, JH Garrett Jr, L Soibelman "Detection of patterns in water distribution pipe breakage using spatial scan statistics for point events in a physical network" Journal of Computing in Civil Engineering, v.25, 2011, p.21

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project developed a variety of novel methods for accurate and computationally efficient detection of emerging events and other patterns in massive, complex datasets.  We translated these methodological advances into real-world systems that can benefit public health, safety, and security, in areas including:

* Disease surveillance: we used electronically available public health data such as hospital visits and medication sales to automatically identify and characterize emerging disease outbreaks in their very early stages.

* Law enforcement and urban analytics: we were able to accurately predict geographic hot-spots of violence up to a week in advance using crime offense reports and 911 emergency calls, and to identify emerging citizen needs using 311 non-emergency calls for service.

* Health care: our methods were able to discover anomalous patterns of patient care with significant impacts on outcomes, and to automatically detect prostate cancer in digital pathology slides.

* Social media analytics: we used Twitter data to accurately predict civil unrest, identify emerging patterns of human rights abuses, and detect outbreaks of rare diseases.

* Other applications included detecting intruders in a computer network, customs monitoring of container shipments, and infrastructure monitoring (e.g., detection of contaminants spreading through a water distribution system).

In all of these domains, our methods demonstrated substantial improvements in the timeliness, accuracy, and specificity of pattern detection as compared to the previous state of the art.

We developed the CityScan methodology and software, which were incorporated into the Chicago Police Department’s (CPD) day-to-day policing operations for crime prevention through targeted deployment of patrols and have provided the CPD with substantial value. The CPD noted that “based upon deployment suggestions indicated in the CityScan intelligence reports, important arrests were effected, weapons were seized, and crimes were prevented.” Working with Chicago city leaders, we also applied CityScan to the prediction and prevention of rodent complaints. By predicting in advance the locations where rodents are likely to occur, CityScan enables cities to target proactive rodent baiting and other prevention measures more precisely.

Key methodological contributions of our work include “fast subset scan” approaches which can efficiently identify the most interesting, anomalous, or relevant subsets of data records without an exhaustive search.  This enables us to solve detection problems in milliseconds that would previously have been computationally infeasible, requiring millions of years to solve.  Our fast subset scan methods can find optimal subsets subject to constraints on spatial proximity, graph connectivity, group self-similarity, or temporal consistency. They can be applied to univariate, multivariate, or multidimensional datasets, spatial or non-spatial data, including complex data such as text, images, and social media, and can track and source-trace dynamically spreading patterns. They can also be used for learning graph structure, predicting future spread of events, identifying heterogeneous treatment effects, and classifier model validation and refinement. Other methodological advances under this grant include new nonparametric methods for causal inference, prediction, and change detection in space-time data, and a fast Bayesian framework for modeling, detecting, and distinguishing between multiple event types.
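
To make the "without an exhaustive search" idea concrete, the following is a minimal Python sketch of the linear-time subset scanning idea from the published fast subset scan literature. It is not the project's software. It assumes an expectation-based Poisson score (an assumption; the report does not name a specific statistic), and the function names and sample data are hypothetical. For scores of this kind, sorting the records by the ratio of observed count to expected baseline and evaluating only the N "top-k" prefixes recovers the same best subset as checking all 2^N - 1 subsets.

```python
import math
from itertools import combinations


def poisson_score(count, baseline):
    """Expectation-based Poisson log-likelihood ratio for aggregate count/baseline.

    Assumed scoring function for this sketch; zero when the count does not
    exceed its expected baseline.
    """
    if baseline <= 0 or count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def fast_subset_scan(counts, baselines):
    """Find the highest-scoring subset by evaluating only N prefix subsets.

    Records are sorted by priority counts[i] / baselines[i] (baselines must be
    positive); the best subset is one of the top-k prefixes of that ordering.
    """
    order = sorted(range(len(counts)),
                   key=lambda i: counts[i] / baselines[i],
                   reverse=True)
    best_score, best_k = 0.0, 0
    c_sum = b_sum = 0.0
    for k, i in enumerate(order, start=1):
        c_sum += counts[i]
        b_sum += baselines[i]
        score = poisson_score(c_sum, b_sum)
        if score > best_score:
            best_score, best_k = score, k
    return best_score, sorted(order[:best_k])


def exhaustive_scan(counts, baselines):
    """Reference implementation: score every non-empty subset (2^N - 1 of them)."""
    n = len(counts)
    best_score, best_subset = 0.0, []
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            score = poisson_score(sum(counts[i] for i in subset),
                                  sum(baselines[i] for i in subset))
            if score > best_score:
                best_score, best_subset = score, list(subset)
    return best_score, best_subset


# Hypothetical data: observed counts and expected baselines for six locations.
example_counts = [12, 3, 25, 7, 9, 30]
example_baselines = [10.0, 5.0, 10.0, 8.0, 9.0, 12.0]

if __name__ == "__main__":
    print(fast_subset_scan(example_counts, example_baselines))
    print(exhaustive_scan(example_counts, example_baselines))
```

The exhaustive version is included only as a check on small inputs. Its cost doubles with every added record, whereas the sorted scan needs one sort and one pass, which is where the gap between "milliseconds" and "millions of years" comes from as N grows. Constraints such as spatial proximity or graph connectivity, mentioned above, are not handled by this sketch.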

Through this project, we created and developed a new graduate-level curriculum in Machine Learning and Policy (MLP) at Carnegie Mellon University. This program facilitates the widespread use of machine learning methods for the public good by incorporating machine learning throughout the public policy curriculum, exposing numerous policy students to machine learning methodologies, encouraging machine learning students to apply their research to real-world problems in the public sector, fostering research collaborations between machine learning and policy students and faculty, and training a new generation of Ph.D. students with deep expertise in both fields. Components of the program currently include: 1) the world's first joint Ph.D. program in Machine Learning and Public Policy, 2) a master's-level course introducing Large Scale Data Analysis for Public Policy, 3) a Ph.D.-level Research Seminar in Machine Learning and Policy, 4) a course series, Special Topics in Machine Learning and Policy, with topics such as “Machine Learning for the Developing World”, 5) a workshop and speaker series, the “Machine Learning and Social Sciences Seminar”, 6) faculty recruitment, admissions, and student advising, and 7) development of a new “Data Analytics” track for our M.S. program in Public Policy and Management, which will help to involve more master's students in applied research at the intersection of machine learning and policy.

This project has supported the work of 19 graduate students in Carnegie Mellon University’s Event and Pattern Detection Laboratory.  The work has been widely disseminated to a variety of methodological and applied audiences, ranging from computer scientists and statisticians to public health practitioners, law enforcement agencies, and city leaders, through direct collaborations, publication of 36 journal and conference papers, and presentation of over 50 invited talks. Papers and presentations are available on our Event and Pattern Detection Laboratory website, http://epdlab.heinz.cmu.edu.


Last Modified: 09/28/2016
Modified by: Daniel B Neill
