
NSF Org: |
CCF Division of Computing and Communication Foundations |
Recipient: |
|
Initial Amendment Date: | June 22, 2020 |
Latest Amendment Date: | June 22, 2020 |
Award Number: | 2007814 |
Award Instrument: | Standard Grant |
Program Manager: |
James Fowler
jafowler@nsf.gov (703)292-8910 CCF Division of Computing and Communication Foundations CSE Directorate for Computer and Information Science and Engineering |
Start Date: | July 1, 2020 |
End Date: | June 30, 2024 (Estimated) |
Total Intended Award Amount: | $249,990.00 |
Total Awarded Amount to Date: | $249,990.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
633 CLARK ST EVANSTON IL US 60208-0001 (312)503-7955 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
2145 Sheridan Rd Evanston IL US 60208-0834 |
Primary Place of
Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Comm & Information Foundations |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Systems prevalent in modern society can be characterized by complex networks of interconnected components that generate massive amounts of data. The ability to make timely inferences using these data presents unprecedented opportunities to solve major societal problems. For example, advances in wearable technology are transforming the delivery of personalized healthcare and wellness programs. More broadly, wearables naturally create sensor networks over populations and the data from these networks can be harnessed to detect and/or prevent diseases, crimes or environmental hazards. Inference from such data can be naturally accomplished using graphical models. Unfortunately, existing technology for graphical models requires stringent assumptions that are seldom satisfied in modern applications. The goal of this project is to address these shortcomings by developing new computational methods that automatically infer the topology of a graphical model from high-dimensional data, identify and/or correct outliers and anomalies, and solve the estimation problems simultaneously. Furthermore, the proposed research will lead to innovative teaching material defining modern data science curricula and develop a diverse cadre of Ph.D. students with skills at the interface of discrete optimization, continuous optimization, and statistics.
Inference problems with spurious data and unknown network topologies can be modeled as large-scale constrained mixed-integer convex optimization problems. To address the challenges posed by the presence of the combinatorial constraints, this project employs a combination of two key ideas. The first idea is to decompose the problem into progressively small problems, that can be solved in a decentralized and parallel fashion, by leveraging the Markov property inherent in graphical models. The second idea is the convexification of the combinatorial constraints, to diminish or prevent altogether the loss in quality from the decomposition of the problem. Unlike typical decomposition methods such as Lagrangian relaxation, which can lead to large duality gaps, this project will develop novel techniques based on convexification and Fenchel duality. In particular, the resulting method will account for the combinatorial restrictions and the nonlinear loss function concurrently, ultimately resulting in small or no duality gaps. The successful completion of the project will lead to significant advances in inference with spatio-temporal data, interpretable prediction, and identification of causal relationships.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Note:
When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external
site maintained by the publisher. Some full text articles may not yet be available without a
charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from
this site.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Prevalent systems in contemporary society can be characterized by complex networks of interconnected components that generate massive amounts of data (e.g., wireless networks of sensors or wearables, and gene regulatory networks). The ability to make timely inferences using these data presents unprecedented opportunities to tackle major societal problems, ranging from managing population health to understanding complex diseases. Graphical models are a natural fit to use in the context of inference problems over networks. However, existing graphical models and methods have inherent limitations that preclude their use in modern applications: the exact methods do not scale to high-dimensional data, they often assume that the structure of the graphical model is known a priori and that the data do not contain outliers or anomalies. Due to scalability concerns associated with solving large-scale combinatorial inference problems to optimality, heuristic methods are often used in practice. Unfortunately, the statistical properties of the solutions of such heuristics are far from clear, resulting in a gap between theory and computation. In this project, we close we close this knowledge gap. The proposed work differs from the status quo in that we develop exact and scalable computational methods that automatically infer the topology of a graphical model from high-dimensional data, identify and/or correct anomalies and solve the estimation problems simultaneously. To do so, we model the inference problems with graphical models as constrained mixed-integer convex optimization problems.
The contributions of this project are two-fold. First, we identify a broad class of learning problems with graphical models that, contrary to usual expectations, can in fact be solved efficiently to optimality. The class of efficiently-solvable problems relies on exploiting critical structural properties of graphical models that arise with time series data and Besag-York models (commonly used to model spread of epidemics), and the algorithms extend to problems with the presence of difficult combinatorial constraints such as outliers or interpretability. Naturally, in this case, the exact solution of the problems provides improved statistical performance without additional computational costs. Second, for problems that are not polynomial-time solvable, we propose decomposition methods based on a divide a conquer approach. At a high level, the proposed approach breaks a complex learning problems into simpler structures which can be tackled efficiently, and then combines the solutions obtained by applying the proposed algorithms to the simpler structures. We show that the proposed method outperforms usual existing statistical techniques, and is considerably faster than exact methods relying on off-the-shelf mixed-integer optimization solvers.
Last Modified: 08/21/2024
Modified by: Simge Kucukyavuz
Please report errors in award information by writing to: awardsearch@nsf.gov.