
NSF Org: | CCF Division of Computing and Communication Foundations |
Initial Amendment Date: | August 30, 2016 |
Latest Amendment Date: | August 30, 2016 |
Award Number: | 1637585 |
Award Instrument: | Standard Grant |
Program Manager: | A. Funda Ergun, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2016 |
End Date: | August 31, 2022 (Estimated) |
Total Intended Award Amount: | $399,939.00 |
Total Awarded Amount to Date: | $399,939.00 |
Recipient Sponsored Research Office: | 633 Clark St, Evanston, IL 60208-0001, US, (312) 503-7955 |
Primary Place of Performance: | 2145 Sheridan Rd., Evanston, IL 60208-3118, US |
NSF Program(s): | Algorithms in the Field |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Statistical models provide a powerful means of quantifying uncertainty, modeling prior beliefs, and describing complex dependencies in data. The process of using a model to answer specific questions, such as inferring the state of several random variables given evidence observed about others, is called probabilistic inference. Probabilistic graphical models, a type of statistical model, are often used in diverse applications such as medical diagnosis, understanding protein and gene regulatory networks, computer vision, and language understanding. On account of the central role played by probabilistic graphical models in a wide range of automated reasoning applications, designing efficient algorithms for probabilistic inference is a fundamental problem in artificial intelligence and machine learning.
Probabilistic inference in many of these applications corresponds to a complex combinatorial optimization problem that at first glance appears to be extremely difficult to solve. However, practitioners have made significant strides in designing heuristic algorithms to perform real-world inference accurately and efficiently. This project focuses on bridging the gap between theory and practice for probabilistic inference problems in large-scale machine learning systems. The PIs will identify structural properties and methods of analysis that differentiate real-world instances from worst-case instances used to show NP-hardness, and will design efficient algorithms with provable guarantees that would apply to most real-world instances. The project will also study why heuristics like linear programming and other convex relaxations are so successful on real-world instances. The efficient algorithms for probabilistic inference developed as part of this project have the potential to be transformative in machine learning, statistics, and more applied areas like computer vision, social networks and computational biology. To help disseminate the research and foster new collaborations, a series of workshops will be organized bringing together the theoretical computer science and machine learning communities. Additionally, undergraduate curricula will be developed that use machine learning to introduce students to concepts in theoretical computer science.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Probabilistic models provide a powerful means of quantifying uncertainty, modeling prior beliefs, and describing complex dependencies in data. These models are frequently used to make predictions in settings with limited training data such as in health care (medical diagnosis from symptoms), biology (inferring protein regulatory networks), computer vision (stereopsis), and natural language processing (parsing and text generation). Using these models requires performing probabilistic inference, which takes evidence about observed variables to draw conclusions about unobserved variables. Inference in these applications corresponds to a complex combinatorial optimization problem that at first glance appears to be extremely difficult to solve. However, practitioners have made significant strides in designing heuristic algorithms to perform real-world inference accurately and efficiently. This project focused on algorithms and theory for probabilistic inference problems in large-scale machine learning systems.
A major contribution of this project was bridging the gap between theory and practice for finding the most probable assignment of the unobserved variables (MAP inference) in an important special case of probabilistic models, Potts models. Heuristics based on iterative techniques (alpha-expansion) and linear programming relaxations have been shown empirically to return solutions that are either globally optimal or close to globally optimal. PIs David Sontag and Aravindan Vijayaraghavan, with PhD students Hunter Lang (MIT) and Aravind Reddy (Northwestern), showed through theoretical and empirical results published in a series of works (AISTATS 2018, AISTATS 2019, ICML 2021, AISTATS 2021) that variants of a natural notion called stability can explain the tractability of these instances in practice.
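For concreteness, here is a sketch of the optimization problem and the stability notion at play (the notation below is ours, a paraphrase of the standard definitions rather than a quotation from the papers). MAP inference in a Potts model on a graph $G = (V, E)$ with label set $L$ asks for the minimum-energy assignment

    \min_{x \in L^V} \; \sum_{u \in V} c_u(x_u) \;+\; \sum_{(u,v) \in E} w_{uv}\, \mathbf{1}[x_u \neq x_v],

where $c_u$ are node costs and $w_{uv} \ge 0$ are edge weights penalizing disagreement. Roughly, an instance is $\gamma$-perturbation-stable if its optimal assignment remains the unique optimum whenever each $w_{uv}$ is independently replaced by any $w'_{uv}$ with $w_{uv}/\gamma \le w'_{uv} \le w_{uv}$.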
Specifically, for the alpha-expansion algorithm, we proved (in Lang et al., ICML 2021) that it always returns a globally optimal assignment for Potts models, with a catch: the returned assignment is only guaranteed to be optimal for an instance within a small perturbation of the original problem instance. Intuitively, we expect that for robust models used in applications, small perturbations to their parameters should have similar optimal assignments, which would imply that alpha-expansion provides near-optimal assignments for the original models. Moreover, for a linear programming relaxation, we proved that it is close to integral when the instance is close to stable or satisfies a condition called block stability (AISTATS 2019, AISTATS 2021). We then designed algorithms that can certify whether models satisfy these conditions. Using our new methods for certifying stability, we were able to demonstrate that several computer vision models indeed satisfy these stability conditions. Taken together, these results give a cohesive explanation for the good performance of the above heuristics in practice.
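As a toy illustration (a minimal sketch under the definitions above, not the certification algorithms from the papers; all names and numbers are ours), the following Python snippet brute-forces the MAP assignment of a tiny Potts model and randomly samples edge-weight perturbations to probe whether that assignment is stable:

    import itertools
    import random

    # Toy Potts model: nodes take labels in {0, ..., k-1}; the energy is a sum
    # of node costs plus a penalty w on every edge whose endpoints disagree.
    unary = {0: [0.0, 1.0], 1: [0.5, 0.2], 2: [1.0, 0.0]}   # 3 nodes, 2 labels
    pairwise = {(0, 1): 0.6, (1, 2): 0.6}                   # chain 0 - 1 - 2

    def energy(labels, unary, pairwise):
        node_cost = sum(costs[labels[u]] for u, costs in unary.items())
        edge_cost = sum(w for (u, v), w in pairwise.items()
                        if labels[u] != labels[v])
        return node_cost + edge_cost

    def map_assignment(unary, pairwise, k=2):
        # Brute-force minimum-energy assignment; fine only for tiny instances.
        nodes = sorted(unary)
        best = min(itertools.product(range(k), repeat=len(nodes)),
                   key=lambda ls: energy(dict(zip(nodes, ls)), unary, pairwise))
        return dict(zip(nodes, best))

    def survives_perturbations(unary, pairwise, gamma=2.0, trials=200, k=2):
        # Sample perturbations that independently scale each edge weight down
        # by a factor in [1/gamma, 1] and check whether the MAP assignment
        # changes. Sampling can only refute stability, never certify it.
        base = map_assignment(unary, pairwise, k)
        for _ in range(trials):
            perturbed = {e: w * random.uniform(1.0 / gamma, 1.0)
                         for e, w in pairwise.items()}
            if map_assignment(unary, perturbed, k) != base:
                return False
        return True

    print(map_assignment(unary, pairwise))   # {0: 0, 1: 1, 2: 1} for this toy
    print(survives_perturbations(unary, pairwise))

Note that random sampling can only refute stability; actually certifying it requires reasoning about all perturbations simultaneously, which is what the certification methods described above are for.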
Our results also suggested a monotonicity principle for algorithm design with the linear programming relaxation: we showed that the relaxation performs well when the input consists of a simpler, tractable part of the model combined with remaining factors that are random but slightly biased toward the ground truth.
The project also led to theory-inspired improvements for emerging paradigms in machine learning like weak supervision, where one seeks to learn from imperfect labels provided by domain experts (e.g., that any review containing the word "great" should be labeled as positive sentiment). Under the assumption of conditional independence between the weak labels and the features, in joint work of the PIs with MIT PhD student Hunter Lang (published in NeurIPS 2022), we proved that there is a trade-off between the number of data points "covered" by the weak labels and their noise rates. We then showed that the popular "cut statistic" heuristic can rank data points by a measure of confidence in their weak labels, and trained the downstream classifier on only the most confident of these. Surprisingly, we found that this technique improved the performance of every weak supervision method previously published, often by large amounts.
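The following sketch shows the shape of this pipeline (a simplified nearest-neighbor variant of cut-statistic-style scoring; the function names, the kNN score, and the logistic-regression downstream model are our illustrative choices, not the paper's exact recipe):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import NearestNeighbors

    def confidence_scores(X, weak_labels, n_neighbors=10):
        # Score each point by the fraction of its nearest neighbors whose
        # weak label disagrees with its own ("cut" edges in the kNN graph).
        # A low score means the point sits in a smoothly-labeled region, so
        # its weak label is more trustworthy.
        nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X)
        _, idx = nn.kneighbors(X)              # column 0 is the point itself
        neighbor_labels = weak_labels[idx[:, 1:]]
        return (neighbor_labels != weak_labels[:, None]).mean(axis=1)

    def train_on_confident_subset(X, weak_labels, keep_frac=0.5):
        # Keep the keep_frac lowest-scoring (most confident) points and fit
        # the downstream classifier on them alone.
        scores = confidence_scores(X, weak_labels)
        keep = np.argsort(scores)[: int(keep_frac * len(X))]
        return LogisticRegression(max_iter=1000).fit(X[keep], weak_labels[keep])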
Other research highlights from this project include new theory and algorithms for clustering Euclidean instances that are stable (Dutta et al., NeurIPS '17), new algorithms with provable guarantees for learning depth-2 neural networks (Awasthi et al., NeurIPS '21), improved algorithms using ideas from co-training for prompt-based learning with large language models and unlabeled data (Lang et al., ICML '22), as well as new algorithms and provable guarantees for human-AI interaction (Mozannar et al., AAAI '22) and self-supervised learning tasks that arise in healthcare (Agrawal et al., AISTATS '22).
The PIs also organized activities and workshops with the goal of reaching the broader scientific community interested in topics related to the project. PI Vijayaraghavan co-organized (along with Aleksander Madry and Daniel Hsu) the FOCS 2021 workshop on New Directions in Machine Learning. PI Vijayaraghavan also co-organized several day-long theory workshops at Northwestern University that drew broad attendance from theory faculty, postdocs, and students at many Midwestern universities in and near Chicago. Two of these workshops were closely related to the topic of this project. These workshops were open to the public, and videos are publicly accessible on the website.
Last Modified: 01/20/2023
Modified by: Aravindan Vijayaraghavan
Please report errors in award information by writing to: awardsearch@nsf.gov.