Award Abstract # 2211907
RI: Medium: Foundations of Self-Supervised Learning Through the Lens of Probabilistic Generative Models

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: CARNEGIE MELLON UNIVERSITY
Initial Amendment Date: August 25, 2022
Latest Amendment Date: August 25, 2022
Award Number: 2211907
Award Instrument: Standard Grant
Program Manager: Vladimir Pavlovic
  vpavlovi@nsf.gov
  (703)292-8318
  IIS: Division of Information & Intelligent Systems
  CSE: Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2022
End Date: September 30, 2026 (Estimated)
Total Intended Award Amount: $1,127,925.00
Total Awarded Amount to Date: $1,127,925.00
Funds Obligated to Date: FY 2022 = $1,127,925.00
History of Investigator:
  • Pradeep Ravikumar (Principal Investigator)
    pradeepr@cs.cmu.edu
  • Andrej Risteski (Co-Principal Investigator)
Recipient Sponsored Research Office: Carnegie-Mellon University
5000 FORBES AVE
PITTSBURGH
PA  US  15213-3815
(412)268-8746
Sponsor Congressional District: 12
Primary Place of Performance: Carnegie-Mellon University
5000 Forbes Avenue
PITTSBURGH
PA  US  15213-3815
Primary Place of Performance Congressional District: 12
Unique Entity Identifier (UEI): U3NKNFLNQ613
Parent UEI: U3NKNFLNQ613
NSF Program(s): Robust Intelligence
Primary Program Source: 01002223DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 7495, 7924
Program Element Code(s): 749500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Supervised learning of modern machine learning models requires very large, high-quality labeled datasets. Labeling data demands costly human annotation, which is often out of reach for under-resourced end-users of machine learning. Unsupervised learning from unlabeled data therefore promises to vastly increase the accessibility and inclusivity of modern machine learning. An emerging paradigm for such unsupervised learning is self-supervised learning (SSL), in which a machine learning model is trained on tasks whose labels can be generated automatically from the data itself. This approach is at the core of high-performing language and image models such as BERT and DALL-E. However, despite its promise on many benchmarks across diverse domains, much of the current methodology for developing SSL methods is opaque and heuristic, and evaluation relies on ad-hoc choices of performance metrics. The goal of this project is to build scientific and mathematical foundations for SSL, and consequently to improve its practice.
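The defining feature of SSL described above, labels generated automatically from unlabeled data, can be illustrated with a minimal masked-prediction sketch. This is an illustrative toy, not code from the project; the function and variable names are hypothetical.

```python
import random

MASK = "[MASK]"

def make_masked_examples(tokens, mask_prob=0.15, seed=0):
    """Turn one unlabeled token sequence into (input, position, target)
    training triples by hiding random tokens -- no human annotation.
    Illustrative sketch of the masked-prediction idea behind models
    like BERT, not any library's actual API."""
    rng = random.Random(seed)
    examples = []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            # The hidden token itself becomes the prediction target.
            masked = tokens[:i] + [MASK] + tokens[i + 1:]
            examples.append((masked, i, tok))
    return examples

sentence = "self supervised learning needs no human labels".split()
for inp, pos, target in make_masked_examples(sentence, mask_prob=0.3):
    print(pos, target, inp)
```

Each triple pairs a corrupted input with an automatically derived label, which is the sense in which SSL turns unlabeled corpora into supervised training data.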

In some of the earliest work in this area, SSL was used to speed up tasks involving the learning of probabilistic models. Progressively, through a series of approximations made for scalability, the outputs of SSL could no longer be rigorously tied to probabilistic model parameters, and the goal shifted to learning features that are "useful" for downstream tasks, that is, representation learning. "Useful," however, is often mathematically difficult to pin down, so it is frequently unclear (even empirically, much less theoretically) what these methods learn about the data. At present, designing a well-performing SSL method entails trying many combinations of tasks and model architectures until a particular one gives good results on the downstream tasks. This has two downsides: (i) it requires a substantial amount of trial-and-error; (ii) on a scientific level, it yields no understanding of what makes a particular task or architecture suitable, or of what the learned features capture about the data distribution. This project will repair the severed tie between probabilistic models and feature learning via self-supervised models by analyzing which aspects of a deep generative model can be recovered via self-supervised learning. Moreover, through this lens, the project aims to understand the relative advantages---both statistical and algorithmic---of self-supervised learning methods over other methods for learning probabilistic models.
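The tie between self-supervised tasks and generative-model parameters can be made concrete in a toy case: for a known two-variable binary distribution, the Bayes-optimal masked predictor of one variable given the other is exactly the true conditional, which is a function of the generative parameters. The sketch below, an illustration under that simplification and not the project's method, checks this by comparing empirical conditionals fit from samples against those computed from the model.

```python
import random

# Ground-truth joint distribution p(x0, x1) over {0,1}^2
# (illustrative parameters, chosen arbitrarily).
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def true_conditional(x0):
    """p(x1 = 1 | x0), computed directly from the generative parameters."""
    return JOINT[(x0, 1)] / (JOINT[(x0, 0)] + JOINT[(x0, 1)])

def sample(n, seed=0):
    """Draw n i.i.d. pairs from the joint distribution."""
    rng = random.Random(seed)
    outcomes, weights = zip(*JOINT.items())
    return rng.choices(outcomes, weights=weights, k=n)

def fitted_conditional(data, x0):
    """Empirical masked-prediction target: estimate p(x1 = 1 | x0)
    by averaging x1 over the samples where the unmasked value is x0."""
    rows = [x1 for a, x1 in data if a == x0]
    return sum(rows) / len(rows)

data = sample(50_000)
for x0 in (0, 1):
    print(x0, round(true_conditional(x0), 3),
          round(fitted_conditional(data, x0), 3))
```

When the fitted conditionals match the true ones, the self-supervised task has recovered part of the underlying probabilistic model, which is the kind of identifiability question the project studies in far more general settings.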

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 14)
Aragam, Bryon and Ravikumar, Pradeep "Neuro-Causal Models" Frontiers in Artificial Intelligence and Applications, v.369, 2023
Bai, Andrew and Ravikumar, Pradeep and Yeh, Chih-Kuan and Lin, Neil and Hsieh, Cho-Jui "Concept Gradient: Concept-based Interpretation Without Linear Assumption" International Conference on Learning Representations (ICLR), 2023
Peng, Binghui and Risteski, Andrej "Continual learning: a feature extraction formalization, an efficient algorithm, and barriers" Advances in Neural Information Processing Systems (NeurIPS), 2022
Buchholz, Simon and Rajendran, Goutham and Rosenfeld, Elan and Aragam, Bryon and Schölkopf, Bernhard and Ravikumar, Pradeep "Learning Linear Causal Representations from Interventions under General Nonlinear Mixing", 2023
Chen, Tianyu and Bello, Kevin and Aragam, Bryon and Ravikumar, Pradeep "iSCAN: Identifying Causal Mechanism Shifts among Nonlinear Additive Noise Models", 2023
Chen, Yining and Rosenfeld, Elan and Sellke, Mark and Ma, Tengyu and Risteski, Andrej "Iterative Feature Matching: Toward Provable Domain Generalization with Logarithmic Environments" Advances in Neural Information Processing Systems (NeurIPS), 2022
Kivva, Bohdan and Rajendran, Goutham and Ravikumar, Pradeep and Aragam, Bryon "Identifiability of deep generative models without auxiliary information" Advances in Neural Information Processing Systems (NeurIPS), 2023
Lee, Holden and Pabbaraju, Chirag and Sevekari, Anish Prasad and Risteski, Andrej "Pitfalls of Gaussians as a noise distribution in NCE" International Conference on Learning Representations (ICLR), 2023
Liu, Bingbin and Hsu, Daniel J. and Ravikumar, Pradeep and Risteski, Andrej "Masked Prediction: A Parameter Identifiability View" Advances in Neural Information Processing Systems (NeurIPS), 2022
Li, Yuchen and Kirchmayer, Alexandre and Mehta, Aashay and Qin, Yilong and Dadachev, Boris and Papineni, Kishore and Kumar, Sanjiv and Risteski, Andrej "Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines", 2024
Pukdee, Rattana and Sam, Dylan and Balcan, Maria-Florina and Ravikumar, Pradeep "Label Propagation with Weak Supervision" International Conference on Learning Representations (ICLR), 2023