Award Abstract # 2233762
CAREER: accelerating machine learning with low dimensional structure

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: THE LELAND STANFORD JUNIOR UNIVERSITY
Initial Amendment Date: August 8, 2022
Latest Amendment Date: September 21, 2022
Award Number: 2233762
Award Instrument: Continuing Grant
Program Manager: Vladimir Pavlovic
vpavlovi@nsf.gov
 (703)292-8318
IIS
 Division of Information & Intelligent Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: July 1, 2022
End Date: September 30, 2025 (Estimated)
Total Intended Award Amount: $550,000.00
Total Awarded Amount to Date: $454,115.00
Funds Obligated to Date: FY 2021 = $71,344.00
FY 2022 = $382,771.00
History of Investigator:
  • Madeleine Udell (Principal Investigator)
    mru8@cornell.edu
Recipient Sponsored Research Office: Stanford University
450 JANE STANFORD WAY
STANFORD
CA  US  94305-2004
(650)723-2300
Sponsor Congressional District: 16
Primary Place of Performance: Stanford University
450 Jane Stanford Way
Stanford
CA  US  94305-2004
Primary Place of Performance Congressional District: 16
Unique Entity Identifier (UEI): HJD6G4D6TJY5
Parent UEI:
NSF Program(s): Robust Intelligence
Primary Program Source: 01002122DB NSF RESEARCH & RELATED ACTIVIT
01002223DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1045, 7495
Program Element Code(s): 749500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Big datasets are everywhere: in science, in health, in commerce, and in government, data is becoming easier and cheaper to collect. Yet extracting value from this data is a challenge, because every step requires human intervention: cleaning the data, identifying useful features, and choosing a machine learning model. The goal of this project is to develop new methods to accelerate and automate the basic machine learning (ML) workflow. Automation frees data scientists from data cleaning and parameter twiddling so they can concentrate on the important questions: are we solving the right problems, and do we have the right data? This project will help democratize machine learning and promote data-driven decision making by developing automated methods to clean data and to choose ML models, together with open source software packages that make these methods widely available and easy to use. The project also advances these goals by training data scientists to use these models and to understand their potential risks.

Low dimensional structure provides the key to meeting the diverse challenges required to automate machine learning. The project's central insight is that measurements of a complex object, such as a patient in a hospital, a respondent to a survey, or even an ML dataset, can be well described as simple functions (or even linear functions) of an underlying low dimensional latent vector. The project develops new algorithms and software to identify low dimensional latent vectors and to use them to a) clean the data by denoising observations or imputing missing entries, b) reduce the dimensionality of feature vectors, and c) recommend better algorithms. This project will develop new techniques to identify low dimensional latent vectors from sparse observations via nonlinear (even discontinuous) functions, with efficient algorithms and with theoretical guarantees. To enable more efficient automated machine learning, the project will develop methods to localize similar datasets near each other in a low dimensional space, so that nearness in this space predicts similar performance of machine learning methods.
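
As a rough illustration of the imputation idea described above, the following minimal sketch fills in missing entries of a matrix by assuming the data are linear functions of a low dimensional latent vector. It is not the project's algorithm or software: the function name `impute_low_rank`, the iterated truncated SVD scheme, the synthetic data, and all parameter values are assumptions chosen only to make the idea concrete.

```python
import numpy as np


def impute_low_rank(X, mask, rank, n_iters=200):
    """Fill entries of X where mask is False using an iterated truncated SVD.

    Hypothetical helper for illustration: observed entries are kept fixed and
    missing entries are repeatedly replaced by a rank-`rank` approximation.
    """
    # Start by filling missing entries with the mean of the observed ones.
    filled = np.where(mask, X, X[mask].mean())
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Best rank-`rank` approximation of the current filled matrix.
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep observed entries; overwrite only the missing ones.
        filled = np.where(mask, X, low_rank)
    return filled


# Synthetic example (assumed setup): each row is a linear function of a
# 2-dimensional latent vector, and roughly 30% of the entries are hidden.
rng = np.random.default_rng(0)
latent_rows = rng.normal(size=(60, 2))
latent_cols = rng.normal(size=(2, 40))
truth = latent_rows @ latent_cols
mask = rng.random(truth.shape) < 0.7          # True where an entry is observed
observed = np.where(mask, truth, 0.0)         # hidden entries set to a placeholder

completed = impute_low_rank(observed, mask, rank=2)
rel_err = np.linalg.norm((completed - truth)[~mask]) / np.linalg.norm(truth[~mask])
print(f"relative error on the hidden entries: {rel_err:.3g}")
```

The sketch covers only the linear case; the nonlinear and discontinuous observation models mentioned in the abstract would need a different fitting procedure.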

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Fan, Jicong and Yang, Chengrun and Udell, Madeleine. "Robust Non-Linear Matrix Factorization for Dictionary Learning, Denoising, and Clustering." IEEE Transactions on Signal Processing, v.69, 2021. https://doi.org/10.1109/TSP.2021.3062988
Muthukumar, Ramchandran and Kouri, Drew P. and Udell, Madeleine. "Randomized Sketching Algorithms for Low-Memory Dynamic Optimization." SIAM Journal on Optimization, v.31, 2021. https://doi.org/10.1137/19M1272561
Sun, Yiming and Guo, Yang and Luo, Charlene and Tropp, Joel and Udell, Madeleine. "Low-Rank Tucker Approximation of a Tensor from Streaming Data." SIAM Journal on Mathematics of Data Science, v.2, 2020. https://doi.org/10.1137/19M1257718
Yurtsever, Alp and Tropp, Joel A. and Fercoq, Olivier and Udell, Madeleine and Cevher, Volkan. "Scalable Semidefinite Programming." SIAM Journal on Mathematics of Data Science, v.3, 2021. https://doi.org/10.1137/19M1305045

Please report errors in award information by writing to: awardsearch@nsf.gov.