
NSF Org: |
IIS Division of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | August 8, 2022 |
Latest Amendment Date: | September 21, 2022 |
Award Number: | 2233762 |
Award Instrument: | Continuing Grant |
Program Manager: |
Vladimir Pavlovic
vpavlovi@nsf.gov (703)292-8318 IIS Division of Information & Intelligent Systems CSE Directorate for Computer and Information Science and Engineering |
Start Date: | July 1, 2022 |
End Date: | September 30, 2025 (Estimated) |
Total Intended Award Amount: | $550,000.00 |
Total Awarded Amount to Date: | $454,115.00 |
Funds Obligated to Date: |
FY 2022 = $382,771.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
450 JANE STANFORD WAY STANFORD CA US 94305-2004 (650)723-2300 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
450 Jane Stanford Way Stanford CA US 94305-2004 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Robust Intelligence |
Primary Program Source: |
01002223DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Big datasets are everywhere: in science, in health, in commerce, and in government, data is becoming easier and cheaper to collect. Yet extracting value from this data is a challenge, because every step requires human intervention: cleaning the data, identifying useful features, and choosing a machine learning model. The goal of this project is to develop new methods to accelerate and automate the basic machine learning (ML) workflow. Automation frees data scientists from data cleaning and parameter twiddling so that they can concentrate on the important questions: are we solving the right problems, and do we have the right data? This project will help democratize machine learning and promote data-driven decision making by developing automated methods to clean data and to choose ML models, and by releasing open-source software packages that make these methods widely available and easy to use. The project also advances these goals by training data scientists to use these methods and to understand their potential risks.
Low-dimensional structure is the key to meeting the diverse challenges involved in automating machine learning. This project relies on the central insight that measurements of a complex object, such as a patient in a hospital, a respondent to a survey, or even an ML dataset, can be well described as simple functions (or even linear functions) of an underlying low-dimensional latent vector. The project develops new algorithms and software to identify low-dimensional latent vectors and to use them to a) clean the data by denoising observations or imputing missing entries, b) reduce the dimensionality of feature vectors, and c) recommend better algorithms. This project will develop new techniques, with efficient algorithms and theoretical guarantees, to identify low-dimensional latent vectors from sparse observations made through nonlinear (even discontinuous) functions. To enable more efficient automated machine learning, the project will develop methods to localize similar datasets near each other in a low-dimensional space, so that nearness in this space predicts similar performance of machine learning methods.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.