
NSF Org: |
IIS Division of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | August 26, 2019 |
Latest Amendment Date: | June 30, 2020 |
Award Number: | 1908104 |
Award Instrument: | Continuing Grant |
Program Manager: |
Wei Ding
IIS Division of Information & Intelligent Systems
CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2019 |
End Date: | June 30, 2021 (Estimated) |
Total Intended Award Amount: | $500,000.00 |
Total Awarded Amount to Date: | $500,000.00 |
Funds Obligated to Date: |
FY 2020 = $0.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
2221 UNIVERSITY AVE SE STE 100 MINNEAPOLIS MN US 55414-3074 (612)624-5599 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
200 Oak St SE Minneapolis MN US 55455-2070 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Info Integration & Informatics |
Primary Program Source: |
01002021DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Stochastic algorithms such as stochastic gradient descent (SGD) are the workhorse of modern data science. Such algorithms have played an important role in the success of deep learning. In spite of this empirical success, the behavior of SGD on the challenging non-convex optimization problems encountered in deep learning remains poorly understood. There is limited understanding of how SGD navigates non-convex loss landscapes, how bad local minima are avoided, and how deep models learned using SGD generalize well to future data. The project focuses on gaining a clear understanding of SGD dynamics and generalization for non-convex problems arising in the context of deep learning. The project also uses the improved understanding to develop principled approaches to adaptively use validation sets to choose hyper-parameters and avoid overfitting. The insights gained from the technical advances are applied to the challenging scientific problem of sub-seasonal to seasonal (S2S) weather forecasting, which focuses on forecasting weather on a time frame of a few weeks to a few months. Advances in S2S forecasting are critically important to a wide variety of application domains, including water resource management, agriculture, energy, aviation, maritime planning, and emergency planning. The project also engages the broader data science community, incorporating the insights gained into curricular enrichment and broadening participation from underrepresented groups.
The project studies SGD dynamics with a primary focus on the over-parameterized setting, i.e., where the number of samples is smaller than the number of parameters, as is typical in deep learning. The dynamics are carefully studied through two key matrices: the Hessian of the non-convex loss function and the covariance matrix of the stochastic gradients, along with their eigen-spectra and the overlap between their principal subspaces. Although the SGD dynamics unfold in a high-dimensional space, the principal subspaces of these matrices can be low-dimensional. Tools from high-dimensional geometry and associated stochastic processes are utilized to characterize such low-dimensional dynamics in high-dimensional spaces. Principled approaches to explaining the intriguing generalization behavior of deep learning models trained with SGD are also developed based on the properties of these matrices. Further, mechanisms based on differential privacy are developed for adaptively using validation sets to choose hyper-parameters and avoid over-fitting in deep learning.
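The two matrices mentioned above can be made concrete on a toy problem. The following minimal numpy sketch (the least-squares model, dimensions, and subspace-overlap measure are illustrative assumptions, not taken from the project) builds an over-parameterized regression problem, forms the Hessian and the covariance of the per-sample gradients at a given iterate, and measures the overlap between their top-k principal subspaces:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy over-parameterized least-squares problem: n samples, d > n parameters.
n, d, k = 20, 50, 5           # k = dimension of the principal subspaces compared
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
w = 0.1 * rng.normal(size=d)  # current iterate

# Hessian of L(w) = (1/2n) ||Xw - y||^2 is constant: H = X^T X / n.
H = X.T @ X / n

# Per-sample gradients g_i = x_i (x_i^T w - y_i) and their covariance.
residuals = X @ w - y                 # shape (n,)
G = X * residuals[:, None]            # row i is g_i, shape (n, d)
g_bar = G.mean(axis=0)
C = (G - g_bar).T @ (G - g_bar) / n   # gradient covariance, shape (d, d)

# Eigen-spectra (eigh returns eigenvalues in ascending order; take the top k).
_, evecs_H = np.linalg.eigh(H)
_, evecs_C = np.linalg.eigh(C)
U_H = evecs_H[:, -k:]                 # top-k principal subspace of the Hessian
U_C = evecs_C[:, -k:]                 # top-k principal subspace of the covariance

# Subspace overlap in [0, 1]: 1 means the k-dimensional subspaces coincide.
overlap = np.linalg.norm(U_H.T @ U_C) ** 2 / k
print(f"top-{k} Hessian/covariance subspace overlap: {overlap:.3f}")
```

Note that with n < d the Hessian has rank at most n, so the dynamics are confined to a low-dimensional subspace of the d-dimensional parameter space, which is the kind of structure the project exploits.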
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.