
NSF Org: | DMS Division Of Mathematical Sciences |
Recipient: | |
Initial Amendment Date: | December 13, 2021 |
Latest Amendment Date: | June 29, 2024 |
Award Number: | 2143754 |
Award Instrument: | Continuing Grant |
Program Manager: | Tomek Bartoszynski, tbartosz@nsf.gov, (703) 292-4885, DMS Division Of Mathematical Sciences, MPS Directorate for Mathematical and Physical Sciences |
Start Date: | June 1, 2022 |
End Date: | May 31, 2027 (Estimated) |
Total Intended Award Amount: | $577,242.00 |
Total Awarded Amount to Date: | $489,112.00 |
Funds Obligated to Date: | FY 2023 = $138,005.00; FY 2024 = $140,493.00 |
History of Investigator: | |
Recipient Sponsored Research Office: | 1 Nassau Hall, Princeton, NJ 08544-2001, US, (609) 258-3090 |
Sponsor Congressional District: | |
Primary Place of Performance: | 326 Sherrerd Hall, Princeton, NJ 08544-2020, US |
Primary Place of Performance Congressional District: | |
Unique Entity Identifier (UEI): | |
Parent UEI: | |
NSF Program(s): | PROBABILITY, Networking Technology and Syst |
Primary Program Source: | 01002324DB NSF RESEARCH & RELATED ACTIVIT; 01002425DB NSF RESEARCH & RELATED ACTIVIT; 01002526DB NSF RESEARCH & RELATED ACTIVIT; 01002627DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): | |
Program Element Code(s): | |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.049, 47.070 |
ABSTRACT
We live in an era of big data and inexpensive computation. Vast stores of information can be analyzed efficiently for underlying patterns by machine learning algorithms, leading to remarkable progress in applications ranging from self-driving cars to automatic drug discovery and machine translation. Underpinning many of these exciting practical developments is a class of computational models called neural networks. Originally developed in the 1940s and 1950s, the neural nets used today are as complex as they are powerful. The purpose of this project is to develop a range of principled techniques for understanding key aspects of how neural networks work in practice and how to make them better. The approach taken by this project is probabilistic and statistical in nature. Just as the ideal gas law accurately describes the large-scale properties of a gas directly through pressure, volume, and temperature, without the need to specify the state of each individual gas molecule, this project will explore and identify emergent statistical behaviors of large neural networks that provably explain many of their key properties observed in practice. The project will also provide research training and educational opportunities through the organization of summer schools in machine learning for graduate students.
At a high level, a neural network is a family of functions given by composing affine transformations with elementary non-linear operations. The simplest important kind of neural network is roughly described by two parameters, called depth and width. The former is the number of compositions and the latter is the dimension of the spaces on which the affine transformations act. The technical heart of this project is to understand the statistical behavior of such networks when the affine transformations are chosen at random. The starting point is an analytically tractable regime in which the network width is sent to infinity at fixed depth. In this infinite-width limit, random networks converge to Gaussian processes, and optimization of network parameters from their randomly chosen starting points reduces to a kernel method. Unfortunately, this concise description cannot capture what is perhaps the most important empirical property of neural networks, namely their ability to learn data-dependent features. Understanding how feature learning occurs is at the core of this project and requires new probabilistic and analytic tools for studying random neural networks at finite width. The basic idea is to perform perturbation theory around the infinite-width limit, treating the reciprocal of the network width as a small parameter. The goal is then to obtain, to all orders in this reciprocal, expressions for the joint distribution of the values and derivatives (with respect to both model inputs and model parameters) of a random neural network. Such formulas have practical consequences for understanding the numerical stability of neural network training, suggesting principled settings for optimization hyper-parameters, and quantifying feature learning.
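The infinite-width statement above can be checked numerically. The following Python sketch (an illustration added here, not part of the award abstract) samples the scalar output of a small random ReLU network at a fixed input over many independent draws of the weights and reports the excess kurtosis of the resulting output distribution; it tends toward zero as the width grows, consistent with convergence to a Gaussian process, and the residual deviation at finite width is the kind of 1/width effect that the perturbative expansion described above targets. The function name, the ReLU non-linearity, the Gaussian weight initialization, and the omission of biases are illustrative assumptions, not specifications taken from the project.

import numpy as np

def random_relu_network_output(x, width, depth, rng):
    """Scalar output of a random fully connected ReLU network (biases omitted)."""
    h = x
    for _ in range(depth - 1):
        # Affine map with 2/fan_in weight variance, followed by the ReLU non-linearity.
        W = rng.normal(0.0, np.sqrt(2.0 / h.shape[0]), size=(width, h.shape[0]))
        h = np.maximum(W @ h, 0.0)
    # Final affine readout to a single scalar.
    w_out = rng.normal(0.0, np.sqrt(1.0 / width), size=width)
    return w_out @ h

rng = np.random.default_rng(0)
x = rng.normal(size=10)  # fixed network input
for width in (8, 64, 512):
    samples = np.array([random_relu_network_output(x, width, depth=4, rng=rng)
                        for _ in range(5000)])
    # Excess kurtosis near 0 indicates approximately Gaussian output statistics;
    # empirically, the deviation shrinks as the width grows at fixed depth.
    z = (samples - samples.mean()) / samples.std()
    print(f"width={width:4d}  excess kurtosis={np.mean(z**4) - 3:+.3f}")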
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH