Award Abstract # 2143754
CAREER: Random Neural Nets and Random Matrix Products

NSF Org: DMS
Division Of Mathematical Sciences
Recipient: THE TRUSTEES OF PRINCETON UNIVERSITY
Initial Amendment Date: December 13, 2021
Latest Amendment Date: June 29, 2024
Award Number: 2143754
Award Instrument: Continuing Grant
Program Manager: Tomek Bartoszynski
tbartosz@nsf.gov
(703) 292-4885
DMS
Division Of Mathematical Sciences
MPS
Directorate for Mathematical and Physical Sciences
Start Date: June 1, 2022
End Date: May 31, 2027 (Estimated)
Total Intended Award Amount: $577,242.00
Total Awarded Amount to Date: $489,112.00
Funds Obligated to Date: FY 2022 = $210,614.00
FY 2023 = $138,005.00
FY 2024 = $140,493.00
History of Investigator:
  • Boris Hanin (Principal Investigator)
Recipient Sponsored Research Office: Princeton University
1 NASSAU HALL
PRINCETON
NJ  US  08544-2001
(609)258-3090
Sponsor Congressional District: 12
Primary Place of Performance: Princeton University
326 Sherrerd Hall
Princeton
NJ  US  08544-2020
Primary Place of Performance Congressional District: 12
Unique Entity Identifier (UEI): NJ1YPQXQG7U5
Parent UEI:
NSF Program(s): PROBABILITY,
Networking Technology and Systems
Primary Program Source: 01002223DB NSF RESEARCH & RELATED ACTIVIT
01002324DB NSF RESEARCH & RELATED ACTIVIT
01002425DB NSF RESEARCH & RELATED ACTIVIT
01002526DB NSF RESEARCH & RELATED ACTIVIT
01002627DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 079Z, 1045, 7556
Program Element Code(s): 126300, 736300
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.049, 47.070

ABSTRACT

We live in an era of big data and inexpensive computation. Vast stores of information can be efficiently analyzed for underlying patterns by machine learning algorithms, leading to remarkable progress in applications ranging from self-driving cars to automatic drug discovery and machine translation. Underpinning many of these exciting practical developments is a class of computational models called neural networks. Originally developed in the 1940s and 1950s, the neural nets used today are as complex as they are powerful. The purpose of this project is to develop a range of principled techniques for understanding key aspects of how neural networks work in practice and how to make them better. The approach taken by this project is probabilistic and statistical in nature. Just as the ideal gas law accurately describes the large-scale properties of a gas directly through pressure, volume, and temperature, without the need to specify the state of each individual gas molecule, this project will explore and identify emergent statistical behaviors of large neural networks that provably explain many of their key properties observed in practice. The project will also provide research training and educational opportunities through the organization of summer schools in machine learning for graduate students.

At a high level, a neural network is a family of functions obtained by composing affine transformations with elementary non-linear operations. The simplest important kind of neural network is roughly described by two parameters called depth and width. The former is the number of compositions and the latter is the dimension of the spaces on which the affine transformations act. The technical heart of this project is to understand the statistical behavior of such networks when the affine transformations are chosen at random. The starting point is an analytically tractable regime in which the network width is sent to infinity at fixed depth. In this infinite-width limit, random networks converge to Gaussian processes, and optimization of network parameters from their randomly chosen starting points reduces to a kernel method. Unfortunately, this concise description cannot capture what is perhaps the most important empirical property of neural networks, namely their ability to learn data-dependent features. Understanding how feature learning occurs is at the core of this project and requires new probabilistic and analytic tools for studying random neural networks at finite width. The basic idea is to perform perturbation theory around the infinite-width limit, treating the reciprocal of the network width as a small parameter. The goal is then to obtain, to all orders in this reciprocal, expressions for the joint distribution of the values and derivatives (with respect to both model inputs and model parameters) of a random neural network. Such formulas have practical consequences for understanding the numerical stability of neural network training, suggesting principled settings for optimization hyper-parameters, and quantifying feature learning.
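
The infinite-width behavior described above can be illustrated numerically. The sketch below is not part of the award abstract; it assumes a standard fully-connected ReLU architecture with He-style (2/fan-in) weight variance, and the function names and sizes are illustrative choices. It repeatedly samples the weights of such a network at random and measures the excess kurtosis of the scalar output at a fixed input: the deviation from Gaussian statistics shrinks as the width grows at fixed depth, consistent with the Gaussian-process limit and with finite-width corrections controlled by the reciprocal of the width.

# Minimal numerical sketch (illustrative only, not from the award): sample the scalar
# output of a randomly initialized fully-connected ReLU network many times and check
# that, as the width grows at fixed depth, the output distribution looks more Gaussian.
import numpy as np

def random_relu_net(x, width, depth, rng):
    # One forward pass with freshly drawn Gaussian weights; variance 2/fan_in keeps
    # activations of order one as the width grows.
    h, fan_in = x, x.shape[0]
    for _ in range(depth):
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(width, fan_in))
        h = np.maximum(W @ h, 0.0)      # affine map followed by the ReLU non-linearity
        fan_in = width
    w_out = rng.normal(0.0, np.sqrt(1.0 / fan_in), size=fan_in)
    return w_out @ h                    # scalar output of the network

def excess_kurtosis(samples):
    # Fourth standardized moment minus 3; this vanishes for an exactly Gaussian variable.
    z = (samples - samples.mean()) / samples.std()
    return np.mean(z ** 4) - 3.0

rng = np.random.default_rng(0)
x = rng.normal(size=10)                 # one fixed network input
depth, n_draws = 8, 10_000
for width in (4, 16, 64):
    outs = np.array([random_relu_net(x, width, depth, rng) for _ in range(n_draws)])
    print(f"width={width:3d}  excess kurtosis of output ~ {excess_kurtosis(outs):+.2f}")
# The departure from Gaussianity shrinks as the width increases at fixed depth, in line
# with the infinite-width Gaussian-process limit and finite-width corrections in 1/width.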

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Bordelon, Blake and Noci, Lorenzo and Li, Mufan and Hanin, Boris and Pehlevan, Cengiz "Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit" International Conference on Learning Representations, 2024
Hanin, Boris and Jeong, Ryan and Rolnick, David "Deep ReLU Networks Preserve Expected Length" Advances in Neural Information Processing Systems, 2021
Hanin, Boris "Random neural networks in the infinite width limit as Gaussian processes" The Annals of Applied Probability, v.33, 2023, https://doi.org/10.1214/23-AAP1933
Hanin, Boris and Rolnick, David and Jeong, Ryan "Deep ReLU Networks Preserve Expected Length" International Conference on Learning Representations, 2022
Hanin, Boris and Zlokapa, Alexander "Bayesian interpolation with deep linear networks" Proceedings of the National Academy of Sciences, v.120, 2023, https://doi.org/10.1073/pnas.2301345120
Iyer, Gaurav and Hanin, Boris and Rolnick, David "Maximal Initial Learning Rates in Deep ReLU Networks" Proceedings of Machine Learning Research, 2023
