Award Abstract # 1846088
CAREER: Modern nonconvex optimization for machine learning: foundations of geometric and scalable techniques

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: MASSACHUSETTS INSTITUTE OF TECHNOLOGY
Initial Amendment Date: March 11, 2019
Latest Amendment Date: June 24, 2022
Award Number: 1846088
Award Instrument: Continuing Grant
Program Manager: Vladimir Pavlovic
vpavlovi@nsf.gov
(703)292-8318
IIS (Division of Information & Intelligent Systems)
CSE (Directorate for Computer and Information Science and Engineering)
Start Date: March 15, 2019
End Date: February 29, 2024 (Estimated)
Total Intended Award Amount: $500,000.00
Total Awarded Amount to Date: $500,000.00
Funds Obligated to Date: FY 2019 = $101,427.00
FY 2020 = $92,764.00
FY 2021 = $99,219.00
FY 2022 = $206,590.00
History of Investigator:
  • Suvrit Sra (Principal Investigator)
    suvrit@mit.edu
Recipient Sponsored Research Office: Massachusetts Institute of Technology
77 MASSACHUSETTS AVE
CAMBRIDGE
MA  US  02139-4301
(617)253-1000
Sponsor Congressional District: 07
Primary Place of Performance: Massachusetts Institute of Technology
77 Massachusetts Ave.
Cambridge
MA  US  02139-4307
Primary Place of Performance Congressional District: 07
Unique Entity Identifier (UEI): E2NYLCDML6V1
Parent UEI: E2NYLCDML6V1
NSF Program(s): Robust Intelligence
Primary Program Source: 01001920DB NSF RESEARCH & RELATED ACTIVITIES
01002021DB NSF RESEARCH & RELATED ACTIVITIES
01002122DB NSF RESEARCH & RELATED ACTIVITIES
01002223DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 1045, 7495
Program Element Code(s): 749500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Mathematical optimization lies at the heart of machine learning (ML) and artificial intelligence (AI) algorithms. Key challenges are deciding what criteria to optimize and what algorithms to use to perform the optimization. These challenges motivate the present project. More specifically, this project seeks to make progress on three fundamental topics in optimization for ML: (i) theoretical foundations for a rich new class of optimization problems that can be solved efficiently (i.e., in a computationally tractable manner); (ii) a set of algorithms that apply to large-scale optimization problems in machine learning (e.g., for accelerating the training of neural networks); and (iii) theory that seeks to understand and explain why neural networks succeed in practice. By focusing on topics of foundational importance, this project should spur a variety of follow-up research that deepens the connection of ML and AI with both mathematics and the applied sciences. More broadly, this project may also have a lasting societal impact, primarily because of (i) its focus on optimization particularly relevant to ML and AI; (ii) the non-traditional application domains it connects with (e.g., synthetic biology); and (iii) the investigator's environment, which fosters such impact (namely, the Institute for Data, Systems, and Society (IDSS), a cross-disciplinary institute at MIT whose mission is to drive solutions to problems of societal relevance). Finally, the project has an education-centric focus; it involves the intellectual and professional development of students, as well as the development of curricular material based on the research topics covered herein.

This project lays out an ambitious agenda to develop foundational theory for geometric optimization, large-scale nonconvex optimization, and deep neural networks. The research on geometric optimization (a powerful new subclass of nonconvex optimization) is originally motivated by applications in ML and statistics; however, it stands to have a broader impact across all disciplines that consume optimization. The investigator seeks to develop a theory of polynomial-time optimization for a class strictly larger than the usual convex optimization problems, thereby endowing practitioners with new polynomial-time tools and models; if successful, this investigation could open an entire subarea of research and applications. Beyond geometric optimization, the project also focuses on large-scale nonconvex optimization and on the theory of optimization and generalization for deep learning. Within these topics, the project will address key theoretical challenges, develop scalable new algorithms that could greatly speed up neural network training, and make progress toward reducing the gap between the theory and real-world practice of nonconvex optimization.
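To make the notion of geometric optimization concrete, the short sketch below (an illustrative example, not part of the award) minimizes the Rayleigh quotient over the unit sphere with Riemannian gradient descent: the Euclidean gradient is projected onto the tangent space and the iterate is retracted back onto the manifold, so the nonconvex constraint is handled intrinsically rather than through a convex relaxation. The function name, step size, and iteration count are assumptions made for the illustration.

# Hypothetical illustration of geometric (manifold) optimization:
# Riemannian gradient descent on the unit sphere for f(x) = x^T A x.
import numpy as np

def riemannian_gd_sphere(A, x0, step=0.01, iters=2000):
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        egrad = 2.0 * A @ x                # Euclidean gradient of x^T A x
        rgrad = egrad - (x @ egrad) * x    # project onto the tangent space at x
        x = x - step * rgrad               # step along the (negative) Riemannian gradient
        x = x / np.linalg.norm(x)          # retract back onto the sphere
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((5, 5))
    A = M @ M.T                            # symmetric test matrix
    x = riemannian_gd_sphere(A, rng.standard_normal(5))
    # x should approximate an eigenvector of the smallest eigenvalue of A
    print(float(x @ A @ x), np.linalg.eigvalsh(A)[0])

Analogous projection-and-retraction updates underlie optimization on other manifolds studied in this line of work, such as the manifold of positive definite matrices.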

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


(Showing: 1 - 10 of 22)
Yurtsever, Alp and Gu, Alex and Sra, Suvrit. "Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates." 35th Conference on Neural Information Processing Systems (NeurIPS 2021), 2021.
Yurtsever, Alp and Mangalick, Varun and Sra, Suvrit. "Three Operator Splitting with a Nonconvex Loss Function." Proceedings of the 38th International Conference on Machine Learning, 2021.
Yurtsever, Alp and Sra, Suvrit. "CCCP is Frank-Wolfe in disguise." 2022.
Zhang, Jingzhao and Karimireddy, Sai Praneeth and Veit, Andreas and Kim, Seungyeon and Reddi, Sashank and Kumar, Sanjiv. "Why are Adaptive Methods Good for Attention Models?" 34th Conference on Neural Information Processing Systems (NeurIPS 2020), 2020.
Zhang, Jingzhao and Lin, Hongzhou and Das, Subhro and Sra, Suvrit and Jadbabaie, Ali. "Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity." 2022.
Zhang, Jingzhao and Lin, Hongzhou and Jegelka, Stefanie and Sra, Suvrit and Jadbabaie, Ali. "Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions." Proceedings of the 37th International Conference on Machine Learning, 2020.
Zhang, Jingzhao and Menon, Aditya Krishna and Veit, Andreas and Bhojanapalli, Srinadh and Kumar, Sanjiv and Sra, Suvrit. "Coping with Label Shift via Distributionally Robust Optimisation." International Conference on Learning Representations (ICLR), 2021.
Ahn, Kwangjun and Sra, Suvrit. "Understanding Nesterov's Acceleration via Proximal Point Method." Symposium on Simplicity in Algorithms (SOSA), 2022. https://doi.org/10.1137/1.9781611977066.9
Ahn, Kwangjun and Yun, Chulhee and Sra, Suvrit. "SGD with shuffling: optimal rates without component convexity and large epoch requirements." 34th Conference on Neural Information Processing Systems (NeurIPS 2020), 2020.
Ahn, Kwangjun and Zhang, Jingzhao and Sra, Suvrit. "Understanding the unstable convergence of gradient descent." 2022.
Cheng, Xiang and Zhang, Jingzhao and Sra, Suvrit. "Efficient Sampling on Riemannian Manifolds via Langevin MCMC." 2022.