Award Abstract # 1846088
CAREER: Modern nonconvex optimization for machine learning: foundations of geometric and scalable techniques

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: MASSACHUSETTS INSTITUTE OF TECHNOLOGY
Initial Amendment Date: March 11, 2019
Latest Amendment Date: June 24, 2022
Award Number: 1846088
Award Instrument: Continuing Grant
Program Manager: Vladimir Pavlovic
vpavlovi@nsf.gov
(703)292-8318
IIS (Division of Information & Intelligent Systems)
CSE (Directorate for Computer and Information Science and Engineering)
Start Date: March 15, 2019
End Date: February 29, 2024 (Estimated)
Total Intended Award Amount: $500,000.00
Total Awarded Amount to Date: $500,000.00
Funds Obligated to Date: FY 2019 = $101,427.00
FY 2020 = $92,764.00
FY 2021 = $99,219.00
FY 2022 = $206,590.00
History of Investigator:
  • Suvrit Sra (Principal Investigator)
    suvrit@mit.edu
Recipient Sponsored Research Office: Massachusetts Institute of Technology
77 MASSACHUSETTS AVE
CAMBRIDGE
MA  US  02139-4301
(617)253-1000
Sponsor Congressional District: 07
Primary Place of Performance: Massachusetts Institute of Technology
77 Massachusetts Ave.
Cambridge
MA  US  02139-4307
Primary Place of Performance Congressional District: 07
Unique Entity Identifier (UEI): E2NYLCDML6V1
Parent UEI: E2NYLCDML6V1
NSF Program(s): Robust Intelligence
Primary Program Source: 01001920DB NSF RESEARCH & RELATED ACTIVITIES
01002021DB NSF RESEARCH & RELATED ACTIVITIES
01002122DB NSF RESEARCH & RELATED ACTIVITIES
01002223DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 1045, 7495
Program Element Code(s): 749500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Mathematical optimization lies at the heart of machine learning (ML) and artificial intelligence (AI) algorithms. Key challenges are deciding what criteria to optimize and what algorithms to use to perform the optimization. These challenges motivate the present project. More specifically, this project seeks to make progress on three fundamental topics in optimization for ML: (i) theoretical foundations for a rich new class of optimization problems that can be solved efficiently (i.e., in a computationally tractable manner); (ii) a set of algorithms that apply to large-scale optimization problems in machine learning (e.g., for accelerating the training of neural networks); and (iii) theory that seeks to understand and explain why neural networks succeed in practice. By focusing on topics of foundational importance, this project should spur a variety of follow-up research that deepens the connection of ML and AI with both mathematics and the applied sciences. More broadly, this project may also have a lasting societal impact, primarily because of (i) its focus on optimization particularly relevant to ML and AI; (ii) the non-traditional application domains it connects with (e.g., synthetic biology); and (iii) the investigator's environment, which fosters such impact (namely, the Institute for Data, Systems, and Society (IDSS), a cross-disciplinary institute at MIT whose mission is to drive solutions to problems of societal relevance). Finally, the project has an education-centric focus; it involves the intellectual and professional development of students, as well as the development of curricular material based on the research topics covered herein.

This project lays out an ambitious agenda to develop foundational theory for geometric optimization, large-scale nonconvex optimization, and deep neural networks. The research on geometric optimization (a powerful new subclass of nonconvex optimization) is originally motivated by applications in ML and statistics; however, it stands to have a broader impact across all disciplines that consume optimization. The investigator seeks to develop a theory of polynomial-time optimization for a class strictly larger than the usual convex optimization problems, thereby endowing practitioners with new polynomial-time tools and models; if successful, this investigation could open an entire subarea of research and applications. Beyond geometric optimization, the project also focuses on large-scale nonconvex optimization and on the theory of optimization and generalization for deep learning. Within these topics, the project will address key theoretical challenges, develop scalable new algorithms that could greatly speed up neural network training, and make progress toward reducing the gap between the theory and real-world practice of nonconvex optimization.
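To make the notion of geometric optimization concrete, the short sketch below (an illustrative example, not part of the award) minimizes the Rayleigh quotient over the unit sphere with Riemannian gradient descent: the Euclidean gradient is projected onto the tangent space and the iterate is retracted back onto the manifold, so the nonconvex constraint is handled intrinsically rather than through a convex relaxation. The function name, step size, and iteration count are assumptions made for the illustration.

# Hypothetical illustration of geometric (manifold) optimization:
# Riemannian gradient descent on the unit sphere for f(x) = x^T A x.
import numpy as np

def riemannian_gd_sphere(A, x0, step=0.01, iters=2000):
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        egrad = 2.0 * A @ x                # Euclidean gradient of x^T A x
        rgrad = egrad - (x @ egrad) * x    # project onto the tangent space at x
        x = x - step * rgrad               # step along the (negative) Riemannian gradient
        x = x / np.linalg.norm(x)          # retract back onto the sphere
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((5, 5))
    A = M @ M.T                            # symmetric test matrix
    x = riemannian_gd_sphere(A, rng.standard_normal(5))
    # x should approximate an eigenvector of the smallest eigenvalue of A
    print(float(x @ A @ x), np.linalg.eigvalsh(A)[0])

Analogous projection-and-retraction updates underlie optimization on other manifolds studied in this line of work, such as the manifold of positive definite matrices.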

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


(Showing: 1 - 10 of 22)
Yurtsever, Alp and Gu, Alex and Sra, Suvrit. "Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates." 35th Conference on Neural Information Processing Systems (NeurIPS 2021), 2021.
Yurtsever, Alp and Mangalick, Varun and Sra, Suvrit. "Three Operator Splitting with a Nonconvex Loss Function." Proceedings of the 38th International Conference on Machine Learning, 2021.
Yurtsever, Alp and Sra, Suvrit. "CCCP is Frank-Wolfe in disguise." 2022.
Zhang, Jingzhao and Karimireddy, Sai Praneeth and Veit, Andreas and Kim, Seungyeon and Reddi, Sashank and Kumar, Sanjiv. "Why are Adaptive Methods Good for Attention Models?" 34th Conference on Neural Information Processing Systems (NeurIPS 2020), 2020.
Zhang, Jingzhao and Lin, Hongzhou and Das, Subhro and Sra, Suvrit and Jadbabaie, Ali. "Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity." 2022.
Zhang, Jingzhao and Lin, Hongzhou and Jegelka, Stefanie and Sra, Suvrit and Jadbabaie, Ali. "Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions." Proceedings of the 37th International Conference on Machine Learning, 2020.
Zhang, Jingzhao and Menon, Aditya Krishna and Veit, Andreas and Bhojanapalli, Srinadh and Kumar, Sanjiv and Sra, Suvrit. "Coping with Label Shift via Distributionally Robust Optimisation." International Conference on Learning Representations (ICLR), 2021.
Ahn, Kwangjun and Sra, Suvrit. "Understanding Nesterov's Acceleration via Proximal Point Method." Symposium on Simplicity in Algorithms (SOSA), 2022. https://doi.org/10.1137/1.9781611977066.9
Ahn, Kwangjun and Yun, Chulhee and Sra, Suvrit. "SGD with shuffling: optimal rates without component convexity and large epoch requirements." 34th Conference on Neural Information Processing Systems (NeurIPS 2020), 2020.
Ahn, Kwangjun and Zhang, Jingzhao and Sra, Suvrit. "Understanding the unstable convergence of gradient descent." 2022.
Cheng, Xiang and Zhang, Jingzhao and Sra, Suvrit. "Efficient Sampling on Riemannian Manifolds via Langevin MCMC." 2022.