Award Abstract # 1718970
CCF-BSF: AF: Small: Convex and Non-Convex Distributed Learning

NSF Org: CCF, Division of Computing and Communication Foundations
Recipient: TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO
Initial Amendment Date: July 10, 2017
Latest Amendment Date: July 15, 2019
Award Number: 1718970
Award Instrument: Standard Grant
Program Manager: A. Funda Ergun (CCF, Division of Computing and Communication Foundations; CSE, Directorate for Computer and Information Science and Engineering)
Start Date: January 1, 2018
End Date: December 31, 2021 (Estimated)
Total Intended Award Amount: $249,978.00
Total Awarded Amount to Date: $257,978.00
Funds Obligated to Date: FY 2017 = $249,978.00
FY 2019 = $8,000.00
History of Investigator:
  • Nathan Srebro (Principal Investigator)
    nati@ttic.edu
Recipient Sponsored Research Office: Toyota Technological Institute at Chicago
6045 S KENWOOD AVE
CHICAGO
IL  US  60637-2803
(773)834-0409
Sponsor Congressional District: 01
Primary Place of Performance: Toyota Technological Institute at Chicago
6045 S Kenwood Ave
Chicago
IL  US  60637-2902
Primary Place of Performance Congressional District: 01
Unique Entity Identifier (UEI): ERBJF4DMW6G4
Parent UEI: ERBJF4DMW6G4
NSF Program(s): Algorithmic Foundations
Primary Program Source: 01001718DB NSF RESEARCH & RELATED ACTIVITIES
01001920DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 014Z, 7923, 7926, 7934, 9251
Program Element Code(s): 779600
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Machine learning is an increasingly important approach to tackling many difficult scientific, engineering, and artificial intelligence tasks, ranging from machine translation and speech recognition, through control of self-driving cars, to protein structure prediction and drug design.  The core idea of machine learning is to use examples and data to automatically train a system to perform some task.  Accordingly, the success of machine learning is tied to the availability of large amounts of training data and to our ability to process it.  Much of the recent success of machine learning is fueled by the large amounts of data (text, images, videos, etc.) that can now be collected. But all this data also needs to be processed and learned from---indeed, this data flood has largely shifted the bottleneck from the availability of data to our ability to process it. In particular, the amounts of data involved can no longer be stored and handled on a single computer.  Consequently, distributed machine learning, where data is processed and learned from on many computers that communicate with each other, is a crucial element of modern large-scale machine learning.

The goal of this project is to provide a rigorous framework for studying distributed machine learning, and through it to develop efficient methods for distributed learning, a theoretical understanding of the benefits of these methods, and a characterization of the inherent limitations of distributed learning.  A central component of the PIs' approach is to model distributed learning as a stochastic optimization problem in which different machines receive samples drawn from the same source distribution, allowing methods and analyses that specifically leverage the relatedness of the data on different machines.  This is crucial for studying how the availability of multiple computers can reduce the computational cost of learning. The project also encompasses the more challenging case where the nature of the data differs significantly across machines (for instance, when different machines serve different geographical regions, or when each machine is a personal device collecting data from a single user).  In that setting, the proposed approach is to integrate distributed learning with personalization or adaptation, which the PIs argue can not only improve learning performance but also better leverage distributed computation.
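
One common way to formalize this model (a generic textbook formulation, not language taken from the award itself) is as distributed stochastic optimization of a single population objective,

    \min_{w}\; F(w) \;=\; \mathbb{E}_{z \sim \mathcal{D}}\big[ f(w; z) \big],

where each of the M machines draws its own i.i.d. samples z from the common distribution D and can evaluate stochastic gradients of f; in the heterogeneous (personalized) setting mentioned above, machine m would instead sample from its own distribution D_m.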

This is an international collaboration, made possible through joint funding with the US-Israel Binational Science Foundation (BSF). The project brings together two PIs who have worked together extensively on related topics in machine learning and optimization.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 19)
Arjevani, Yossi and Carmon, Yair and Duchi, John C. and Foster, Dylan J. and Srebro, Nathan and Woodworth, Blake "Lower Bounds for Non-Convex Stochastic Optimization" arXiv.org, 2019
Arjevani, Yossi and Shamir, Ohad and Srebro, Nathan "A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates" International Conference on Algorithmic Learning Theory, 2020
Arjevani, Yossi and Shamir, Ohad and Srebro, Nathan "A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates" arXiv.org, 2018
Bullins, Brian and Patel, Kumar K. and Shamir, Ohad and Srebro, Nathan and Woodworth, Blake "A Stochastic Newton Algorithm for Distributed Convex Optimization" Advances in Neural Information Processing Systems, v.34, 2021
Eichner, Hubert and Koren, Tomer and McMahan, Brendan and Srebro, Nathan and Talwar, Kunal "Semi-Cyclic Stochastic Gradient Descent" Proceedings of the 36th International Conference on Machine Learning, PMLR, v.97, 2019
Foster, Dylan J. and Sekhari, Ayush and Shamir, Ohad and Srebro, Nathan and Sridharan, Karthik and Woodworth, Blake "The Complexity of Making the Gradient Small in Stochastic Convex Optimization" Proceedings of Machine Learning Research, v.99, 2019
Ji, Ziwei and Srebro, Nathan and Telgarsky, Matus "Fast margin maximization via dual acceleration" International Conference on Machine Learning, 2021
Ma, Chenxin and Jaggi, Martin and Curtis, Frank E. and Srebro, Nathan and Takáč, Martin "An accelerated communication-efficient primal-dual optimization framework for structured machine learning" Optimization Methods and Software, 2019, doi:10.1080/10556788.2019.1650361
Rogers, Ryan and Roth, Aaron and Smith, Adam and Srebro, Nathan and Thakkar, Om and Woodworth, Blake "Guaranteed validity for empirical approaches to adaptive data analysis" International Conference on Artificial Intelligence and Statistics, 2020
Wang, Weiran and Srebro, Nathan "Stochastic Nonconvex Optimization with Large Minibatches" International Conference on Algorithmic Learning Theory, 2019
Wang, Weiran and Wang, Jialei and Kolar, Mladen and Srebro, Nathan "Distributed Stochastic Multi-Task Learning with Graph Regularization" arXiv.org, 2018

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The goal of the project was to study algorithmic and foundational aspects of training machine learning systems on distributed platforms, in particular using distributed stochastic optimization.  Stochastic optimization refers to optimizing an objective, searching for a good data fit, or training a system by considering only a few examples at each iteration, as opposed to repeatedly considering the entire data set.  It traces its roots to a classic 1951 paper by Robbins and Monro, has been a staple of machine learning ever since the first learning systems in the 1950s and 1960s, and to this day stochastic optimization techniques, such as Stochastic Gradient Descent (SGD), form the basis of most machine learning training procedures.  A firm theoretical foundation for convex stochastic approximation was provided by the seminal work of Nemirovskii and Yudin in the 1970s, and exact optimal complexities and methods were derived by Lan in 2012, building on Nemirovskii's work and Nesterov's 1984 follow-up.  But modern machine learning frequently departs from this sequential setting and relies on parallel and distributed computation in order to handle massive data sets.   This includes parallelization at many scales, ranging from parallelization across cores in a GPU or multiple processors in a single computer, through parallelization across servers in a data center, to distributed computation on edge devices (e.g. individual phones, cars, or sensors) as in Federated Learning.  In this project we provide a firm theoretical framework for distributed stochastic optimization, develop methods with provable guarantees, and establish what is the best that can be done, and how.
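
As a concrete illustration of the basic primitive (a minimal sketch with entirely hypothetical data, model, and hyperparameters, not code from the project), the following shows plain minibatch SGD on a least-squares objective, touching only a small batch of examples per iteration:

    import numpy as np

    def sgd_least_squares(X, y, step_size=0.1, batch_size=8, n_iters=500, seed=0):
        """Plain minibatch SGD for min_w (1/(2n)) * ||X w - y||^2.

        Each iteration uses only `batch_size` examples, which is the defining
        feature of stochastic (as opposed to full-batch) optimization.
        """
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(n_iters):
            idx = rng.choice(n, size=batch_size, replace=False)   # sample a minibatch
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size  # stochastic gradient
            w -= step_size * grad                                 # descent step
        return w

    # Hypothetical usage on synthetic data.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((1000, 5))
    w_true = rng.standard_normal(5)
    y = X @ w_true + 0.01 * rng.standard_normal(1000)
    print(np.linalg.norm(sgd_least_squares(X, y) - w_true))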


The major achievement of the project, recognized by the Best Paper Award at the Conference on Learning Theory (the leading venue for such research), was establishing the “optimal complexity” of distributed stochastic convex optimization: that is, determining the best that can be achieved, in terms of the required computation and communication and subject to standard assumptions, and which methods achieve this performance.  Earlier work done as part of the project built toward this result by providing the theoretical grounding, describing the relevant models, and studying the behavior of different methods, including showing that some commonly used methods can actually be problematic in important regimes.  Understanding the best that can be done under the standard assumptions and the standard model also lets us go further: it tells us what additional assumptions or stronger types of operations are needed to break this barrier and obtain better methods, as we have done in work employing “Hessian-vector products”.
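
To make the distributed setting concrete, here is a minimal sketch (with a hypothetical objective and parameters, and not the specific algorithms or analyses from the project) of Local SGD, a commonly studied baseline in which machines run several local SGD steps between communication rounds and then average their iterates:

    import numpy as np

    def local_sgd(grad_oracle, d, n_machines=4, n_rounds=20, local_steps=10,
                  step_size=0.1, seed=0):
        """Local SGD on a shared stochastic objective.

        Each of `n_machines` workers takes `local_steps` SGD steps using its own
        stochastic gradients; communication happens only once per round, when
        all workers average their iterates.
        """
        rng = np.random.default_rng(seed)
        w = np.zeros(d)                                # consensus iterate
        for _ in range(n_rounds):
            locals_ = np.tile(w, (n_machines, 1))      # every machine starts from the average
            for m in range(n_machines):
                for _ in range(local_steps):
                    g = grad_oracle(locals_[m], rng)   # stochastic gradient at the local iterate
                    locals_[m] -= step_size * g
            w = locals_.mean(axis=0)                   # one communication round: averaging
        return w

    # Hypothetical quadratic objective F(w) = 0.5 * ||w - w_star||^2 with noisy gradients.
    w_star = np.array([1.0, -2.0, 0.5])
    def noisy_grad(w, rng):
        return (w - w_star) + 0.1 * rng.standard_normal(w.shape)

    print(local_sgd(noisy_grad, d=3))   # should approach w_star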


Improvements in distributed stochastic optimization translate directly into faster training with less communication in many scientific and engineering applications of machine learning.  For example, through our interactions with Google, these advances are quickly finding their way into reducing communication in Federated Learning.  Beyond reducing resource consumption, Federated Learning also aims to improve user privacy by keeping user data on users' own devices, which then become nodes in a distributed training/optimization process.  Reductions in communication may thus also lead to improved privacy.


The project resulted in 16 refereed conference and journal publications, including two awards (Best Student Paper and Best Paper) at the Conference on Learning Theory, the premier venue for research on the theory of machine learning.  The project was an international collaboration between PI Nathan Srebro at TTI-Chicago and PI Ohad Shamir at the Weizmann Institute of Science in Israel.  It served as the basis for the PhD theses of two graduate students, Blake Woodworth at TTI-Chicago and Yossi Arjevani at the Weizmann Institute.  In addition, Jialei Wang at the University of Chicago finished his PhD working on the initial stages of the project, and Kshitij Patel at TTIC worked on the project during his first two years in the PhD program; his PhD will be based on extensions of the project.  All four students received mentorship from both PIs, and the project provided opportunities for exposure to different research ideas and environments, in Chicago and at the Weizmann Institute.  Woodworth is now a post-doctoral researcher with Francis Bach at INRIA Paris, further expanding his international exposure, and Arjevani recently started a faculty position at the Hebrew University, after post-doctoral training at NYU.

Last Modified: 06/01/2022
Modified by: Nathan Srebro
