Award Abstract # 1546452
BIGDATA: Collaborative Research: F: Nomadic Algorithms for Machine Learning in the Cloud

NSF Org: IIS (Division of Information & Intelligent Systems)
Recipient: UNIVERSITY OF TEXAS AT AUSTIN
Initial Amendment Date: September 14, 2015
Latest Amendment Date: September 14, 2015
Award Number: 1546452
Award Instrument: Standard Grant
Program Manager: Sylvia Spengler
sspengle@nsf.gov
(703)292-7347
IIS (Division of Information & Intelligent Systems)
CSE (Directorate for Computer and Information Science and Engineering)
Start Date: January 1, 2016
End Date: September 30, 2021 (Estimated)
Total Intended Award Amount: $610,432.00
Total Awarded Amount to Date: $610,432.00
Funds Obligated to Date: FY 2015 = $610,432.00
History of Investigator:
  • Inderjit Dhillon (Principal Investigator)
    inderjit@cs.utexas.edu
Recipient Sponsored Research Office: University of Texas at Austin
110 INNER CAMPUS DR
AUSTIN
TX  US  78712-1139
(512)471-6424
Sponsor Congressional District: 25
Primary Place of Performance: University of Texas at Austin
201 E. 24th Street, C0200
Austin
TX  US  78712-1229
Primary Place of Performance Congressional District: 25
Unique Entity Identifier (UEI): V6AFQPN18437
NSF Program(s): Big Data Science & Engineering
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 7433, 8083
Program Element Code(s): 808300
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

With an ever-increasing ability to collect and archive data, massive data sets are becoming commonplace. These data sets are often too big to fit into the main memory of a single computer, so there is a great need for scalable and sophisticated machine learning methods for their analysis. In particular, one has to devise strategies to distribute the computation across multiple machines. However, the stochastic optimization and inference algorithms that are so effective for large-scale machine learning appear to be inherently sequential.

The main research goal of this project is to develop a novel "nomadic" framework that overcomes this barrier. This will be done by showing that many modern machine learning problems have a certain "double separability" property: the objective decomposes into a sum of terms, each of which touches only one variable from each of two partitioned sets, so that disjoint variable pairs can be updated independently and in parallel. The aim is to exploit this property to develop convergent, asynchronous, distributed, and fault-tolerant algorithms that are well suited to achieving high performance on the commodity hardware prevalent on today's cloud computing platforms. In particular, over a four-year period, the following will be developed: (i) parallel stochastic optimization algorithms for the multi-machine cloud computing setting, (ii) theoretical guarantees of convergence, (iii) open-source code under a permissive license, and (iv) applications of these techniques to a variety of problem domains such as topic models and mixture models. In addition, a cohort of students who can transfer their skills to both industry and academia will be trained, and a graduate-level course on scalable machine learning will be developed.
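
To make "double separability" concrete, consider matrix completion: the loss is a sum over observed entries (i, j), and each term involves only row factor w_i and column factor h_j, so workers holding disjoint rows and columns can take stochastic gradient steps simultaneously without locks. The sketch below simulates this nomadic, NOMAD-style token-passing scheme with threads standing in for machines; the synthetic data, parameter values, and names are illustrative assumptions, not the project's released code.

```python
import numpy as np
import threading, queue, random

rng = np.random.default_rng(0)
num_rows, num_cols, rank = 40, 30, 5
# Low-rank ground truth with a partially observed mask of "ratings".
A = rng.normal(size=(num_rows, rank)) @ rng.normal(size=(rank, num_cols))
observed = rng.random((num_rows, num_cols)) < 0.3

num_workers, step, reg = 4, 0.05, 0.01
tokens_per_worker = 2000                           # column tokens each worker processes
W = rng.normal(scale=0.1, size=(num_rows, rank))   # row factors: statically partitioned
H = rng.normal(scale=0.1, size=(num_cols, rank))   # column factors: the nomadic variables

row_owner = np.arange(num_rows) % num_workers      # each worker owns a fixed set of rows
queues = [queue.Queue() for _ in range(num_workers)]
for j in range(num_cols):
    queues[j % num_workers].put(j)                 # deal out the column tokens

def worker(wid):
    my_rows = np.flatnonzero(row_owner == wid)
    for _ in range(tokens_per_worker):
        try:
            j = queues[wid].get(timeout=1.0)       # receive a nomadic column variable
        except queue.Empty:
            return                                 # remaining tokens are stranded; stop
        for i in my_rows:                          # SGD on this worker's entries in column j
            if observed[i, j]:
                err = A[i, j] - W[i] @ H[j]
                w_old = W[i].copy()
                W[i] += step * (err * H[j] - reg * W[i])
                H[j] += step * (err * w_old - reg * H[j])
        queues[random.randrange(num_workers)].put(j)  # pass the token to a random worker

threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
for t in threads:
    t.start()
for t in threads:
    t.join()

mse = ((A - W @ H.T)[observed] ** 2).mean()
print(f"mean squared error on observed entries: {mse:.4f}")
```

Because each row factor is owned by exactly one worker and each column factor by whichever worker currently holds its token, no two workers ever write the same variable, which is what makes the updates lock-free; in the multi-machine setting the queues become network channels, and a lost token can in principle be re-injected without restarting the computation.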

The proposed research will enable practitioners in different application areas to quickly solve their big data problems. The results of the project will be disseminated widely through papers and open source software. Course material will be developed for the education of students in the area of Scalable Machine Learning, and the course will be co-taught at UCSC and UT Austin. The project will recruit women and minority students.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

H.-F. Yu, C.-J. Hsieh and I. S. Dhillon, "Parallel Asynchronous Stochastic Coordinate Descent with Auxiliary Variables," Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 2019
H.-F. Yu, C.-J. Hsieh, H. Yun, S.V.N. Vishwanathan and I. S. Dhillon, "Nomadic Computing for Big Data Analytics," IEEE Computer, v.49, 2016, p.52. doi:10.1109/MC.2016.116
J. Whang, Y. Hou, D. Gleich and I. S. Dhillon, "Non-exhaustive, Overlapping Clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), v.41, 2019, p.2644
J. Whang, Y. Jung, S. Kang, D. Yoo and I. S. Dhillon, "Scalable Anti-TrustRank with Qualified Site-level Seeds for Link-based Web Spam Detection," WWW (Companion Volume), 2020
J. Zhang, H.-F. Yu and I. S. Dhillon, "AutoAssist: A Framework to Accelerate Training of Deep Neural Networks," Neural Information Processing Systems Conference (NeurIPS), 2019
J. Zhang, P. Raman, S. Ji, H.-F. Yu, S.V.N. Vishwanathan and I. S. Dhillon, "Extreme Stochastic Variational Inference: Distributed Inference for Large Scale Mixture Models," Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 2019
J. Zhang, Q. Lei and I. S. Dhillon, "Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization," International Conference on Machine Learning (ICML), 2018
J. Zhang, Y. Lin, Z. Song and I. S. Dhillon, "Learning Long Term Dependencies via Fourier Recurrent Units," International Conference on Machine Learning (ICML), 2018
Q. Lei, A. Jalal, I. S. Dhillon and A. Dimakis, "Inverting Deep Generative Models, One Layer at a Time," Neural Information Processing Systems Conference (NeurIPS), 2019
Q. Lei, J. Zhuo, C. Caramanis, I. S. Dhillon and A. Dimakis, "Primal-Dual Block Generalized Frank-Wolfe," Neural Information Processing Systems Conference (NeurIPS), 2019

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

As data grows in size and complexity, it is a contemporary challenge to develop scalable, robust, and distributed algorithms for big data analytics. In particular, data sets are often too big to fit into the main memory of a single computer, and so there is a great need for developing scalable and sophisticated machine learning methods for their analysis.

For this project, we have explored novel scalable algorithms and frameworks for large-scale machine learning problems, such as matrix completion, topic modeling, kernel machines, extreme classification, federated learning, and sequence-to-sequence prediction. We have published papers and released software for these problems, in addition to training students with expertise in these areas.

In particular, we have developed (i) nomadic, decentralized algorithms for very large-scale matrix completion, topic modeling, and mixture modeling, (ii) communication-efficient distributed block minimization algorithms for large-scale nonlinear kernel machines, (iii) parallel primal-dual sparse methods for large-scale extreme classification problems, (iv) robust and efficient federated learning methods, and (v) stabilized, scalable methods for training sequence-to-sequence deep learning models. These algorithms allow us to train larger machine learning models for practical problems in natural language processing, recommender systems, object recognition, and information retrieval.
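
As an illustration of the structure behind item (ii), a simple divide-and-conquer baseline for distributed kernel machines trains an independent kernel SVM on each block of data and combines the local decision functions at prediction time. The sketch below (using scikit-learn; the dataset, block count, and one-shot averaging rule are our illustrative assumptions) shows this simplest variant; communication-efficient block minimization methods additionally coordinate the blocks toward the global solution.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary classification data standing in for a large training set.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

num_blocks = 4
rng = np.random.default_rng(0)
blocks = np.array_split(rng.permutation(len(X)), num_blocks)

# "Divide": train one kernel SVM per block, with no communication between blocks.
models = [SVC(kernel="rbf", gamma="scale").fit(X[idx], y[idx]) for idx in blocks]

# "Combine": average the blocks' decision values to classify new points.
def predict(X_new):
    scores = np.mean([m.decision_function(X_new) for m in models], axis=0)
    return (scores > 0).astype(int)

print(f"training accuracy of the combined block models: {(predict(X) == y).mean():.3f}")
```

Each block's subproblem is a fraction of the original size, so kernel training (which scales superlinearly in the number of samples) becomes tractable per worker; the trade-off is that one-shot averaging only approximates the single-machine solution.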

The results from this funded project have been disseminated through publications in leading machine learning and data mining venues, such as KDD, NeurIPS, ICML, AISTATS, and IEEE Transactions. In terms of education and training, multiple Ph.D. students, including one female student, obtained their degrees supported by funding from this project.

Last Modified: 02/13/2022
Modified by: Inderjit S Dhillon
