Award Abstract # 1651570
CAREER: A Runtime for Fast Data Analysis on Modern Hardware

NSF Org: CNS (Division of Computer and Network Systems)
Recipient: THE LELAND STANFORD JUNIOR UNIVERSITY
Initial Amendment Date: April 6, 2017
Latest Amendment Date: February 17, 2021
Award Number: 1651570
Award Instrument: Continuing Grant
Program Manager: Marilyn McClure
mmcclure@nsf.gov
(703) 292-5197
CNS: Division of Computer and Network Systems
CSE: Directorate for Computer and Information Science and Engineering
Start Date: April 15, 2017
End Date: March 31, 2022 (Estimated)
Total Intended Award Amount: $592,920.00
Total Awarded Amount to Date: $592,920.00
Funds Obligated to Date: FY 2017 = $226,683.00
FY 2019 = $118,473.00
FY 2020 = $122,040.00
FY 2021 = $125,724.00
History of Investigator:
  • Matei Zaharia (Principal Investigator)
    matei@cs.stanford.edu
Recipient Sponsored Research Office: Stanford University
450 JANE STANFORD WAY
STANFORD
CA  US  94305-2004
(650)723-2300
Sponsor Congressional District: 16
Primary Place of Performance: Stanford University
CA  US  94305-4100
Primary Place of Performance Congressional District: 16
Unique Entity Identifier (UEI): HJD6G4D6TJY5
Parent UEI:
NSF Program(s): CSR-Computer Systems Research
Primary Program Source: 01001718DB NSF RESEARCH & RELATED ACTIVIT
01001920DB NSF RESEARCH & RELATED ACTIVIT
01002021DB NSF RESEARCH & RELATED ACTIVIT
01002122DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1045
Program Element Code(s): 735400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

The computer revolution that has continuously transformed our society over the past 60 years happened because computer processors reliably became faster every year. Unfortunately, this trend has stopped: new processors can no longer easily be made faster. Instead, new computer hardware relies on parallelism or specialized components to achieve performance. This shift has made it much harder to build high-performance applications; most existing data processing systems run 10-100x slower than they could even on current processors, and they will fare even worse on emerging hardware. To drive advances in information processing, we need computer systems that automatically map applications to emerging hardware. This is a challenging intellectual problem.

This project proposes "Weld", a runtime for data-intensive parallel computation on modern hardware. The project includes two main research thrusts:

* An intermediate language (IL) for data-intensive computation that can capture common data-intensive applications yet is easy to optimize for parallel hardware. This language enables mapping workloads to diverse hardware such as CPUs and GPUs.
* A runtime API that lets Weld dynamically optimize across different libraries used in the same program. This API will allow Weld to perform complex optimizations, such as loop blocking across parallel libraries, unlocking speedups that are not possible today (see the sketch after this list).
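
To make the cross-library optimization idea concrete, the sketch below contrasts an eagerly evaluated pipeline, where each library call makes a separate pass over the data and materializes an intermediate array, with the single fused loop a runtime like Weld aims to produce. This is an illustrative Python sketch only; the function names are hypothetical, and it is not Weld's actual API or IL.

    import numpy as np

    # Hypothetical pipeline mixing calls from different libraries. Evaluated
    # eagerly, each call scans the data separately and materializes a full
    # intermediate array.
    def eager_pipeline(prices, rates):
        converted = prices * rates   # pass 1, intermediate array
        taxed = converted * 1.08     # pass 2, intermediate array
        return taxed.sum()           # pass 3

    # The effect a cross-library runtime such as Weld aims for: the whole
    # pipeline fused into one loop with no intermediates (written by hand
    # here; Weld would generate equivalent parallel code from its IL).
    def fused_pipeline(prices, rates):
        total = 0.0
        for p, r in zip(prices, rates):
            total += p * r * 1.08
        return total

    prices = np.array([10.0, 20.0, 30.0])
    rates = np.array([1.10, 0.90, 1.20])
    assert np.isclose(eager_pipeline(prices, rates), fused_pipeline(prices, rates))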

If successful, this project will produce software that automatically maps key existing data-intensive applications (e.g., data analytics, machine learning, and search) to emerging hardware devices and achieves a 10-100x speedup over current applications. Beyond producing new technology, this project will train the next generation of engineers in high-performance data processing through online teaching resources and research mentoring for undergraduate and graduate students. Together, education and new technology may make industrial, scientific, and government users of big data 10-100x more productive and enable the next generation of knowledge-driven systems.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


(Showing: 1 - 10 of 21)
Coleman, Cody and Zaharia, Matei and Kang, Daniel and Narayanan, Deepak and Nardi, Luigi and Zhao, Tian and Zhang, Jian and Bailis, Peter and Olukotun, Kunle and Ré, Chris. "Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark." ACM SIGOPS Operating Systems Review, v.53, 2019. doi:10.1145/3352020.3352024
Jain, Saachi and Zaharia, Matei. "Spectral Lower Bounds on the I/O Complexity of Computation Graphs." SPAA 2020, 2020. doi:10.1145/3350755.3400210
Jia, Zhihao and Padon, Oded and Thomas, James and Warszawski, Todd and Zaharia, Matei and Aiken, Alex. "TASO: optimizing deep learning computation with automatic generation of graph substitutions." SOSP, 2019. doi:10.1145/3341301.3359630
Jia, Zhihao and Thomas, James and Warszawski, Todd and Gao, Mingyu and Zaharia, Matei and Aiken, Alex. "Optimizing DNN Computation with Relaxed Graph Substitutions." SysML 2019, 2019.
Jia, Zhihao and Zaharia, Matei and Aiken, Alex. "Beyond Data and Model Parallelism for Deep Neural Networks." SysML 2019, 2019.
Kraft, Peter and Kang, Daniel and Narayanan, Deepak and Palkar, Shoumik and Bailis, Peter and Zaharia, Matei. "Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference." MLSys 2020, 2020.
Narayanan, Deepak and Harlap, Aaron and Phanishayee, Amar and Seshadri, Vivek and Devanur, Nikhil R. and Ganger, Gregory R. and Gibbons, Phillip B. and Zaharia, Matei. "PipeDream: generalized pipeline parallelism for DNN training." SOSP, 2019. doi:10.1145/3341301.3359646
Narayanan, Deepak and Kazhamiaka, Fiodar and Abuzaid, Firas and Kraft, Peter and Agrawal, Akshay and Kandula, Srikanth and Boyd, Stephen and Zaharia, Matei. "Solving Large-Scale Granular Resource Allocation Problems Efficiently with POP." SOSP 2021, 2021. doi:10.1145/3477132.3483588
Narayanan, Deepak and Phanishayee, Amar and Shi, Kaiyu and Chen, Xie and Zaharia, Matei. "Memory-Efficient Pipeline-Parallel DNN Training." Proceedings of Machine Learning Research, v.139, 2022.
Narayanan, Deepak and Santhanam, Keshav and Kazhamiaka, Fiodar and Phanishayee, Amar and Zaharia, Matei. "Analysis and Exploitation of Dynamic Pricing in the Public Cloud for ML Training." VLDB DISPA Workshop 2020, 2020.
Narayanan, Deepak and Santhanam, Keshav and Kazhamiaka, Fiodar and Phanishayee, Amar and Zaharia, Matei. "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads." OSDI 2020, 2020.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project explored techniques to continue scaling up computing on modern computer hardware, in order to make it easier for a wide range of organizations to run large-scale computations such as data analytics and machine learning. Over the past 60 years, computer processors usually got faster each year, bringing applications along with them; recently, however, processor speeds have stopped increasing, and applications need to leverage parallelism or specialized hardware to continue scaling up. Both approaches are much more challenging for software developers and users.

In this project, we developed several techniques and systems to make high-performance computing on modern hardware more broadly accessible.

First, we developed Weld, a runtime that makes it easy for software developers to write fast versions of key routines that can then be combined and optimized together into a highly efficient program. We showed that Weld is easy to integrate into today's widely used software libraries, and can thus accelerate existing applications without rewriting them.

Second, we developed Split Annotations, a method for annotating existing routines in a piece of software so that they can be executed more efficiently at runtime by minimizing data movement. Like Weld, Split Annotations can accelerate and scale up existing software without modifying it.

Third, we developed a number of techniques to speed up neural network computations in particular, the computations that power modern AI advances in natural language processing, computer vision, audio, and other areas. These techniques take advantage of mathematical properties of the operators in a neural network to run the same computation more efficiently or to parallelize it better across devices. They include TASO for optimizing operator graphs, Pipeline Parallelism for increasing parallelism, and FlexFlow for parallelizing a network across multiple dimensions; the sketch below illustrates the kind of operator-graph rewrite TASO searches for.

Across these areas, the work in this project can scale up existing applications by 10-100x on modern hardware with little effort from software developers.
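
As one concrete illustration of operator-graph rewriting, the sketch below checks, in plain NumPy rather than TASO's own code, a substitution of the style TASO considers: two matrix multiplications that share an input can be replaced by one larger multiplication over concatenated weights followed by a split. The specific rewrite is shown only as an example; TASO generates and formally verifies such substitutions automatically.

    import numpy as np

    # Original operator graph: two matmuls sharing the input A.
    #   Y1 = A @ B,  Y2 = A @ C
    # Candidate substitution: one matmul over concatenated weights, then a split.
    #   [Y1 | Y2] = A @ [B | C]
    rng = np.random.default_rng(0)
    A = rng.standard_normal((64, 128))
    B = rng.standard_normal((128, 32))
    C = rng.standard_normal((128, 48))

    Y1, Y2 = A @ B, A @ C                            # before the rewrite
    fused = A @ np.concatenate([B, C], axis=1)       # after the rewrite
    Z1, Z2 = np.split(fused, [B.shape[1]], axis=1)   # recover the two outputs

    assert np.allclose(Y1, Z1) and np.allclose(Y2, Z2)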

We have released the major pieces of software we developed under open source licenses, including Weld, Split Annotations, Sparser, TASO, FlexFlow, and Pipeline Parallelism (including an integration into NVIDIA's Megatron-LM deep learning framework). Several of these projects have been deployed in industry at various companies, or extended by researchers.


Last Modified: 12/31/2022
Modified by: Matei Zaharia
