
NSF Org: CNS Division of Computer and Network Systems
Recipient: Stanford University
Initial Amendment Date: April 6, 2017
Latest Amendment Date: February 17, 2021
Award Number: 1651570
Award Instrument: Continuing Grant
Program Manager: Marilyn McClure, mmcclure@nsf.gov, (703) 292-5197, CNS Division of Computer and Network Systems, CSE Directorate for Computer and Information Science and Engineering
Start Date: April 15, 2017
End Date: March 31, 2022 (Estimated)
Total Intended Award Amount: $592,920.00
Total Awarded Amount to Date: $592,920.00
Funds Obligated to Date: FY 2019 = $118,473.00; FY 2020 = $122,040.00; FY 2021 = $125,724.00
History of Investigator: Matei Zaharia (Principal Investigator)
Recipient Sponsored Research Office: 450 Jane Stanford Way, Stanford, CA 94305-2004, US; (650) 723-2300
Primary Place of Performance: Stanford, CA 94305-4100, US
NSF Program(s): CSR-Computer Systems Research
Primary Program Source: 01001920DB NSF RESEARCH & RELATED ACTIVIT; 01002021DB NSF RESEARCH & RELATED ACTIVIT; 01002122DB NSF RESEARCH & RELATED ACTIVIT
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
The computer revolution that has transformed our society over the past 60 years happened because computer processors reliably became faster every year. Unfortunately, that trend has stopped: new processors can no longer easily be made faster. Instead, new computer hardware relies on parallelism or specialized components to achieve performance, which makes high-performance applications much harder to build. Most existing data processing systems run 10-100x slower than they could even on current processors, and will fare even worse on emerging hardware. Driving further advances in information processing therefore requires computer systems that automatically map applications to emerging hardware, which is a challenging intellectual problem.
This project proposes "Weld," a runtime for data-intensive parallel computation on modern hardware. The project includes two main research thrusts:
* An intermediate language (IL) for data-intensive computation that can capture common data-intensive applications but is easy to optimize for parallel hardware. This language enables mapping workloads to diverse hardware such as CPUs and GPUs.
* A runtime API that lets Weld dynamically optimize across different libraries used in the same program. This API will allow Weld to perform complex optimizations, such as loop blocking across parallel libraries, unlocking speedups not previously possible (see the sketch after this list).
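
As a rough illustration of the second thrust, the sketch below (plain Python; all names are hypothetical, not Weld's actual interface) shows the core idea: library calls return lazy expression objects rather than computed results, so the runtime sees the whole pipeline and can fuse several logical passes over the data into one before executing.

    # Minimal sketch of lazy, cross-library fusion in the spirit of Weld's
    # runtime API; the names here are illustrative, not the real interface.
    class Lazy:
        """A deferred elementwise computation over a dataset."""
        def __init__(self, fn=lambda x: x):
            self.fn = fn  # the pending per-element function

        def map(self, g):
            # Compose g after the pending work: still zero passes executed.
            f = self.fn
            return Lazy(lambda x: g(f(x)))

        def evaluate(self, data):
            # One fused pass over the data; a real runtime like Weld would
            # JIT-compile this loop into optimized parallel code instead.
            return [self.fn(x) for x in data]

    # Two "library" functions that would each normally scan the input:
    def add_one(v): return v.map(lambda x: x + 1)
    def square(v):  return v.map(lambda x: x * x)

    pipeline = square(add_one(Lazy()))   # builds the plan; no work yet
    print(pipeline.evaluate([1, 2, 3]))  # single fused pass -> [4, 9, 16]

Because evaluate sees the entire pipeline at once, the intermediate output of add_one is never materialized; this is the kind of cross-library optimization the API thrust targets.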
If successful, this project will produce software that automatically maps key existing data-intensive applications (e.g., data analytics, machine learning, and search) to emerging hardware devices, achieving 10-100x speedups over current applications. Beyond producing new technology, the project will train the next generation of engineers in high-performance data processing through online teaching resources and research mentoring for undergraduate and graduate students. Together, this education and technology may make industrial, scientific, and government users of big data 10-100x more productive and enable the next generation of knowledge-driven systems.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This project explored techniques to continue scaling up computing on modern computer hardware, in order to make it easier for a wide range of organizations to run large-scale computations such as data analytics and machine learning. Over the past 60 years, computer processors usually got faster each year, bringing applications along with them. Recently, however, processor speeds have stopped increasing, and applications must leverage parallelism or specialized hardware to continue scaling up. Both approaches are much more challenging for software developers and users.
In this project, we developed several techniques and systems to make high-performance computing on modern hardware more broadly accessible.

First, we developed Weld, a runtime that makes it easy for software developers to write fast versions of key routines that can then be combined and optimized together into a highly efficient program. We showed that Weld is easy to integrate into today's widely used software libraries, and can thus be used to accelerate existing applications without rewriting them.

Second, we developed Split Annotations, a method to provide information about existing routines in a piece of software that then allows them to be executed more efficiently at runtime by minimizing memory movement (a sketch of the idea appears after this summary). Like Weld, Split Annotations can be used to accelerate and scale up existing software without modifying it.

Third, we developed a number of techniques to speed up neural network computations in particular: the computations that power modern AI advances in natural language processing, computer vision, audio, and other areas. These techniques take advantage of mathematical properties of the operators in a neural network to run the same computation more efficiently or to parallelize it better across devices. They include TASO for optimizing operator graphs, Pipeline Parallelism for increasing parallelism, and FlexFlow for parallelizing a network across multiple dimensions.

Across these areas, the work in this project can scale up existing applications by 10-100x on modern hardware with little effort from software developers.
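
To make the Split Annotations idea concrete, here is a small sketch (plain Python with NumPy; the function names are ours, not the released API). Rather than running each routine over the entire input and writing a large, cache-cold intermediate result back to memory, the input is split into cache-sized chunks and the whole pipeline runs over one chunk at a time, so intermediates stay in cache.

    # Sketch of chunked pipelining in the spirit of Split Annotations.
    # Names are illustrative, not the released API.
    import numpy as np

    CHUNK = 4096  # elements per chunk; in practice sized to fit in CPU cache

    def pipelined(funcs, x):
        # Apply a pipeline of elementwise functions chunk by chunk, so each
        # intermediate result stays cache-resident instead of spilling to RAM.
        out = np.empty_like(x)
        for start in range(0, len(x), CHUNK):
            piece = x[start:start + CHUNK]
            for f in funcs:          # run the whole pipeline on one chunk
                piece = f(piece)
            out[start:start + len(piece)] = piece
        return out

    x = np.arange(1_000_000, dtype=np.float64)
    y = pipelined([np.sqrt, np.log1p], x)
    # Same result as np.log1p(np.sqrt(x)), but without a full-size intermediate
    # array making a round trip through main memory between the two calls.
    assert np.allclose(y, np.log1p(np.sqrt(x)))

In the real system, the annotations convey which arguments of an existing library function can be split this way, letting the runtime schedule unmodified functions in this pipelined fashion.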
We have released the major pieces of software developed in this project under open-source licenses, including Weld, Split Annotations, Sparser, TASO, FlexFlow, and Pipeline Parallelism (including an integration into NVIDIA's Megatron-LM deep learning framework). Several of these projects have been deployed at companies in industry or extended by other researchers.
Last Modified: 12/31/2022
Modified by: Matei Zaharia