
NSF Org: OAC Office of Advanced Cyberinfrastructure (OAC)
Recipient:
Initial Amendment Date: May 20, 2019
Latest Amendment Date: May 20, 2019
Award Number: 1911229
Award Instrument: Standard Grant
Program Manager: Seung-Jong Park, OAC Office of Advanced Cyberinfrastructure (OAC), CSE Directorate for Computer and Information Science and Engineering
Start Date: June 1, 2019
End Date: May 31, 2022 (Estimated)
Total Intended Award Amount: $481,837.00
Total Awarded Amount to Date: $481,837.00
Funds Obligated to Date:
History of Investigator:
Recipient Sponsored Research Office: 3720 S FLOWER ST FL 3, LOS ANGELES, CA, US 90033, (213) 740-7762
Sponsor Congressional District:
Primary Place of Performance: 3720 S. Flower St., Los Angeles, CA, US 90089-0001
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): OAC-Advanced Cyberinfrast Core
Primary Program Source:
Program Reference Code(s):
Program Element Code(s):
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
Graphs are powerful tools for representing real-world networked data in a wide range of scientific and engineering domains. For example, graphs represent people and their interactions in social networks, proteins and their functionality in biological networks, and landmarks and roads in transportation networks. Understanding graph properties and deriving hidden information by performing analytics on graphs at extreme scale is critical for the progress of science across multiple domains and for solving impactful real-world problems. Cloud platforms have been adopted to perform extreme-scale graph analytics. This has led to an exponential increase in workloads, while at the same time the rate of performance improvement of cloud platforms has slowed. To address this, cloud platforms are being augmented with accelerators. However, the expertise required to realize high performance from such accelerator-enhanced cloud platforms limits their accessibility to the broader scientific and engineering community. To address this issue, this project will research and develop a toolkit that provides Graph Analytics as a Service, enabling researchers to easily perform extreme-scale graph analytics workflows on accelerator-enhanced cloud platforms. This will significantly increase researcher productivity because i) researchers will avoid the steep learning curve of developing parallel implementations of graph analytics algorithms, and ii) the increased size and scale of graph analytics will allow researchers to analyze significantly larger datasets at reduced latency, thereby enriching the quality of the domain research. Moreover, the techniques developed in this project will also be applicable to streaming graph analytics at the "edge" for applications such as autonomous vehicles and smart infrastructure.
The toolkit is expected to be used in many engineering and science disciplines including power systems engineering, network biology, preventive healthcare, smart infrastructure, etc. The research conducted in this project will also constitute materials appropriate for inclusion in graduate and undergraduate courses.
The project will research and develop high-performance graph analytics algorithms and software for key graph workflows and kernels spanning multiple scientific and engineering domains. The target platform will be accelerator-enhanced cloud platforms consisting of emerging node architectures comprising multi-core processors, Field Programmable Gate Arrays (FPGAs), and high-bandwidth memory (HBM) with a cache-coherent interface. An integrated optimization framework consisting of memory optimizations and partitioning and mapping techniques will be developed to exploit the heterogeneity of the target platforms. Specifically, techniques for optimal memory data layout and integrated optimizations for cloud execution will be developed to realize scalable performance on accelerator-enhanced cloud platforms. The memory data layout optimization seeks to fully exploit the high bandwidth provided by HBM by ensuring data reuse for a broad class of graph analytics problems. The proposed software will ensure seamless parallel processing of the entire graph on a single heterogeneous node as well as on cloud platforms with multiple heterogeneous nodes. The integrated optimization framework will be developed into a scalable, deployable, robust cyberinfrastructure (CI) toolkit providing Graph Analytics as a Service (GAaaS). The framework will be developed using state-of-the-art heterogeneous platforms. By accelerating graph analytics workflows on cloud platforms, this project will enable researchers to perform extremely large-scale graph analytics workflows that are key components of many scientific and engineering domains.
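As a hedged illustration of the data-layout and partitioning ideas described above, the sketch below builds a CSR layout for a graph and partitions its edges by destination-vertex range so that each partition's working set fits in fast memory (e.g., one HBM channel) and is fully reused. All function names and the partition size are illustrative assumptions, not the project's actual implementation.

```python
# Illustrative sketch of graph partitioning for data reuse (assumed names,
# not the project's code). A CSR layout stores each vertex's neighbors
# contiguously; destination-range partitioning bounds the active output set.

def csr_from_edges(num_vertices, edges):
    """Build a CSR representation (offsets, neighbors) from an edge list."""
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    for i in range(num_vertices):
        offsets[i + 1] += offsets[i]
    neighbors = [0] * len(edges)
    cursor = offsets[:-1].copy()
    for src, dst in edges:
        neighbors[cursor[src]] = dst
        cursor[src] += 1
    return offsets, neighbors

def partition_by_destination(num_vertices, edges, part_size):
    """Group edges by destination-vertex range so each partition's
    destination data fits in fast memory and is fully reused."""
    num_parts = (num_vertices + part_size - 1) // part_size
    parts = [[] for _ in range(num_parts)]
    for src, dst in edges:
        parts[dst // part_size].append((src, dst))
    return parts

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)]
parts = partition_by_destination(4, edges, part_size=2)
# Partition 0 holds edges with destinations in {0, 1}; partition 1, {2, 3}.
```

Processing one destination partition at a time is what lets a fixed-size on-chip buffer capture all updates to that partition before it is written back to HBM.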
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available without a charge during the embargo (administrative interval). Some links on this page may take you to non-federal websites; their policies may differ from this site's.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Intellectual Merit
The project advanced the state of robust cyberinfrastructure for accelerator-enhanced cloud platforms by accomplishing the following:
- Accelerating graph analytics on a network of accelerator-enhanced server nodes. These include:
- Accelerating Allreduce with in-network reduction on Intel PIUMA: Allreduce is a key communication primitive in distributed graph processing, and accelerating it has a significant impact on the performance of several graph analytics frameworks. Utilizing the unique features of the PIUMA architecture's network topology, we developed a methodology to generate extremely low-latency embeddings for in-network Allreduce. Our approach employed a single-leader design in which each node communicates the entire reduction vector of its internal inputs on one of the HyperX links.
- Estimating the impact of communication schemes for distributed graph analytics: For any computational model and partitioning scheme adopted by graph analytics frameworks, communication schemes (the data to be communicated and the virtual interconnection network among the nodes) have a significant impact on performance. To analyze this impact, we identified widely used communication schemes and estimated their performance. Analyzing the trade-offs between the number of compute nodes and the communication costs of various schemes on a distributed platform by brute-force experimentation can be prohibitively expensive. Our performance estimation models therefore provide an economical way to perform these analyses, given the partitions and the communication scheme as input.
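To make the Allreduce primitive discussed above concrete, here is a minimal software simulation of it, assuming a sum reduction and the classic recursive-doubling schedule over a power-of-two node count. This is only a sketch of what the primitive computes; it does not model PIUMA's in-network hardware reduction or the single-leader HyperX embedding.

```python
# Illustrative sketch (not the PIUMA implementation): Allreduce combines a
# vector from every node with an elementwise reduction (here, sum) and
# leaves the full result on all nodes.

def allreduce_sum(vectors):
    """vectors: list of equal-length lists, one per simulated node.
    Returns the per-node buffers after recursive doubling."""
    n = len(vectors)
    assert n & (n - 1) == 0, "sketch assumes a power-of-two node count"
    bufs = [v[:] for v in vectors]
    step = 1
    while step < n:
        new = [b[:] for b in bufs]
        for node in range(n):
            partner = node ^ step  # exchange with the partner at distance `step`
            new[node] = [a + b for a, b in zip(bufs[node], bufs[partner])]
        bufs = new
        step <<= 1
    return bufs

results = allreduce_sum([[1, 2], [3, 4], [5, 6], [7, 8]])
# Every node ends with the full sum [16, 20].
```

Recursive doubling finishes in log2(n) exchange rounds, which is why reducing the per-round latency (for example, via in-network reduction) matters so much for the overall primitive.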
- Developing several novel techniques to accelerate graph machine learning training and inference use cases on CPU-FPGA heterogeneous platforms. These include:
- A framework for optimizing GCN inference on FPGAs: This included development of a novel hardware-aware Partition-Centric Feature Aggregation (PCFA) scheme that leverages 3-D partitioning with the vertex-centric computing paradigm to increase data reuse and reduce off-chip communication. We also developed a low-overhead task scheduling strategy to reduce pipeline stalls between the two computation stages of GCN Inference.
- A framework to generate high throughput GNN training implementation on CPU-FPGA heterogeneous platforms: We developed a novel framework that, given a CPU-FPGA platform, generates high throughput GNN training implementations targeting the platform. Our framework can benefit both application developers and machine learning researchers.
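The feature-aggregation stage that these GCN/GNN accelerators optimize can be sketched as follows, under assumed names and plain Python lists. The partitioned variant only illustrates the reordering idea behind partition-centric schemes such as PCFA (processing one destination-vertex partition at a time so the active output slice stays small); it is not the actual hardware design.

```python
# Hypothetical sketch of GCN feature aggregation: each vertex sums its
# in-neighbors' feature vectors. Names and the partitioning are illustrative.

def aggregate(features, edges, num_vertices):
    """Baseline vertex-centric aggregation: out[v] = sum of features[u]
    over every edge (u, v)."""
    dim = len(features[0])
    out = [[0.0] * dim for _ in range(num_vertices)]
    for u, v in edges:
        for k in range(dim):
            out[v][k] += features[u][k]
    return out

def aggregate_partitioned(features, edges, num_vertices, part_size):
    """Same result, but edges are processed partition by partition so the
    active slice of `out` stays small (modeling on-chip data reuse)."""
    dim = len(features[0])
    out = [[0.0] * dim for _ in range(num_vertices)]
    num_parts = (num_vertices + part_size - 1) // part_size
    buckets = [[] for _ in range(num_parts)]
    for u, v in edges:
        buckets[v // part_size].append((u, v))
    for bucket in buckets:  # one partition's destination vertices at a time
        for u, v in bucket:
            for k in range(dim):
                out[v][k] += features[u][k]
    return out
```

Both routines compute the same output; only the edge-processing order differs, and on an FPGA that ordering is what determines whether partial sums stay in on-chip memory or spill to off-chip DRAM.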
Broader Impacts
The project developed accelerators for critical analytics tasks performed in many application domains. The specific platforms and devices used for our parallel implementations (multi-cores, FPGAs, and HBM) are under active research and development, and our work can offer feedback to computer architects and systems researchers on the design of the next generation of these platforms. Our work can be used by many communities, including Computer Science, Electrical Engineering, and application developers who may not work in a specific technical area of Computer Science or Electrical Engineering. The research conducted constituted material for class lectures and student projects in several graduate and undergraduate courses, including "Special Topics Course on Accelerated Computing using FPGAs" and "Parallel Programming." In addition, the project fully or partially supported nine graduate students, two of whom completed their PhDs during the project. The code bases developed in the project have been open-sourced and made publicly available. These include graph sampling algorithms (https://github.com/madhavaggar/Forward_Local_Push, https://github.com/zjjzby/HPEC2021) and graph machine learning algorithms (https://github.com/jasonlin316/HP-GNN, https://github.com/jasonlin316/GCN-Inference-Acceleration-HLS). These will be beneficial to researchers and users of graph analytics frameworks from various engineering and scientific domains.
Last Modified: 09/08/2022
Modified by: Sanmukh Rao Kuppannagari
Please report errors in award information by writing to: awardsearch@nsf.gov.