
NSF Org: OAC Office of Advanced Cyberinfrastructure (OAC)
Recipient:
Initial Amendment Date: May 20, 2019
Latest Amendment Date: May 20, 2019
Award Number: 1911229
Award Instrument: Standard Grant
Program Manager: Seung-Jong Park, OAC Office of Advanced Cyberinfrastructure (OAC), CSE Directorate for Computer and Information Science and Engineering
Start Date: June 1, 2019
End Date: May 31, 2022 (Estimated)
Total Intended Award Amount: $481,837.00
Total Awarded Amount to Date: $481,837.00
Funds Obligated to Date:
History of Investigator:
Recipient Sponsored Research Office: 3720 S FLOWER ST FL 3, LOS ANGELES, CA, US 90033, (213) 740-7762
Sponsor Congressional District:
Primary Place of Performance: 3720 S. Flower St., Los Angeles, CA, US 90089-0001
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): OAC-Advanced Cyberinfrast Core
Primary Program Source:
Program Reference Code(s):
Program Element Code(s):
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
Graphs are powerful tools for representing real-world networked data in a wide range of scientific and engineering domains. For example, graphs represent people and their interactions in social networks, proteins and their functionality in biological networks, and landmarks and roads in transportation networks. Understanding graph properties and deriving hidden information by performing analytics on graphs at extreme scale is critical for the progress of science across multiple domains and for solving impactful real-world problems. Cloud platforms have been adopted to perform extreme-scale graph analytics. This has led to an exponential increase in workloads, while at the same time the rate of performance improvement of cloud platforms has slowed. To address this, cloud platforms are being augmented with accelerators. However, the expertise required to realize high performance from such accelerator-enhanced cloud platforms limits their accessibility to the broader scientific and engineering community. To address this issue, this project will research and develop a toolkit that provides Graph Analytics as a Service, enabling researchers to easily perform extreme-scale graph analytics workflows on accelerator-enhanced cloud platforms. This will significantly increase researcher productivity because i) researchers will avoid the steep learning curve of developing parallel implementations of graph analytics algorithms, and ii) the increased size and scale of graph analytics will allow researchers to analyze significantly larger datasets at reduced latency, thereby enriching the quality of the domain research. Moreover, the techniques developed in this project will also be applicable to streaming graph analytics at the "edge" for applications such as autonomous vehicles and smart infrastructure.
The toolkit is expected to be used in many engineering and science disciplines including power systems engineering, network biology, preventive healthcare, smart infrastructure, etc. The research conducted in this project will also constitute materials appropriate for inclusion in graduate and undergraduate courses.
The project will research and develop high-performance graph analytics algorithms and software for key graph workflows and kernels spanning multiple scientific and engineering domains. The target platform will be accelerator-enhanced cloud platforms consisting of emerging node architectures comprising multi-core processors, Field Programmable Gate Arrays (FPGAs), and high-bandwidth memory (HBM) with a cache-coherent interface. An integrated optimization framework consisting of memory optimizations and partitioning and mapping techniques will be developed to exploit the heterogeneity of the target platforms. Specifically, techniques for optimal memory data layout and integrated optimizations for cloud execution will be developed to realize scalable performance on accelerator-enhanced cloud platforms. The memory data layout optimization seeks to fully exploit the high bandwidth provided by HBM by ensuring data reuse for a broad class of graph analytics problems. The proposed software will ensure seamless parallel processing of the entire graph on a single heterogeneous node as well as on cloud platforms with multiple heterogeneous nodes. The integrated optimization framework will be developed into a scalable, deployable, robust cyberinfrastructure (CI) toolkit providing Graph Analytics as a Service (GAaaS). The framework will be developed using state-of-the-art heterogeneous platforms. By accelerating graph analytics workflows on cloud platforms, this project will enable researchers to perform extremely large-scale graph analytics workflows that are key components of many scientific and engineering domains.
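As a hedged illustration of the data-layout and partitioning ideas described above, the sketch below builds a CSR layout for a graph and partitions its edges by destination-vertex range so that each partition's working set fits in fast memory (e.g., one HBM channel) and is fully reused. All function names and the partition size are illustrative assumptions, not the project's actual implementation.

```python
# Illustrative sketch of graph partitioning for data reuse (assumed names,
# not the project's code). A CSR layout stores each vertex's neighbors
# contiguously; destination-range partitioning bounds the active output set.

def csr_from_edges(num_vertices, edges):
    """Build a CSR representation (offsets, neighbors) from an edge list."""
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    for i in range(num_vertices):
        offsets[i + 1] += offsets[i]
    neighbors = [0] * len(edges)
    cursor = offsets[:-1].copy()
    for src, dst in edges:
        neighbors[cursor[src]] = dst
        cursor[src] += 1
    return offsets, neighbors

def partition_by_destination(num_vertices, edges, part_size):
    """Group edges by destination-vertex range so each partition's
    destination data fits in fast memory and is fully reused."""
    num_parts = (num_vertices + part_size - 1) // part_size
    parts = [[] for _ in range(num_parts)]
    for src, dst in edges:
        parts[dst // part_size].append((src, dst))
    return parts

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)]
parts = partition_by_destination(4, edges, part_size=2)
# Partition 0 holds edges with destinations in {0, 1}; partition 1, {2, 3}.
```

Processing one destination partition at a time is what lets a fixed-size on-chip buffer capture all updates to that partition before it is written back to HBM.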
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available without a charge during the embargo (administrative interval). Some links on this page may take you to non-federal websites; their policies may differ from this site's.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Intellectual Merit
The project advanced the state of robust cyberinfrastructure for accelerator-enhanced cloud platforms by accomplishing the following:
- Accelerating graph analytics on a network of accelerator-enhanced server nodes. These include:
- Accelerating Allreduce with in-network reduction on Intel PIUMA: Allreduce is a key communication primitive in distributed graph processing, and accelerating it has a significant impact on the performance of several graph analytics frameworks. Utilizing the unique features of the PIUMA architecture's network topology, we developed a methodology to generate extremely low-latency embeddings for in-network Allreduce. Our approach employed a single-leader design in which each node communicates the entire reduction vector of its internal inputs on one of the HyperX links.
- Estimating the impact of communication schemes for distributed graph analytics: For any computational model and partitioning scheme adopted by graph analytics frameworks, communication schemes (the data to be communicated and the virtual interconnection network among the nodes) have a significant impact on performance. To analyze this impact, we identified widely used communication schemes and estimated their performance. Analyzing the trade-offs between the number of compute nodes and the communication costs of various schemes on a distributed platform by brute-force experimentation can be prohibitively expensive. Our performance estimation models therefore provide an economical way to perform these analyses, given the partitions and the communication scheme as input.
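To make the Allreduce primitive discussed above concrete, here is a minimal software simulation of it, assuming a sum reduction and the classic recursive-doubling schedule over a power-of-two node count. This is only a sketch of what the primitive computes; it does not model PIUMA's in-network hardware reduction or the single-leader HyperX embedding.

```python
# Illustrative sketch (not the PIUMA implementation): Allreduce combines a
# vector from every node with an elementwise reduction (here, sum) and
# leaves the full result on all nodes.

def allreduce_sum(vectors):
    """vectors: list of equal-length lists, one per simulated node.
    Returns the per-node buffers after recursive doubling."""
    n = len(vectors)
    assert n & (n - 1) == 0, "sketch assumes a power-of-two node count"
    bufs = [v[:] for v in vectors]
    step = 1
    while step < n:
        new = [b[:] for b in bufs]
        for node in range(n):
            partner = node ^ step  # exchange with the partner at distance `step`
            new[node] = [a + b for a, b in zip(bufs[node], bufs[partner])]
        bufs = new
        step <<= 1
    return bufs

results = allreduce_sum([[1, 2], [3, 4], [5, 6], [7, 8]])
# Every node ends with the full sum [16, 20].
```

Recursive doubling finishes in log2(n) exchange rounds, which is why reducing the per-round latency (for example, via in-network reduction) matters so much for the overall primitive.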
- Developing several novel techniques to accelerate graph machine learning training and inference use cases on CPU-FPGA heterogeneous platforms. These include:
- A framework for optimizing GCN inference on FPGAs: This included development of a novel hardware-aware Partition-Centric Feature Aggregation (PCFA) scheme that leverages 3-D partitioning with the vertex-centric computing paradigm to increase data reuse and reduce off-chip communication. We also developed a low-overhead task scheduling strategy to reduce pipeline stalls between the two computation stages of GCN Inference.
- A framework to generate high throughput GNN training implementation on CPU-FPGA heterogeneous platforms: We developed a novel framework that, given a CPU-FPGA platform, generates high throughput GNN training implementations targeting the platform. Our framework can benefit both application developers and machine learning researchers.
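The feature-aggregation stage that these GCN/GNN accelerators optimize can be sketched as follows, under assumed names and plain Python lists. The partitioned variant only illustrates the reordering idea behind partition-centric schemes such as PCFA (processing one destination-vertex partition at a time so the active output slice stays small); it is not the actual hardware design.

```python
# Hypothetical sketch of GCN feature aggregation: each vertex sums its
# in-neighbors' feature vectors. Names and the partitioning are illustrative.

def aggregate(features, edges, num_vertices):
    """Baseline vertex-centric aggregation: out[v] = sum of features[u]
    over every edge (u, v)."""
    dim = len(features[0])
    out = [[0.0] * dim for _ in range(num_vertices)]
    for u, v in edges:
        for k in range(dim):
            out[v][k] += features[u][k]
    return out

def aggregate_partitioned(features, edges, num_vertices, part_size):
    """Same result, but edges are processed partition by partition so the
    active slice of `out` stays small (modeling on-chip data reuse)."""
    dim = len(features[0])
    out = [[0.0] * dim for _ in range(num_vertices)]
    num_parts = (num_vertices + part_size - 1) // part_size
    buckets = [[] for _ in range(num_parts)]
    for u, v in edges:
        buckets[v // part_size].append((u, v))
    for bucket in buckets:  # one partition's destination vertices at a time
        for u, v in bucket:
            for k in range(dim):
                out[v][k] += features[u][k]
    return out
```

Both routines compute the same output; only the edge-processing order differs, and on an FPGA that ordering is what determines whether partial sums stay in on-chip memory or spill to off-chip DRAM.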
Broader Impacts
The project developed accelerators for critical analytics tasks performed in many application domains. The specific platforms and devices used for our parallel implementations (multi-cores, FPGAs, and HBM) are under active research and development, and our work can offer feedback to computer architects and systems researchers on the design of the next generation of these platforms. Our work can be used by many communities, including Computer Science, Electrical Engineering, and application developers who may not work in a specific technical area of Computer Science or Electrical Engineering. The research conducted constituted material for class lectures and student projects in several graduate and undergraduate courses, including "Special Topics Course on Accelerated Computing using FPGAs" and "Parallel Programming." In addition, the project fully or partially supported nine graduate students, two of whom completed their PhDs during the project. The code bases developed in the project have been open-sourced and made publicly available. These include graph sampling algorithms (https://github.com/madhavaggar/Forward_Local_Push, https://github.com/zjjzby/HPEC2021) and graph machine learning algorithms (https://github.com/jasonlin316/HP-GNN, https://github.com/jasonlin316/GCN-Inference-Acceleration-HLS). These will be beneficial to researchers and users of graph analytics frameworks from various engineering and scientific domains.
Last Modified: 09/08/2022
Modified by: Sanmukh Rao Kuppannagari
Please report errors in award information by writing to: awardsearch@nsf.gov.