Award Abstract # 2229394
RII Track-4: NSF: Massively Parallel Graph Processing on Next-Generation Multi-GPU Supercomputers

NSF Org: OIA
OIA-Office of Integrative Activities
Recipient: UNIVERSITY OF ALABAMA AT BIRMINGHAM
Initial Amendment Date: November 29, 2022
Latest Amendment Date: February 28, 2024
Award Number: 2229394
Award Instrument: Standard Grant
Program Manager: Pinhas Ben-Tzvi
pbentzvi@nsf.gov
 (703)292-8246
OIA
 OIA-Office of Integrative Activities
O/D
 Office Of The Director
Start Date: February 1, 2023
End Date: January 31, 2024 (Estimated)
Total Intended Award Amount: $275,573.00
Total Awarded Amount to Date: $164,102.00
Funds Obligated to Date: FY 2023 = $164,101.00
History of Investigator:
  • Da Yan (Principal Investigator)
    yanda@iu.edu
Recipient Sponsored Research Office: University of Alabama at Birmingham
701 S 20TH STREET
BIRMINGHAM
AL  US  35294-0001
(205)934-5266
Sponsor Congressional District: 07
Primary Place of Performance: Argonne National Laboratory
9700 S Cass Ave
Lemont
IL  US  60439-4801
Primary Place of Performance Congressional District: 11
Unique Entity Identifier (UEI): YND4PLMC9AN7
Parent UEI:
NSF Program(s): EPSCoR RII: EPSCoR Research Fellows
Primary Program Source: 01002324DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 7217, 9150
Program Element Code(s): 196Y00
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.083

ABSTRACT

Graph processing is essential in real-world applications such as bioinformatics and social network analysis. Many fundamental graph operations are compute-intensive, and the PI has successfully developed a series of CPU-scalable graph processing systems for them following a novel task-based parallel paradigm called T-thinker. However, it is non-trivial to extend this success to a GPU-rich environment, due to the much larger gap between IO bandwidth and the computing power of GPUs, and due to the unique programming requirements for GPU programs to be scalable. This project will develop a new task-based distributed GPU framework, T-thinkerGPU, and implement three applications on top of it: subgraph matching, dense subgraph mining, and frequent subgraph pattern mining. T-thinkerGPU will be tested on the Aurora supercomputer at Argonne National Laboratory (ANL) as well as UAB's Cheaha supercomputer, and the implementation will exploit modern GPU features including atomic operations, unified shared memory, and dynamic parallelism. This work will establish a solid foundation for long-term collaboration with ANL toward the development of GPU-scalable HPC solutions for various scientific applications. The project will also train a GPU-programming workforce (including a PhD student who will also visit ANL) that is urgently needed in Alabama, and all the proposed tools will be open source.
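
To make the GPU features above concrete, here is a minimal CUDA sketch (illustrative only, not the T-thinkerGPU API) of a task-based kernel: each warp repeatedly claims a task ID from a global counter with an atomic operation, and all buffers are allocated in unified (managed) memory. The names task_loop and next_task, and the placeholder per-task work, are assumptions made for this example.

  #include <cstdio>
  #include <cuda_runtime.h>

  __device__ int next_task = 0;        // global task counter shared by all warps

  // Each warp repeatedly claims a task ID with an atomic fetch-and-add and then
  // processes it; the "work" here is a trivial placeholder.
  __global__ void task_loop(const int *tasks, int num_tasks, int *results) {
      int lane = threadIdx.x % 32;     // lane index within the warp
      while (true) {
          int t = -1;
          if (lane == 0)               // one lane claims a task for the whole warp
              t = atomicAdd(&next_task, 1);
          t = __shfl_sync(0xffffffff, t, 0);       // broadcast task ID to all lanes
          if (t >= num_tasks) break;
          atomicAdd(&results[t], tasks[t] + lane); // placeholder per-task work
      }
  }

  int main() {
      const int n = 1024;
      int *tasks, *results;
      cudaMallocManaged(&tasks, n * sizeof(int));    // unified (managed) memory
      cudaMallocManaged(&results, n * sizeof(int));
      for (int i = 0; i < n; ++i) { tasks[i] = i; results[i] = 0; }
      task_loop<<<4, 128>>>(tasks, n, results);
      cudaDeviceSynchronize();
      printf("results[10] = %d\n", results[10]);
      cudaFree(tasks); cudaFree(results);
      return 0;
  }

One lane per warp performs the atomicAdd and broadcasts the claimed task ID with a warp shuffle, so warps keep pulling work for as long as tasks remain; this grab-a-task loop is the basic shape of task-parallel GPU execution, independent of any particular system.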

This Research Infrastructure Improvement Track-4 EPSCoR Research Fellows (RII Track-4) award provides a fellowship to an Assistant Professor and training for a graduate student at the University of Alabama at Birmingham (UAB). GPU supercomputers are increasingly being deployed in place of CPU supercomputers in the hope of gaining not only significant performance improvements but also energy efficiency. Building on the success of the task-based parallel paradigm T-thinker for scaling graph processing in a multi-CPU environment, this project aims to investigate novel task-based techniques to scale fundamental compute-intensive graph operations in a multi-GPU environment, especially the exascale Aurora supercomputer at ANL, which is based on Intel GPUs. Specifically, the project will first investigate efficient representation schemes that encode and compress the input graph and intermediate subgraph results compactly, such as hashed neighborhood signatures and lossless pattern-based contraction, to reduce memory footprint and enable coalesced memory access and data reuse in shared memory. Secondly, the project will design GPU-friendly task-based algorithms for fundamental graph operations, including subgraph matching, dense subgraph mining, and frequent subgraph pattern mining, to unleash the massive parallelism enabled by a multi-GPU environment like the Aurora supercomputer. Novel techniques will be investigated, such as a kernel-as-a-task execution model, a truly hybrid BFS-DFS task scheduling strategy, and several other GPU optimization approaches, which will be combined into a unified programming framework, T-thinkerGPU, designed with extensibility in mind to facilitate the development of GPU-scalable task-based algorithms for other graph operations in the future. Finally, the developed GPU programs will be extensively evaluated on Aurora (with Intel GPUs) and UAB's Cheaha supercomputer (with Nvidia GPUs), using public benchmarks and scientific applications at ANL and UAB, and the code will be released on GitHub.
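
As a concrete illustration of coalesced memory access in GPU graph operations, the sketch below shows a warp-cooperative intersection of two sorted adjacency lists in a CSR graph, a building block of subgraph matching and dense subgraph mining. The CSR layout (row_ptr, col_idx) and the function name warp_intersect_count are assumptions made for this illustration, not code from the project.

  // A warp cooperatively counts |N(u) ∩ N(v)| for a CSR graph (row_ptr, col_idx
  // with sorted adjacency lists). Consecutive lanes read consecutive neighbors
  // of u, so the loads are coalesced; each lane binary-searches N(v).
  __device__ int warp_intersect_count(const long *row_ptr, const int *col_idx,
                                      int u, int v) {
      int lane = threadIdx.x % 32;
      long u_beg = row_ptr[u], u_end = row_ptr[u + 1];
      long v_beg = row_ptr[v], v_end = row_ptr[v + 1];
      int count = 0;
      for (long i = u_beg + lane; i < u_end; i += 32) {
          int w = col_idx[i];                  // coalesced read of N(u)
          long lo = v_beg, hi = v_end;         // binary search for w in N(v)
          while (lo < hi) {
              long mid = (lo + hi) / 2;
              if (col_idx[mid] < w) lo = mid + 1; else hi = mid;
          }
          if (lo < v_end && col_idx[lo] == w) ++count;
      }
      for (int off = 16; off > 0; off >>= 1)   // warp-level sum of per-lane counts
          count += __shfl_down_sync(0xffffffff, count, off);
      return count;                            // final value valid in lane 0
  }

Because consecutive lanes read consecutive entries of N(u), the loads coalesce into a small number of memory transactions, and the per-lane counts are combined with a warp shuffle reduction.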

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Ahmad, Akhlaque and Yuan, Lyuheng and Yan, Da and Guo, Guimu and Chen, Jieyang and Zhang, Chengcui. "Accelerating k-Core Decomposition by a GPU." 2023 IEEE 39th International Conference on Data Engineering (ICDE), 2023. https://doi.org/10.1109/ICDE55515.2023.00142
Khalil, Jalal and Yan, Da and Yuan, Lyuheng and Han, Jiao and Adhikari, Saugat and Long, Cheng and Zhou, Yang. "FSM-Explorer: An Interactive Tool for Frequent Subgraph Pattern Mining from a Big Graph." 40th IEEE International Conference on Data Engineering (ICDE), 2024.
Yan, Da and Yuan, Lyuheng and Ahmad, Akhlaque and Zheng, Chenguang and Chen, Hongzhi and Cheng, James. "Systems for Scalable Graph Analytics and Machine Learning: Trends and Methods." The 33rd International Joint Conference on Artificial Intelligence (IJCAI-24), 2024.
Yuan, Lyuheng and Ahmad, Akhlaque and Yan, Da and Han, Jiao and Adhikari, Saugat and Yu, Xiaodong and Zhou, Yang. "G2-AIMD: A Memory-Efficient Subgraph-Centric Framework for Efficient Subgraph Search on GPUs." 40th IEEE International Conference on Data Engineering (ICDE), 2024.
Yuan, Lyuheng and Yan, Da and Han, Jiao and Ahmad, Akhlaque and Zhou, Yang and Jiang, Zhe. "Faster Depth-First Subgraph Matching on GPUs." 40th IEEE International Conference on Data Engineering (ICDE), 2024.
Yuan, Lyuheng and Yan, Da and Qu, Wenwen and Adhikari, Saugat and Khalil, Jalal and Long, Cheng and Wang, Xiaoling. "T-FSM: A Task-Based System for Massively Parallel Frequent Subgraph Pattern Mining from a Big Graph." Proceedings of the ACM on Management of Data, v.1, 2023. https://doi.org/10.1145/3588928

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project has supported the PI and his two PhD students for their summer visit to the MCS division at Argonne National Laboratory (ANL), where they met ANL scientists and learned valuable supercomputing skills. Some activity highlights include:

  1. The PI delivered a seminar on the 'think like a task' programming model for graph computing: https://www.anl.gov/event/largescale-graph-computing-from-think-like-a-vertex-to-think-like-a-task
  2. The student Lyuheng Yuan delivered a talk on the T-FSM system for parallel frequent subgraph pattern mining: https://www.anl.gov/event/2023-summer-student-presentations-ii
  3. The student Saugat Adhikari delivered a talk on elevation-guided AI technology for flood mapping: https://www.anl.gov/event/2023-summer-student-presentations
  4. Saugat was also selected to attend the ATPESC 2023 training program (Argonne Training Program on Extreme-Scale Computing).


Through the investigation conducted in this project, a suite of new systems has been designed and implemented, including:

  1. G2-AIMD (https://github.com/lyuheng/g2-aimd), a novel breadth-first (BFS-based) GPU solution to subgraph finding problems such as maximal clique finding and subgraph matching, with an adaptive chunk-size adjustment approach to avoid GPU device memory overflow (the chunk-sizing idea is sketched after this list). The work was led by Lyuheng and accepted at ICDE 2024, and the system was extensively tested on ALCF's Polaris supercomputer.
  2. T-DFS (https://github.com/lyuheng/tdfs), a novel depth-first GPU solution to subgraph finding problems such as subgraph matching, with an efficient, lightweight workload rebalancing technique among GPU warps. The work was led by Lyuheng and accepted at ICDE 2024, and the system was extensively tested on Polaris.
  3. Distributed T-FSM, a scalable distributed task-based system for frequent subgraph pattern mining, extending our previous single-machine T-FSM system published in SIGMOD 2023. The work was led by Lyuheng and is currently under submission to ACM TODS, and the system was extensively tested on Polaris.
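
For illustration only, the following host-side sketch shows the general additive-increase/multiplicative-decrease (AIMD) idea behind adaptive chunk sizing referenced in item 1 above; the struct, field names, and thresholds are hypothetical and not taken from the G2-AIMD code.

  // Hypothetical AIMD-style controller for the number of subgraph-expansion
  // tasks ("chunk") sent to the GPU per kernel launch.
  #include <cstddef>

  struct ChunkController {
      size_t chunk = 1 << 14;          // tasks per kernel launch
      size_t add_step = 1 << 12;       // additive increase
      double mult_decrease = 0.5;      // multiplicative decrease
      size_t max_chunk = 1 << 22;

      // Call after each launch with the fraction of device memory it consumed.
      void update(double mem_used_fraction) {
          if (mem_used_fraction > 0.9) {
              // intermediate results nearly overflowed device memory: back off fast
              chunk = static_cast<size_t>(chunk * mult_decrease);
              if (chunk < 1) chunk = 1;
          } else {
              // memory headroom available: probe for more parallelism gradually
              chunk += add_step;
              if (chunk > max_chunk) chunk = max_chunk;
          }
      }
  };

The controller grows the per-launch chunk of expansion tasks gradually while device memory has headroom and shrinks it sharply when intermediate results approach the memory limit, which is how AIMD-style control avoids overflow without permanently sacrificing parallelism.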


Besides the above works, the PI and his two students have also had a demo paper on T-FSM (the software is called FSM-Explorer) accepted at ICDE 2024, and a tutorial on graph-parallel systems accepted at IJCAI 2024. The tutorial is also under submission to a few other conferences to broaden its impact.


A common feature of the works discussed above is that they are based on the T-thinker programming paradigm (where "T" means "task"), proposed by the PI in a poster at PPoPP 2019 and highlighted by the CCC as a Great Innovative Idea: https://cra.org/ccc/great-innovative-ideas/t-thinker-a-task-centric-framework-to-revolutionize-big-data-systems-research/. Put simply, the idea of the T-thinker paradigm is that for compute-intensive problems, we can divide the problem space recursively until a sufficiently small unit, which we call a task, is obtained. A worker (machine, CPU thread, GPU warp, etc.) can then fetch the subset of data needed for the task (linear IO cost) and compute on it afterwards (often with high time complexity). Since computation is the bottleneck, the data-movement cost can mostly be overlapped with concurrent computation so that all CPU/GPU cores are fully utilized; a sketch of this overlap appears right after this paragraph. This is in contrast to existing cyberinfrastructure that targets data-intensive problems: when such systems are used to solve compute-intensive problems, their performance can be comparable to that of a single-threaded program, as shown by Frank McSherry: https://github.com/frankmcsherry/blog/blob/master/posts/2017-09-23.md
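
The sketch below illustrates, in a GPU setting, the overlap of data movement and computation described above, using two CUDA streams to double-buffer task inputs; the kernel process_task and the helper run_tasks are assumptions for illustration rather than code from any of the systems discussed here.

  #include <cuda_runtime.h>

  // Placeholder "compute-heavy" kernel standing in for a real task (e.g., one
  // subgraph-search task); it just sums the task's data into *out.
  __global__ void process_task(const int *data, int n, int *out) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) atomicAdd(out, data[i]);
  }

  // Double-buffered task pipeline: while the kernel for task t runs on one
  // stream, the input copy for task t+1 is issued on the other stream.
  void run_tasks(int **host_task_data, int num_tasks, int task_len, int *d_out) {
      cudaStream_t s[2];
      int *d_buf[2];
      for (int i = 0; i < 2; ++i) {
          cudaStreamCreate(&s[i]);
          cudaMalloc(&d_buf[i], task_len * sizeof(int));
      }
      for (int t = 0; t < num_tasks; ++t) {
          int cur = t & 1;             // alternate between the two buffers/streams
          cudaMemcpyAsync(d_buf[cur], host_task_data[t], task_len * sizeof(int),
                          cudaMemcpyHostToDevice, s[cur]);
          process_task<<<(task_len + 255) / 256, 256, 0, s[cur]>>>(
              d_buf[cur], task_len, d_out);
      }
      cudaDeviceSynchronize();
      for (int i = 0; i < 2; ++i) { cudaStreamDestroy(s[i]); cudaFree(d_buf[i]); }
  }

Because operations issued to the same stream execute in order, a buffer is never overwritten before its previous kernel finishes, while the copy for the next task proceeds on the other stream concurrently with the current kernel. For the copies to truly overlap, the host buffers should be pinned (e.g., allocated with cudaMallocHost).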

 

CPU-parallel work on T-thinker was supported by the PI's NSF CRII award, which resulted in many graph-parallel systems published in top conferences: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1755464&HistoricalAwards=false. The current project aims to extend this success to GPU supercomputers, so the proposed solutions are dedicated to GPU-parallel execution. The supported visit to ANL has equipped the PI's graph systems team with strong GPU supercomputing skills and connections with ANL scientists, and the outcomes so far are fruitful. While the support of the current RII Track-4 award is ending, this exciting research will be carried forward thanks to the PI's recent DOE ECRP award.


Last Modified: 04/05/2024
Modified by: Da Yan

