
NSF Org: |
OIA OIA-Office of Integrative Activities |
Recipient: |
|
Initial Amendment Date: | November 29, 2022 |
Latest Amendment Date: | February 28, 2024 |
Award Number: | 2229394 |
Award Instrument: | Standard Grant |
Program Manager: |
Pinhas Ben-Tzvi
pbentzvi@nsf.gov (703)292-8246 OIA OIA-Office of Integrative Activities O/D Office Of The Director |
Start Date: | February 1, 2023 |
End Date: | January 31, 2024 (Estimated) |
Total Intended Award Amount: | $275,573.00 |
Total Awarded Amount to Date: | $164,102.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
701 S 20TH STREET BIRMINGHAM AL US 35294-0001 (205)934-5266 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
9700 S Cass Ave Lemont IL US 60439-4801 |
Primary Place of
Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | EPSCoR RII: EPSCoR Research Fe |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.083 |
ABSTRACT
Graph processing is essential in real-world applications such as bioinformatics and social network analysis. Many fundamental graph operations are compute-intensive, for which the PI has successfully developed a series of CPU-scalable graph processing systems following a novel task-based parallel paradigm called T-thinker. However, it is non-trivial to extend this success to a GPU-rich environment due to a much larger gap between IO bandwidth and computing power of GPUs, and due to the unique programming requirements for GPU programs to be scalable. This project will develop a new task-based distributed GPU framework, T-thinkerGPU, and implement three applications on top, including subgraph matching, dense subgraph mining, and frequent subgraph pattern mining. T-thinkerGPU will be tested on the Aurora supercomputer at Argonne National Laboratory (ANL) as well as UAB?s Cheaha supercomputer, and the implementation will exploit modern GPU features including atomic operations, unified shared memory, and dynamic parallelism. This work will establish a solid foundation for long-term collaboration with ANL towards the development of GPU-scalable HPC solutions for various scientific applications. The project will also train a GPU-programming workforce (including a PhD student who will also visit ANL) that is in urgent need in Alabama, and all the proposed tools will be open source.
This Research Infrastructure Improvement Track-4 EPSCoR Research Fellows (RII Track-4) proposal would provide a fellowship to an Assistant professor and training for a graduate student at the University of Alabama at Birmingham (UAB). GPU supercomputers are increasingly being deployed in place of CPU supercomputers in the hope to benefit from not only significant performance improvement but also energy efficiency. Built on the success of task-based parallel paradigm, T-thinker, for scaling graph processing in a multi-CPU environment, this project aims to investigate novel task-based techniques to scale fundamental compute-intensive graph operations in a multi-GPU environment, especially the exascale Aurora supercomputer at ANL that is based on Intel GPUs. Specifically, the project will first investigate efficient representation schemes that encode and compress the input graph and intermediate subgraph results compactly to reduce memory footprint and enable coalesced memory access and data reuse in shared memory, such as hashed neighborhood signature and lossless pattern-based contraction. Secondly, the project will design GPU-friendly task-based algorithms for fundamental graph operations including subgraph matching, dense subgraph mining, and frequent subgraph pattern mining, to unleash the massive parallelism enabled by a multi-GPU environment like the Aurora supercomputer. Novel techniques will be investigated such as kernel-as-a-task execution model, a truly hybrid BFS-DFS task scheduling strategy, and several other GPU optimization approaches, which will be combined into a unified programming framework, T-thinkerGPU, with extendibility in mind to facilitate the development of GPU-scalable task-based algorithms for other graph operations in the future. Finally, the developed GPU programs will be extensively evaluated on Aurora (with Intel GPUs) and UAB?s Cheaha supercomputer (with Nvidia GPUs), using public benchmarks and scientific applications at ANL and UAB, and the code will be released on GitHub.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Note:
When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external
site maintained by the publisher. Some full text articles may not yet be available without a
charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from
this site.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This project has supported the PI and his two PhD students for their summer visit in the MCS division at Argonne National Laboratory (ANL), where they met ANL scientists and learned valuable supercomputing skills. Some activity highlights include:
- The PI delivered a seminar on 'think like a task' (TLAV) programming model for graph computing: https://www.anl.gov/event/largescale-graph-computing-from-think-like-a-vertex-to-think-like-a-task
- The student Lyuheng Yuan delivered a talk on the T-FSM system for parallel frequent subgraph pattern mining: https://www.anl.gov/event/2023-summer-student-presentations-ii
- The student Saugat Adhikari delivered a talk on elevation-guided AI technology for flood mapping: https://www.anl.gov/event/2023-summer-student-presentations
- Saugat was also selected to attend the training program of ATPESC 2023.
Through the investigation conducted in this project, a suite of new systems have been designed and implemented, including:
- G2-AIMD (https://github.com/lyuheng/g2-aimd), a novel depth-first BFS solutions to subgraph finding problems such as maixmal clique finding and subgraph matching, with an adaptive chunk-size adjustment approach to avoid GPU device memory overflow. The work was led by Lyuheng and accepted by ICDE 2024, and the system was extensively tested on ALCF's Polaris supercomputer.
- T-DFS (https://github.com/lyuheng/tdfs), a novel depth-first GPU solutions to subgraph finding problems such as subgraph matching, with an efficient lightweight workload rebalancing technique among the GPU warps. The work was led by Lyuheng and accepted by ICDE 2024, and the system was extensively tested on Polaris.
- Distributed T-FSM, a scalable distributed task-based system for frequent subgraph pattern mining which is a work extending our previous single-machine T-FSM system published in SIGMOD 2023. The work was led by Lyuheng and currently under submission to ACM TODS, and the system was extensively tested on Polaris.
Besides the above works, the PI and his two students have also got a demo paper on T-FSM (the software is called FSM-Explorer) accepted by ICDE 2024, and a tutorial accepted by IJCAI 2024 on graph-parallel systems. This tutorial is also under submission to a few other conferences to broaden impacts.
A common feature of the above discussed works is that they are based on the T-thinker (aka. TLAV) programming paradigm (where "T" means "task") proposed by the PI in a poster in PPoPP 2019, and advertised by CCC as a great innovative idea at https://cra.org/ccc/great-innovative-ideas/t-thinker-a-task-centric-framework-to-revolutionize-big-data-systems-research/ Put simply, the idea of the T-thinker programming paradigm is that for compute-intensive problems, we can divide the problem space recursively until a sufficiently small unit is obtained, which we call a task. A worker (machine, CPU thread, GPU warp, etc.) can then obtain the subset of data needed for the task (linear IO cost), and compute it afterwards (often high time complexity). Since computation is the bottleneck, data moving cost can mostly be overlapped with the concurrent computation so that all CPU/GPU cores can be fully utilized. This is in contrast to existing cyberinfrastructure that targets data-intensive problems, which when used to solve compute-intensive problems, the performance is comparable to a single-threaded program as indicated by Frank McSherry https://github.com/frankmcsherry/blog/blob/master/posts/2017-09-23.md
CPU-parallel works on T-thinker have been supported by the PI's NSF CRII award that resulted in many graph-parallel systems published in top conferences: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1755464&HistoricalAwards=false The current project aims to extend this success to GPU supercomputers, so the proposed solutions are dedicated to GPU-parallel execution. The supported visit to ANL has enabled the PI's graph systems team to be equipped with strong GPU supercomputing skills and connections with ANL scientists, and the outcomes so far are fruitful. While the support of the current RII Track-4 award is ending, the exciting research will be be carried on to thrive further thanks to the PI's recent DOE ECRP award.
Last Modified: 04/05/2024
Modified by: Da Yan
Please report errors in award information by writing to: awardsearch@nsf.gov.