
NSF Org: |
OAC Office of Advanced Cyberinfrastructure (OAC) |
Recipient: |
|
Initial Amendment Date: | May 3, 2023 |
Latest Amendment Date: | May 3, 2023 |
Award Number: | 2311830 |
Award Instrument: | Standard Grant |
Program Manager: |
Varun Chandola
vchandol@nsf.gov (703)292-2656 OAC Office of Advanced Cyberinfrastructure (OAC) CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2023 |
End Date: | August 31, 2026 (Estimated) |
Total Intended Award Amount: | $900,000.00 |
Total Awarded Amount to Date: | $900,000.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
1960 KENNY RD COLUMBUS OH US 43210-1016 (614)688-8735 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
1960 KENNY RD COLUMBUS OH US 43210-1016 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Software Institutes |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Earthquake hazards pose potentially life-threatening risks to communities and cause significant economic damage. Numerical simulations of earthquakes on large-scale supercomputers are emerging as a key input to infrastructure and policy decisions. These seismic codes, along with other codes such as simulations that use the Fast Fourier Transform (FFT), distribute processing across a large number of compute nodes in a supercomputer. Optimizing the communication between nodes is key to achieving good performance, but it is a daunting task given the scale of execution. The MVAPICH communication library, which implements the Message Passing Interface (MPI), and the TAU Performance System, a profiling tool for observing communication, will be tightly coupled to assess the performance impact of tuning these codes during execution. These libraries will share key performance parameters and optimize the communication in these applications to improve the time to solution. Performance-engineered versions of these codes will help drive the next generation of earthquake forecasting and improve our understanding of seismic events, reducing risks to population centers and the environment. The research will enable undergraduate and graduate curriculum advancements via research in pedagogy for High Performance Computing (HPC), Deep/Machine Learning, and Data Analytics courses. The results will also be disseminated to the investigators' collaborating organizations to benefit their HPC software applications.
Emerging HPC systems, driven by many-core processors and accelerator architectures, require innovations in existing infrastructure to deliver the best performance for science domains. The MPI 4.0 standard has also brought forward new opportunities for co-designing applications. These include partitioned point-to-point and collective operations, and neighborhood collectives. With these advances, there is a critical need to update the commonly used tools and libraries that form the basis for NSF's HPC cyberinfrastructure. The research undertakes this challenge and pursues new performance engineering avenues in the MVAPICH2 and TAU libraries with scientific applications, exploiting a co-design approach based on the MPI_T API. The project focuses on two popular HPC applications that span multiple domains and represent various communication patterns: Anelastic Wave Propagation (AWP-ODC) and Highly efficient FFTs for Exascale (heFFTe). AWP-ODC is a highly scalable parallel finite-difference application with point-to-point operations that enables 3D earthquake calculations. heFFTe, dominated by collective operations, is a massively parallel application that provides a scalable and efficient implementation of the widely used Fast Fourier Transform (FFT) operations. The research aims to investigate and develop the following innovations by co-designing the MVAPICH2 and TAU libraries to scale driving science domains, including AWP-ODC and heFFTe: 1) Load-aware designs for MPI asynchronous communication, 2) Cross-runtime coordination for MPI+X applications, 3) Partitioned point-to-point primitives, 4) Application-aware neighborhood collective communication, 5) Support for adaptive persistent collective communication, and 6) Coordinating communication kernels on GPUs.
Development and evaluation are carried out in an integrated manner to ensure that the proposed designs integrate properly with the driving applications, and the investigators work closely with internal and external collaborators to facilitate wide deployment and adoption of the released software. The transformative impact of the proposed effort is to extract the performance and scalability of HPC applications on next-generation HPC architectures through intelligent performance engineering.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.