
NSF Org: | CCF Division of Computing and Communication Foundations |
Recipient: | |
Initial Amendment Date: | August 4, 2016 |
Latest Amendment Date: | August 25, 2017 |
Award Number: | 1565414 |
Award Instrument: | Standard Grant |
Program Manager: | Almadena Chtchelkanova, achtchel@nsf.gov, (703) 292-7498, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | August 15, 2016 |
End Date: | July 31, 2020 (Estimated) |
Total Intended Award Amount: | $1,171,893.00 |
Total Awarded Amount to Date: | $1,171,893.00 |
Funds Obligated to Date: | |
History of Investigator: | |
Recipient Sponsored Research Office: | 1960 KENNY RD, COLUMBUS, OH, US 43210-1016, (614) 688-8735 |
Sponsor Congressional District: | |
Primary Place of Performance: | OH, US 43210-1206 |
Primary Place of Performance Congressional District: | |
Unique Entity Identifier (UEI): | |
Parent UEI: | |
NSF Program(s): | CI REUSE, Software & Hardware Foundation, CSR-Computer Systems Research |
Primary Program Source: | |
Program Reference Code(s): | |
Program Element Code(s): | |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
This award was partially supported by the CIF21 Software Reuse Venture, whose goals are to support pathways toward sustainable software elements through their reuse and to emphasize the critical role of reusable software elements in a sustainable software cyberinfrastructure supporting computational and data-enabled science and engineering.
Parallel programming based on MPI (Message Passing Interface) is being used with increasing frequency in academia and government (for both defense and non-defense purposes), as well as in emerging areas such as scalable machine learning and big data analytics. The emergence of Dense Many-Core (DMC) architectures like Intel's Knights Landing (KNL) and accelerator/co-processor architectures like NVIDIA GPGPUs is enabling the design of systems with high compute density. This, coupled with the availability of Remote Direct Memory Access (RDMA)-enabled commodity networking technologies like InfiniBand, RoCE, and 10/40 GigE with iWARP, is fueling the growth of multi-petaflop and exaflop systems. These DMC architectures have the following unique characteristics: deeper levels of hierarchical memory; revolutionary network interconnects; and heterogeneous compute power and data movement costs (with heterogeneity at the chip level and node level).
For these emerging systems, combinations of MPI and other programming models, known as MPI+X (where X can be PGAS, Tasks, OpenMP, OpenACC, or CUDA), are being targeted. The current generation of communication protocols and mechanisms for MPI+X programming models cannot efficiently support the emerging DMC architectures. This leads to the following broad challenges: 1) How can high-performance and scalable communication mechanisms for next-generation DMC architectures be designed to support MPI+X (including Task-based) programming models? and 2) How can current and next-generation applications be designed/co-designed with the proposed communication mechanisms?
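To make the MPI+X idea concrete, the minimal sketch below (an illustrative example, not code from this project or the MVAPICH2 library) pairs MPI for communication between processes with OpenMP for threading within a process; the thread-support level requested from MPI_Init_thread is what marks it as a hybrid program.

/* Minimal MPI+OpenMP (one form of MPI+X) sketch: MPI ranks across nodes,
 * OpenMP threads within each rank. Illustrative only. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* Request FUNNELED support: only the main thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local_sum = 0.0;

    /* On-node parallelism handled by OpenMP threads. */
    #pragma omp parallel for reduction(+ : local_sum)
    for (int i = 0; i < 1000000; i++)
        local_sum += 1.0 / (double)(i + 1 + rank);

    /* Inter-process communication handled by MPI. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum across %d ranks: %f\n", nranks, global_sum);

    MPI_Finalize();
    return 0;
}

Such a hybrid program would typically be launched with one MPI rank per node or per socket and the remaining cores given to OpenMP threads, though the best mapping depends on the system.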
A synergistic and comprehensive research plan, involving computer scientists from The Ohio State University (OSU) and the Ohio Supercomputer Center (OSC) and computational scientists from the Texas Advanced Computing Center (TACC), the San Diego Supercomputer Center (SDSC), and the University of California San Diego (UCSD), is proposed to address the above broad challenges with innovative solutions. The research will be driven by a set of applications from established NSF computational science researchers running large-scale simulations on Stampede, Comet, and other systems at OSC and OSU. The proposed designs will be integrated into the widely used MVAPICH2 library and made available for public use. Multiple graduate and undergraduate students will be trained under this project as future scientists and engineers in HPC. The established national-scale training and outreach programs at TACC, SDSC, and OSC will be used to disseminate the results of this research to XSEDE users. Tutorials will be organized at XSEDE, SC, and other conferences to share the research results and experience with the community.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Current generation multi-petascale systems are being powered by modern multi-core architectures like Intel Cascade Lake and AMD Rome, and accelerators from NVIDIA and AMD. These systems have tight integration with Remote Direct Memory Access (RDMA) enabled high-performance networking technologies like InfiniBand, Omni-Path, and RDMA over Converged Enhanced Ethernet (RoCE). Such architectures are being defined as Dense Many-Core (DMC) systems. These systems are starkly different from homogeneous clusters of the past. These evolving DMC systems are targeted for emerging exascale computing and are characterized by: 1) Deeper levels of hierarchical memory, 2) Revolutionary network interconnects, and 3) Heterogeneous compute power and data movement costs (with heterogeneity at chip-level and node-level).
The Message Passing Interface (MPI) has been the de facto parallel programming model for the past two decades and is very successful for implementing regular and iterative parallel algorithms with well-defined communication patterns. The Remote Memory Access (RMA) features of the MPI-3 standard have shown promise for expressing algorithms with irregular computation and communication patterns by enabling lightweight one-sided communication and synchronization operations. At the same time, owing to dramatic changes in architectures (high concurrency and low memory per core), hybrid programming models such as MPI+OpenMP and MPI+OpenACC/CUDA are being adopted as some of the primary programming models for High-Performance Computing (HPC) applications. The evolution and diversity of programming models and their hybrid usage modes for next-generation systems is being described generically in the community as the 'MPI+X' model.
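To illustrate the MPI-3 RMA point, the sketch below (illustrative only, not taken from the MVAPICH2 sources) exposes a small window with MPI_Win_allocate and writes into the right neighbor's window with MPI_Put, using MPI_Win_fence for synchronization; production codes may instead use passive-target locking, for which the fence model stands in here.

/* Illustrative MPI-3 RMA (one-sided) sketch: each rank puts its rank id
 * into a window exposed by its right neighbor. Not project code. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nranks;
    int *win_buf;          /* memory exposed for remote access */
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Allocate window memory; the library may place it to favor RDMA. */
    MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &win_buf, &win);
    *win_buf = -1;

    int right = (rank + 1) % nranks;

    MPI_Win_fence(0, win);                       /* open access epoch  */
    MPI_Put(&rank, 1, MPI_INT, right, 0, 1, MPI_INT, win);
    MPI_Win_fence(0, win);                       /* close access epoch */

    printf("rank %d received %d from its left neighbor\n", rank, *win_buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}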
On the other hand, task-based programming models and runtimes, such as the Asynchronous PGAS (APGAS) models, appear able to achieve efficient load balancing, fault tolerance, and latency hiding for highly irregular communication patterns. However, they may not be ideal for expressing global control flow and global communication. Thus, MPI+Task (as another form of X) has been gaining momentum in the community. However, designing a unified resource-progression mechanism that avoids resource starvation and/or deadlocks for the MPI+Task model opens up several research challenges due to the fundamental differences in control flow between the two models: MPI uses user-driven control flow, whereas APGAS/Task-based models rely on system/runtime-scheduler-driven control flow. These trends lead to the following broad challenges: 1) How can high-performance and scalable communication mechanisms for next-generation DMC architectures be designed to support MPI+X (including Task-based) programming models? and 2) How can current and next-generation applications be designed/co-designed with the proposed communication mechanisms?
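As a rough illustration of the MPI+Task usage mode (an assumed sketch using OpenMP tasks, not the project's runtime design), the code below lets tasks issue MPI calls concurrently, which is only legal when the library provides MPI_THREAD_MULTIPLE; the interaction between the task scheduler and MPI progress in exactly this kind of code is where the resource-progression challenges described above arise.

/* Illustrative MPI+Task (MPI + OpenMP tasks) sketch. Tasks issue MPI calls
 * concurrently, which requires MPI_THREAD_MULTIPLE. Run with >= 2 ranks. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define NTASKS 4

int main(int argc, char **argv)
{
    int provided, rank;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (provided < MPI_THREAD_MULTIPLE) {
        if (rank == 0) printf("MPI_THREAD_MULTIPLE not available\n");
        MPI_Finalize();
        return 1;
    }

    #pragma omp parallel
    #pragma omp single
    for (int t = 0; t < NTASKS; t++) {
        #pragma omp task firstprivate(t)
        {
            int buf = t;
            if (rank == 0) {      /* rank 0's tasks each send one message */
                MPI_Send(&buf, 1, MPI_INT, 1, t, MPI_COMM_WORLD);
            } else if (rank == 1) {
                MPI_Status st;
                /* Any-tag receive so task scheduling order cannot deadlock. */
                MPI_Recv(&buf, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
                printf("rank 1, task %d: received %d (tag %d)\n",
                       t, buf, st.MPI_TAG);
            }
        }
    }

    MPI_Finalize();
    return 0;
}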
To address the challenges outlined above, we have adopted in this project a multi-year, multi-tiered approach to exploit emerging DMC architectures and design optimized runtimes for supporting MPI and MPI+X programming models. The challenges have been addressed along the following directions:
1. Designing and developing high-performance, contention-aware, and scalable point-to-point and collective communication protocols and algorithms for heterogeneous DMC systems with the latest generation of CPUs and GPUs.
2. Designing and developing dynamic and adaptive communication protocols for contiguous and non-contiguous data layouts in MPI (a background sketch of non-contiguous datatypes follows this list).
3. Designing efficient communication and synchronization schemes for MPI+PGAS and MPI+X programming models.
4. Carrying out in-depth study of the new designs with a range of computing and networking technologies.
5. Co-designing a set of applications with the new runtimes and studying their performance and scalability on a set of contemporary multi-petaflop systems.
6. Deploying the new frameworks and runtimes on various HPC systems at Ohio Supercomputer Center (OSC), Texas Advanced Computing Center (TACC), and San Diego Supercomputer Center (SDSC) and carrying out continuous engagement with their users to improve and optimize the designs and deliver better performance and scalability for a large number of applications.
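As background for item 2 above, non-contiguous layouts in MPI are usually described with derived datatypes. The sketch below (illustrative only; it does not show the project's dynamic or adaptive protocols) builds a strided column type with MPI_Type_vector so that a single send moves non-contiguous data; run with at least two ranks.

/* Illustrative non-contiguous communication with an MPI derived datatype:
 * send one column of a row-major NxN matrix as a single strided message. */
#include <mpi.h>
#include <stdio.h>

#define N 4

int main(int argc, char **argv)
{
    int rank;
    double a[N][N];
    MPI_Datatype column;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* N blocks of 1 double, separated by a stride of N doubles: one column. */
    MPI_Type_vector(N, 1, N, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);

    if (rank == 0) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                a[i][j] = 10.0 * i + j;
        /* Send column 1 (non-contiguous in row-major storage) in one call. */
        MPI_Send(&a[0][1], 1, column, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        double col[N];
        MPI_Recv(col, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (int i = 0; i < N; i++)
            printf("col[%d] = %.1f\n", i, col[i]);
    }

    MPI_Type_free(&column);
    MPI_Finalize();
    return 0;
}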
The results of this research (new designs, performance results, benchmarks, etc.) have been made available to the community through the MVAPICH2 MPI libraries. Multiple releases of these libraries were made during the project period, and more than 400,000 copies of the MVAPICH2 MPI libraries were downloaded from the project's web site over that time. With each release, features, performance numbers, and scalability information have been shared with the MVAPICH user community through mailing lists and the project's web site. In addition to the software distribution, the results have been presented at various conferences, journals, and events through keynote talks, invited talks, tutorials, and hands-on sessions. The research has also led to theses for several M.S. and Ph.D. students.
Last Modified: 11/28/2020
Modified by: Dhabaleswar K Panda