
NSF Org: | OAC Office of Advanced Cyberinfrastructure (OAC) |
Recipient: |
Initial Amendment Date: | August 27, 2015 |
Latest Amendment Date: | August 27, 2015 |
Award Number: | 1450429 |
Award Instrument: | Standard Grant |
Program Manager: | Seung-Jong Park, OAC Office of Advanced Cyberinfrastructure (OAC), CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2015 |
End Date: | August 31, 2021 (Estimated) |
Total Intended Award Amount: | $2,126,446.00 |
Total Awarded Amount to Date: | $2,126,446.00 |
Funds Obligated to Date: |
History of Investigator: |
Recipient Sponsored Research Office: | 201 ANDY HOLT TOWER KNOXVILLE TN US 37996-0001 (865)974-3466 |
Sponsor Congressional District: |
Primary Place of Performance: | TN US 37996-0003 |
Primary Place of Performance Congressional District: |
Unique Entity Identifier (UEI): |
Parent UEI: |
NSF Program(s): | Software Institutes, CDS&E |
Primary Program Source: |
Program Reference Code(s): |
Program Element Code(s): |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Modern High-Performance Computing (HPC) systems continue to increase in size and complexity. Tools that measure application performance in these environments must also increase the richness of their measurements to provide insight into the increasingly intricate ways in which software and hardware interact. The PAPI performance-monitoring library has provided a clear, portable interface to the hardware performance counters available on all modern CPUs and on some other components of interest (scattered across the chip and system). Widely deployed and widely used, PAPI has established itself as fundamental software infrastructure, not only for the scientific computing community, but for many industry users of HPC as well. But the radical changes in processor and system design that have occurred over the past several years pose new challenges to PAPI and to the HPC software infrastructure as a whole. The PAPI-EX project integrates critical PAPI enhancements that flow from both governmental and industry research investments, focusing on processor and system design changes that are expected to be present in every extreme-scale platform on the path to exascale computing.
The primary impact of PAPI-EX is a direct function of the importance of the PAPI library. PAPI has been in predominant use by tool developers, major national HPC centers, system vendors, and application developers for over 15 years. PAPI-EX builds on that foundation. As important research infrastructure, the PAPI-EX project allows PAPI to continue to play its essential role in the face of the revolutionary changes in the design and scale of new systems. In terms of enhancing discovery and education, the list of partners working with PAPI-EX includes NSF computing centers, major tool developers, major system vendors, and individual community leaders, and this diverse group will help facilitate training sessions, targeted workshops, and mini-symposia at national and international meetings. Finally, the active promotion of PAPI by many major system vendors means that PAPI, and therefore PAPI-EX, will continue to deliver major benefits for government and industry in many domains.
PAPI-EX addresses a hardware environment in which the cores of current and future multicore CPUs share various performance-critical resources (a.k.a. 'inter-core' resources), including power management, on-chip networks, the memory hierarchy, and memory controllers. Failure to manage contention for these 'inter-core' resources has already become a major drag on overall application performance. Consequently, the inability to reveal the actual behavior of these resources at a low level has become a serious problem for users of the many performance tools (e.g., TAU, HPCToolkit, Open|SpeedShop, Vampir, Scalasca, CrayPat, Active Harmony). PAPI-EX enhances and extends PAPI to solve this critical problem and to prepare it to keep playing its well-established role in HPC performance optimization. Accordingly, PAPI-EX targets the following objectives: (1) Develop shared hardware counter support that includes system-wide and inter-core measurements; (2) Provide support for data-flow based runtime systems; (3) Create a sampling interface to record streams of performance data with relevant context; (4) Combine an easy-to-use tool for text-based application performance analysis with updates to PAPI's high-level API to create a basic, 'out of the box' instrumentation API.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Modern High-Performance Computing (HPC) systems continue to increase in size and complexity. Tools that measure application performance in these environments must also increase the richness of their measurements to provide insight into the increasingly intricate ways in which software and hardware interact. The PAPI performance-monitoring library has provided a clear, portable interface to the hardware performance counters available on all modern CPUs and on some other components of interest, scattered across the chip and system. Widely deployed and widely used, PAPI has established itself as fundamental software infrastructure, not only for the scientific computing community, but for many industry users of HPC as well. But the radical changes in processor and system design that have occurred over the past several years pose continual challenges to PAPI and to the HPC software infrastructure as a whole. The PAPI-EX project has integrated critical PAPI enhancements that flow from both governmental and industry research investments, focusing on processor and system design changes that are present in every extreme-scale platform on the path to exascale computing.
The PAPI-EX project addressed the following aspects of the problem of hardware performance counter monitoring in modern, multicore and heterogeneous architectures:
1. 'Shared Hardware' Counter Support: The PAPI library has been extended to address a hardware environment in which the cores of current and future multicore CPUs share various performance-critical resources (a.k.a. 'inter-core' resources), including power management, on-chip networks, the memory hierarchy, and memory controllers. The newly developed 'shared hardware' performance counter support, which includes system-wide measurements for Intel, AMD, and IBM processors, gives PAPI users the ability to reveal the actual behavior of these 'inter-core' resources at a low level.
2. Counter Analysis Toolkit: As part of this project's goal of testing and improving the quality of PAPI when monitoring shared hardware counters, including system-wide measurements on existing multicore architectures, a new Counter Analysis Toolkit (CAT) has been developed and added to the PAPI release. CAT assists with disambiguating native performance counters through micro-benchmarks that probe important aspects of modern CPUs, which ultimately aids the classification of raw performance events.
3. Power and Energy Management: PAPI's capabilities have been enhanced to support monitoring and capping of power usage on recent AMD and NVIDIA GPU architectures. Additionally, PAPI's latest NVIDIA GPU support enables monitoring of half-precision floating-point operations (addition, multiplication, fused multiply-add), memory throughput, NVLink performance, fan speed, and temperature. These additional GPU and memory performance counters help application scientists produce more efficient code by profiling the utilization of the latest GPU resources and diagnosing performance bottlenecks.
4. A New, Simpler PAPI Interface: The new high-level API developed under this award provides the ability to record performance events within instrumented code sections (called 'regions') of serial, multi-process, and thread-parallel applications. The high-level API was developed in response to community demand for something simpler than PAPI's tool-focused 'low-level' API; its main goal is to improve ease of use for application developers who wish to instrument their source code directly. With fewer than a handful of functions, a user can measure performance events simply by marking code sections with specific region names. Because events are set dynamically through an environment variable and components are detected automatically, the high-level API is extremely simple to use.
The PAPI interface has been widely used by HPC users for many years, drawing on its strength as a cross-platform and cross-architecture API. The extensions to the PAPI performance counter library for new hardware generations allow application and tool developers to use a familiar interface to obtain relevant performance monitoring information for achieving the best possible performance in modern computing environments.
Last Modified: 11/30/2021
Modified by: Heike Jagode