
NSF Org: | OAC Office of Advanced Cyberinfrastructure (OAC) |
Initial Amendment Date: | August 15, 2013 |
Latest Amendment Date: | August 15, 2013 |
Award Number: | 1265449 |
Award Instrument: | Standard Grant |
Program Manager: | Rajiv Ramnath, OAC Office of Advanced Cyberinfrastructure (OAC), CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2013 |
End Date: | August 31, 2017 (Estimated) |
Total Intended Award Amount: | $169,999.00 |
Total Awarded Amount to Date: | $169,999.00 |
Recipient Sponsored Research Office: | 202 HIMES HALL BATON ROUGE LA US 70803-0001 (225)578-2760 |
Primary Place of Performance: | 202 Himes Hall Baton Rouge LA US 70803-2701 |
NSF Program(s): | Gravity Theory, Software Institutes |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Modern computer system architectures are forcing computational scientists to move scientific applications
from traditional homogeneous CPU-based systems to heterogeneous multi-core/accelerator architectures.
Reaching even a modest fraction of the potential performance of accelerators
requires close attention to the memory hierarchy and chip-level parallelism.
As a result, coding tasks that were once the province of
lone graduate students in a single discipline now require interdisciplinary teams.
Project Chemora will explore the design of a new application framework for automatically
creating highly optimized code for high-end computational machines. The system
will take as input a set of partial differential equations (PDEs) that describe a
problem, construct a machine-specific abstract performance model, and use both
to generate well-tuned code and execution configurations for accelerated
(e.g., hybrid CPU/GPU) computing clusters at various scales. Chemora will
improve programmability in this simplified domain by decoupling the science and
computer science at a high level, thereby reducing the complexity and number of issues scientists need to
collectively understand and allowing individual scientists in the team to focus on their area of
specialty. Chemora will improve performance (both wallclock time and energy) for
systems with both simple and complex sets of equations by making use of detailed
information describing the problem and machine, and will provide improved load
balancing through the AMPI framework.
The Chemora project has chosen the Einstein equations as the primary science driver because
these equations are one of the more complex PDE systems, one with many
hundreds of terms, and a problem scale that is challenging to optimize for most
compilers. Achieving this vision for a general scientific problem would indeed
be a "Grand Challenge" in computational science, but in order to give our
research a sharper focus we have chosen as a science driver the
simulation of intermediate-mass-ratio binary black hole (IBBH) systems. Such
systems, consisting of a black hole of mass 100 to 1,000 solar masses orbited by
a smaller black hole of mass 5 to 20 solar masses, are expected to be important
sources of gravitational waves for the Advanced Laser Interferometer Gravitational-Wave
Observatory (LIGO) and the Einstein Telescope (ET). Accurate modeling of
the waveforms from IBBH systems will be necessary in order to extract
gravitational wave signals using template-matching data analysis techniques.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The modern era of scientific code development is both a golden age and a dark age. It is a golden age because we have machines that can operate at unprecedented levels of performance. Unfortunately, it is also a dark age because these machines are becoming increasingly hard to program and use. This is due in part to power becoming a limiting resource, in response to which energy-efficient GPU computational accelerator designs have been introduced that omit energy-consuming niceties such as caches and predictors, replacing these with a baroque memory model and a requirement that programs be coded to follow a large number of threads of execution. Such accelerator designs demand much more effort from programmers to properly use, and programs must be re-tuned for each succeeding accelerator generation.
This need to re-tune is much less of a problem for CPUs, because features such as large caches and branch predictors enable compilers to generate good code without having to know, for example, how many times a piece of data will be accessed. GPU accelerators replace large caches with several types of storage, such as a high-speed scratchpad memory. The decision on whether to use these specialized storage areas depends upon aspects of code execution that the compiler often cannot determine. As a result the burden is placed on the programmer to decide how to stage data.
Though making use of specialized storage and other GPU features is beyond the ability of current compilers for arbitrary programs, it is feasible for specialized domains, including what are called stencil calculations. Stencil calculations are used to solve many important scientific and engineering problems, such as simulating black holes, neutron stars, and fluids, exploring quantum cosmology, and performing coastal simulations.
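As an illustration of what a stencil calculation looks like, here is a minimal sketch in Python. It is not taken from Chemora; the function name `heat_step` and the parameter `alpha` are assumptions chosen for this example. Each interior grid point is updated from a fixed pattern of neighboring points (here, its left and right neighbors), which is the regular data access pattern that a stencil framework can exploit.

```python
def heat_step(u, alpha):
    """Advance a 1D temperature profile by one explicit time step.

    Illustrative three-point stencil (not Chemora code): each interior
    point is updated from itself and its two immediate neighbors.
    `alpha` is an assumed diffusion coefficient times dt / dx**2.
    """
    new = list(u)  # copy, so the two boundary values stay fixed
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


if __name__ == "__main__":
    profile = [0.0, 0.0, 1.0, 0.0, 0.0]  # a single hot spot in the middle
    print(heat_step(profile, 0.25))      # prints [0.0, 0.25, 0.5, 0.25, 0.0]
```

Because every point reads the same small neighborhood, a tool that knows the stencil shape can work out exactly which data each block of threads needs, something a general-purpose compiler usually cannot infer.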
The goal of the project was to develop a stencil framework, Chemora, that would allow a physicist or other domain expert to code a stencil simulation in what is called a domain-specific language (DSL), run it on a GPU-accelerated cluster, and have its performance rival that of hand-tuned code. The use of the DSL makes it much easier for Chemora to generate code, since Chemora knows much more about the movement of data than could be determined for an unconstrained language. Chemora can use all the kinds of specialized memory appearing on recent GPU accelerators (including shared, constant, and texture memory), and can transform calculations into pieces that each comfortably fit on the accelerator device. Chemora transforms a calculation based on a performance model of the device, in some cases optimizing certain rearrangements, in other cases using the model to choose among multiple candidates. Chemora operates in part when a program is run, which is when key information such as input data and system characteristics is first available. It takes advantage of this to generate highly efficient code.
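The idea of using a performance model to choose among candidates can be sketched as follows. This is a deliberately simplified toy, not Chemora's actual model: the function names, the cost measure (grid points loaded per grid point computed), and the 48 KiB scratchpad size are all assumptions made for illustration. Each candidate is a 3D tile shape; a tile must fit in scratchpad memory together with its halo of neighboring points, and among the tiles that fit, the one with the least redundant loading is chosen.

```python
def loaded_points(tile, radius):
    """Points that must be staged for one tile, including its halo."""
    total = 1
    for n in tile:
        total *= n + 2 * radius
    return total


def halo_overhead(tile, radius):
    """Points loaded per point computed (1.0 would mean no redundancy)."""
    computed = 1
    for n in tile:
        computed *= n
    return loaded_points(tile, radius) / computed


def choose_tile(candidates, radius, bytes_per_point, scratch_bytes):
    """Pick the feasible candidate with the lowest modeled overhead.

    Returns None when no candidate fits in the assumed scratchpad size.
    """
    feasible = [
        t for t in candidates
        if loaded_points(t, radius) * bytes_per_point <= scratch_bytes
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda t: halo_overhead(t, radius))


if __name__ == "__main__":
    candidates = [(8, 8, 8), (16, 16, 4), (32, 8, 4), (32, 32, 8)]
    # Assumed device: 48 KiB of scratchpad, 8-byte values, stencil radius 1.
    print(choose_tile(candidates, 1, 8, 48 * 1024))  # prints (16, 16, 4)
```

A real model would also have to account for factors such as memory bandwidth, occupancy, and register use, but the structure is the same: enumerate candidates, discard those that do not fit the device, and rank the rest by predicted cost.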
Chemora can generate efficient code for several GPU generations. Chemora has been updated as new accelerators became available. Programs coded in Chemora's DSL enjoy good performance on the new devices without any effort required on the part of the original programmers.
The driver application for Chemora is a black hole simulation based on Einstein's equations. The application operates on a large number of quantities, something which introduces problems not encountered with simpler code. To accommodate this, Chemora's model considers factors ignored by others, such as what is called register pressure. As a result, Chemora can efficiently run applications that may overwhelm other systems.
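Register pressure refers to how many values must be held in fast registers at the same time. A rough way to estimate it, shown here as a hypothetical sketch rather than Chemora's method, is to walk a straight-line sequence of assignments backwards and track which values are still needed later. The function name `max_live_values` and the statement format are assumptions made for this example.

```python
def max_live_values(statements, outputs):
    """Estimate peak register demand for straight-line code.

    `statements` is a list of (dest, [sources]) pairs in execution order;
    `outputs` names the values still needed after the last statement.
    Walking backwards, a value is live from its definition to its last use.
    """
    live = set(outputs)
    peak = len(live)
    for dest, sources in reversed(statements):
        live.discard(dest)    # dest does not exist before it is defined
        live.update(sources)  # sources must be available at this point
        peak = max(peak, len(live))
    return peak


if __name__ == "__main__":
    code = [
        ("a", ["x", "y"]),  # a = f(x, y)
        ("b", ["x", "z"]),  # b = g(x, z)
        ("c", ["a", "b"]),  # c = h(a, b)
    ]
    print(max_live_values(code, ["c"]))  # prints 3
```

When an expression has hundreds of terms, as the Einstein equations do, this peak can exceed the number of registers available per thread, forcing values to spill to slower memory. A model that includes such an estimate can split or reorder the calculation to stay within the limit.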
Future work will bring the Chemora research to production level, enabling scientists to make better use of computational resources and make new kinds of science possible in many areas that benefit humanity both directly and indirectly.
Last Modified: 11/29/2017
Modified by: Steven Brandt