
NSF Org: |
CCF Division of Computing and Communication Foundations |
Recipient: |
|
Initial Amendment Date: | April 5, 2012 |
Latest Amendment Date: | May 2, 2013 |
Award Number: | 1162148 |
Award Instrument: | Continuing Grant |
Program Manager: |
Almadena Chtchelkanova
achtchel@nsf.gov (703)292-7498 CCF Division of Computing and Communication Foundations CSE Directorate for Computer and Information Science and Engineering |
Start Date: | April 1, 2012 |
End Date: | March 31, 2014 (Estimated) |
Total Intended Award Amount: | $809,432.00 |
Total Awarded Amount to Date: | $809,432.00 |
Funds Obligated to Date: |
FY 2013 = $419,280.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
77 MASSACHUSETTS AVE CAMBRIDGE MA US 02139-4301 (617)253-1000 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
77 Massachusetts Avenue Cambridge MA US 02139-4307 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Software & Hardware Foundation |
Primary Program Source: |
01001314DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Many high-end scientific applications perform stencil computations in their inner loops. A stencil defines the value of a grid point in a d-dimensional spatial grid at time t as a function of neighboring grid points at recent times before t. Stencil computations are conceptually simple to implement using nested loops, but looping implementations suffer from poor cache performance on multicore processors. Cache-oblivious divide-and-conquer stencil codes can achieve an order of magnitude improvement in cache efficiency over looping implementations, but most programmers find it difficult to write cache-oblivious stencil codes. Moreover, open problems remain in adapting these algorithms to realistic applications that lack the perfect regularity of simple examples. This project's investigation of cache-oblivious stencil compilation enables ordinary programmers of stencil computations to enjoy the benefits of multicore technology without requiring them to write code any more complex than naive nested loops.
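As a concrete illustration of the "naive nested loops" mentioned above, the following is a minimal sketch of a looping stencil, assuming a 1D heat-equation update with a 3-point neighborhood and fixed end points. The function name `heat_loops` and the coefficient `alpha` are hypothetical choices for this example and do not come from the award text or from Pochoir.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical example: 3-point heat stencil on a 1D grid.
// The value of point x at time t+1 depends on points x-1, x, x+1 at time t.
// alpha is an assumed diffusion coefficient; the two end points stay fixed.
std::vector<double> heat_loops(std::vector<double> u, int steps, double alpha) {
    assert(u.size() >= 3 && steps >= 0);
    std::vector<double> next(u);  // copy so the fixed end points exist in both buffers
    for (int t = 0; t < steps; ++t) {                      // outer loop over time
        for (std::size_t x = 1; x + 1 < u.size(); ++x) {   // inner loop over space
            next[x] = u[x] + alpha * (u[x - 1] - 2.0 * u[x] + u[x + 1]);
        }
        std::swap(u, next);  // the freshly computed time level becomes the current one
    }
    return u;
}
```

Each time step sweeps the whole grid, so once the grid no longer fits in cache every step reloads it from memory, which is the poor cache behavior the abstract describes.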
The research project is developing a language embedded in C++ that can express stencil computations concisely and can be compiled automatically into highly efficient algorithmic code for multicore processors and other platforms. The Pochoir stencil compiler compiles stencil computations that exhibit complex boundary conditions, such as periodic, constant, Dirichlet, Neumann, mirrored, and phase factors; irregularities, including macroscopic and microscopic inhomogeneities, as well as irregular shapes; and general complex dependencies, such as push dependencies, horizontal dependencies, and dynamic dependencies. To achieve these goals, the researchers are developing provably good algorithms for complex stencil computations; exploring how domain-specific compiler technology can achieve speedups from efficient cache management, processor-pipeline scheduling, and parallel computation; investigating how to run stencils efficiently on a wide variety of architectures such as multicore, distributed-memory clusters, graphics processing units, FPGAs, and future exascale machines; demonstrating the effectiveness of their research by developing a production-quality stencil compiler; and developing a benchmark suite and benchmarking system for evaluating Pochoir.
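To make the boundary-condition vocabulary above more concrete, here is a small sketch of how an out-of-range grid access might be resolved under three of the listed rules. This is plain C++ written for illustration, not Pochoir syntax; the names `Boundary` and `read_with_boundary` are hypothetical, and the mirrored rule shown (reflecting the index across the edge) is only one common way to model a zero-gradient boundary.

```cpp
#include <cassert>
#include <vector>

// Hypothetical boundary rules for a 1D grid (illustration only).
enum class Boundary { Periodic, Constant, Mirrored };

// Read a[i]; if i is off the grid, resolve it according to the chosen rule.
double read_with_boundary(const std::vector<double>& a, int i, Boundary rule,
                          double constant_value = 0.0) {
    const int n = static_cast<int>(a.size());
    assert(n > 0);
    if (i >= 0 && i < n) return a[i];  // interior access: no special handling
    switch (rule) {
        case Boundary::Periodic:
            // Wrap around: index -1 maps to n-1, index n maps to 0.
            return a[((i % n) + n) % n];
        case Boundary::Constant:
            // Every off-grid point has the same fixed value.
            return constant_value;
        case Boundary::Mirrored:
            // Reflect across the edge: -1 maps to 0, n maps to n-1.
            // This sketch supports a single reflection only.
            assert(i >= -n && i < 2 * n);
            return a[i < 0 ? -i - 1 : 2 * n - 1 - i];
    }
    return constant_value;  // not reached; keeps compilers quiet
}
```

A looping stencil can call such a helper only near the edges and use direct indexing in the interior, which keeps the per-point boundary check out of the hot loop.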
This research enables scientific researchers and others to easily produce highly efficient codes for complex stencil computations. The codes make good use of the memory hierarchy and processor pipelines endemic to multicore processors and run fast on a diverse set of hardware platforms. This research eases the development and maintenance of a wide variety of stencil-based applications, ranging across physics, biology, chemistry, energy, climate, mechanical and electrical engineering, finance, and other areas, benefiting these application areas, as well as society at large.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Many high-end scientific applications perform stencil computations in their inner loops. A stencil defines the value of a grid point in a d-dimensional spatial grid at time t as a function of neighboring grid points at recent times before t. Stencil computations are conceptually simple to implement using nested loops, but looping implementations suffer from poor cache performance on multicore processors. Cache-oblivious divide-and-conquer stencil codes can achieve an order of magnitude improvement in cache efficiency over looping implementations, but most programmers find it difficult to write cache-oblivious stencil codes. This project enables ordinary programmers of stencil computations to enjoy the benefits of multicore technology without requiring them to write code any more complex than naive nested loops.
This research developed a language embedded in C++ that can express stencil computations concisely and can be compiled automatically into highly efficient algorithmic code for multicore processors and other platforms. The Pochoir stencil compiler compiles stencil computations that exhibit
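To give a sense of why cache-oblivious divide-and-conquer stencil codes are harder to write than loops, the sketch below applies a serial trapezoidal decomposition, in the style published by Frigo and Strumpen, to a 1D 3-point heat stencil with fixed end points. It is an illustration under stated assumptions, not Pochoir's generated code: the type `Heat1D`, the function `heat_reference_loops`, and the coefficient `alpha` are hypothetical names introduced here, and parallelism is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper type: two time-level buffers, indexed by t % 2.
struct Heat1D {
    std::vector<double> u[2];
    double alpha;  // assumed diffusion coefficient

    Heat1D(const std::vector<double>& init, double a) : alpha(a) {
        u[0] = init;
        u[1] = init;  // fixed end points must be present in both buffers
    }

    // Compute point x at time t+1 from its neighbors at time t.
    void kernel(int t, int x) {
        const std::vector<double>& cur = u[t % 2];
        u[(t + 1) % 2][x] = cur[x] + alpha * (cur[x - 1] - 2.0 * cur[x] + cur[x + 1]);
    }

    // Visit the space-time trapezoid with times t0 <= t < t1 whose left edge
    // starts at x0 and moves dx0 per step, and whose right edge starts at x1
    // and moves dx1 per step.
    void walk(int t0, int t1, int x0, int dx0, int x1, int dx1) {
        const int ds = 1;  // stencil slope: reach of one point per time step
        const int dt = t1 - t0;
        if (dt == 1) {
            for (int x = x0; x < x1; ++x) kernel(t0, x);
        } else if (dt > 1) {
            if (2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * ds * dt) {
                // Space cut: wide trapezoid, split along a line of slope -ds.
                const int xm = (2 * (x0 + x1) + (2 * ds + dx0 + dx1) * dt) / 4;
                walk(t0, t1, x0, dx0, xm, -ds);
                walk(t0, t1, xm, -ds, x1, dx1);
            } else {
                // Time cut: tall trapezoid, do the lower half, then the upper.
                const int s = dt / 2;
                walk(t0, t0 + s, x0, dx0, x1, dx1);
                walk(t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1);
            }
        }
    }

    std::vector<double> run(int steps) {
        const int n = static_cast<int>(u[0].size());
        assert(n >= 3 && steps >= 0);
        walk(0, steps, 1, 0, n - 1, 0);  // interior points only
        return u[steps % 2];
    }
};

// Plain nested-loop version of the same update, for cross-checking.
std::vector<double> heat_reference_loops(std::vector<double> u, int steps, double alpha) {
    std::vector<double> next(u);
    for (int t = 0; t < steps; ++t) {
        for (std::size_t x = 1; x + 1 < u.size(); ++x)
            next[x] = u[x] + alpha * (u[x - 1] - 2.0 * u[x] + u[x + 1]);
        std::swap(u, next);
    }
    return u;
}
```

Both functions perform the same arithmetic on every grid point; only the order in which space-time points are visited differs. The recursion keeps shrinking trapezoids until they fit in cache without knowing the cache size, at the cost of index bookkeeping that a stencil compiler can generate on the programmer's behalf.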
* complex boundary conditions, such as periodic, constant, Dirichlet, Neumann, mirrored, and phase factors; and
* irregularities, including macroscopic and microscopic inhomogeneities, as well as irregular shapes.
To achieve these goals, the researchers
* developed provably good algorithms for complex stencil computations;
* explored how domain-specific compiler technology can achieve speedups from efficient cache management, processor-pipeline scheduling, chromatic scheduling, and parallel computation;
* investigated how to run stencils efficiently on a wide variety of architectures such as multicore, distributed-memory clusters, graphics processing units, FPGAs, and future exascale machines; and
* demonstrated the effectiveness of their research by developing a production-quality stencil compiler.
Intellectual merit: Real stencil applications often exhibit complex irregularities and dependencies, which makes it difficult for programmers to produce efficient multicore code for them or to migrate them to other modern hardware platforms. Even simple stencils are hard to code for performance. This research attacked the difficult problem of generating high-efficiency cache-oblivious code for stencil computations that makes good use of the memory hierarchy and processor pipelines, starting with simple-to-write linguistic specifications. This effort required cross-domain technical expertise, including an understanding of multicore programming, strong theoretical skills to develop efficient parallel algorithms and data structures, systems experience to build and tune a compiler and runtime system, knowledge of the real applications that this technology will benefit, and an aesthetic sense for language design.
Broad impact: This research enables scientific researchers and others to easily produce highly efficient codes for complex stencil computations. The codes make good use of the memory hierarchy and processor pipelines endemic to multicore processors and will run fast on a diverse set of hardware platforms. A wide variety of stencil-based applications, ranging across physics, biology, chemistry, energy, climate, mechanical and electrical engineering, finance, and other areas, will become easier to develop and maintain, benefiting these application areas, as well as society at large.
Last Modified: 06/12/2014
Modified by: Charles E Leiserson