Award Abstract # 1162148
SHF: AF: Medium: Collaborative Research: The Pochoir Stencil Compiler

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: MASSACHUSETTS INSTITUTE OF TECHNOLOGY
Initial Amendment Date: April 5, 2012
Latest Amendment Date: May 2, 2013
Award Number: 1162148
Award Instrument: Continuing Grant
Program Manager: Almadena Chtchelkanova
achtchel@nsf.gov
 (703)292-7498
CCF
 Division of Computing and Communication Foundations
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: April 1, 2012
End Date: March 31, 2014 (Estimated)
Total Intended Award Amount: $809,432.00
Total Awarded Amount to Date: $809,432.00
Funds Obligated to Date: FY 2012 = $390,152.00
FY 2013 = $419,280.00
History of Investigator:
  • Charles Leiserson (Principal Investigator)
    cel@csail.mit.edu
  • Steven Johnson (Co-Principal Investigator)
Recipient Sponsored Research Office: Massachusetts Institute of Technology
77 MASSACHUSETTS AVE
CAMBRIDGE
MA  US  02139-4301
(617)253-1000
Sponsor Congressional District: 07
Primary Place of Performance: Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge
MA  US  02139-4307
Primary Place of Performance Congressional District: 07
Unique Entity Identifier (UEI): E2NYLCDML6V1
Parent UEI: E2NYLCDML6V1
NSF Program(s): Software & Hardware Foundation
Primary Program Source: 01001213DB NSF RESEARCH & RELATED ACTIVIT
01001314DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7924, 7942
Program Element Code(s): 779800
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Many high-end scientific applications perform stencil computations in their inner loops. A stencil defines the value of a grid point in a d-dimensional spatial grid at time t as a function of neighboring grid points at recent times before t. Stencil computations are conceptually simple to implement using nested loops, but looping implementations suffer from poor cache performance on multicore processors. Cache-oblivious divide-and-conquer stencil codes can achieve an order of magnitude improvement in cache efficiency over looping implementations, but most programmers find it difficult to write cache-oblivious stencil codes. Moreover, open problems remain in adapting these algorithms to realistic applications that lack the perfect regularity of simple examples. This project's investigation of cache-oblivious stencil compilation enables ordinary programmers of stencil computations to enjoy the benefits of multicore technology without requiring them to write code any more complex than naive nested loops.
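As a concrete illustration of the "naive nested loops" the abstract refers to, the following is a minimal sketch of a looping stencil implementation. It is not taken from the project: the function name `heat_1d_loops`, the 3-point heat-equation update, the coefficient `alpha`, and the fixed end points are all assumptions chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical example: a 3-point stencil on a 1-dimensional grid.
//   u(t+1, x) = u(t, x) + alpha * (u(t, x-1) - 2*u(t, x) + u(t, x+1))
// The two end points are held fixed for all time steps (an assumption).
std::vector<double> heat_1d_loops(std::vector<double> u, int steps, double alpha) {
    assert(u.size() >= 3 && steps >= 0);
    std::vector<double> next(u);  // second time slice; end points copied once
    for (int t = 0; t < steps; ++t) {                      // outer loop over time
        for (std::size_t x = 1; x + 1 < u.size(); ++x) {   // inner loop over space
            next[x] = u[x] + alpha * (u[x - 1] - 2.0 * u[x] + u[x + 1]);
        }
        std::swap(u, next);  // the new slice becomes the current one
    }
    return u;
}
```

Each time step sweeps the whole grid, so once the grid is larger than the cache every step reloads it from memory. This is the poor cache behavior of looping implementations that the abstract describes.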

The research project is developing a language embedded in C++ that can express stencil computations concisely and can be compiled automatically into highly efficient algorithmic code for multicore processors and other platforms. The Pochoir stencil compiler compiles stencil computations that exhibit complex boundary conditions, such as periodic, constant, Dirichlet, Neumann, mirrored, and phase factors; irregularities, including macroscopic and microscopic inhomogeneities, as well as irregular shapes; and general complex dependencies, such as push dependencies, horizontal dependencies, and dynamic dependencies. To achieve these goals, the researchers are developing provably good algorithms for complex stencil computations; exploring how domain-specific compiler technology can achieve speedups from efficient cache management, processor-pipeline scheduling, and parallel computation; investigating how to run stencils efficiently on a wide variety of architectures such as multicore, distributed-memory clusters, graphics processing units, FPGAs, and future exascale machines; demonstrating the effectiveness of their research by developing a production-quality stencil compiler; and developing a benchmark suite and benchmarking system for evaluating Pochoir.
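To make the boundary-condition terminology more concrete, the sketch below shows one way three of the listed conditions (periodic, constant, and mirrored) could be expressed as a rule for reading a grid point that falls outside a 1-dimensional grid. This is plain C++ and not the Pochoir language; the names `Boundary` and `read_with_boundary` are hypothetical, and the mirroring convention (reflection about the edge cells) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration of three boundary conditions on a 1-D grid.
enum class Boundary { Periodic, Constant, Mirrored };

// Read grid[i], where i may lie outside the valid range [0, n).
double read_with_boundary(const std::vector<double>& grid, long i,
                          Boundary kind, double constant_value = 0.0) {
    const long n = static_cast<long>(grid.size());
    assert(n > 0);
    if (i >= 0 && i < n) return grid[static_cast<std::size_t>(i)];
    switch (kind) {
        case Boundary::Periodic: {
            // Wrap around: index -1 reads cell n-1, index n reads cell 0.
            const long j = ((i % n) + n) % n;
            return grid[static_cast<std::size_t>(j)];
        }
        case Boundary::Mirrored: {
            // Reflect about the edge cells: index -1 reads cell 1,
            // index n reads cell n-2 (assumed convention).
            if (n == 1) return grid[0];
            const long m = 2 * (n - 1);
            long j = ((i % m) + m) % m;
            if (j >= n) j = m - j;
            return grid[static_cast<std::size_t>(j)];
        }
        case Boundary::Constant:
            break;
    }
    // Constant: every off-grid point has the same fixed value.
    return constant_value;
}
```

A stencil kernel written against such an accessor stays the same for every boundary kind; only the out-of-range rule changes.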

This research enables scientific researchers and others to easily produce highly efficient codes for complex stencil computations. The codes make good use of the memory hierarchy and processor pipelines endemic to multicore processors and run fast on a diverse set of hardware platforms. This research eases the development and maintenance of a wide variety of stencil-based applications, ranging across physics, biology, chemistry, energy, climate, mechanical and electrical engineering, finance, and other areas, benefiting these application areas, as well as society at large.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Many high-end scientific applications perform stencil computations in their inner loops. A stencil defines the value of a grid point in a d-dimensional spatial grid at time t as a function of neighboring grid points at recent times before t. Stencil computations are conceptually simple to implement using nested loops, but looping implementations suffer from poor cache performance on multicore processors. Cache-oblivious divide-and-conquer stencil codes can achieve an order of magnitude improvement in cache efficiency over looping implementations, but most programmers find it difficult to write cache-oblivious stencil codes. This project enables ordinary programmers of stencil computations to enjoy the benefits of multicore technology without requiring them to write code any more complex than naive nested loops.
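To show why cache-oblivious divide-and-conquer stencil codes are harder to write than loops, here is a minimal serial sketch in the style of the trapezoidal recursion of Frigo and Strumpen, specialized to a 3-point stencil on a 1-dimensional grid with fixed end points. It is an illustration under those assumptions, not code generated by Pochoir; the names `Heat1D` and `heat_1d_trapezoid` and the heat-equation update are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: cache-oblivious trapezoidal recursion for
//   u(t+1, x) = u(t, x) + alpha * (u(t, x-1) - 2*u(t, x) + u(t, x+1)).
struct Heat1D {
    std::vector<double> u[2];  // two time slices, indexed by t mod 2
    double alpha = 0.0;

    // Compute grid point x at time t+1 from its neighbors at time t.
    void kernel(int t, int x) {
        const std::vector<double>& in = u[t % 2];
        std::vector<double>& out = u[(t + 1) % 2];
        out[x] = in[x] + alpha * (in[x - 1] - 2.0 * in[x] + in[x + 1]);
    }

    // Process the space-time trapezoid covering times [t0, t1), whose
    // left edge starts at x0 with slope dx0 and whose right edge starts
    // at x1 with slope dx1. The stencil reaches 1 cell per time step.
    void walk(int t0, int t1, int x0, int dx0, int x1, int dx1) {
        const int dt = t1 - t0;
        if (dt == 1) {
            for (int x = x0; x < x1; ++x) kernel(t0, x);
        } else if (dt > 1) {
            if (2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * dt) {
                // Wide trapezoid: cut in space along a line of slope -1,
                // then finish the left piece before starting the right.
                const int xm = (2 * (x0 + x1) + (2 + dx0 + dx1) * dt) / 4;
                walk(t0, t1, x0, dx0, xm, -1);
                walk(t0, t1, xm, -1, x1, dx1);
            } else {
                // Tall trapezoid: cut in time, lower half first.
                const int s = dt / 2;
                walk(t0, t0 + s, x0, dx0, x1, dx1);
                walk(t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1);
            }
        }
    }
};

std::vector<double> heat_1d_trapezoid(const std::vector<double>& init,
                                      int steps, double alpha) {
    assert(init.size() >= 3 && steps >= 0);
    Heat1D h;
    h.u[0] = init;  // both slices carry the fixed end points
    h.u[1] = init;
    h.alpha = alpha;
    h.walk(0, steps, 1, 0, static_cast<int>(init.size()) - 1, 0);
    return h.u[steps % 2];
}
```

The recursion visits the same grid points as the nested loops and computes the same values, but in an order that keeps each small trapezoid resident in cache without knowing the cache size. The index arithmetic in `walk` is exactly the kind of detail the project aims to hide from the programmer.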
This research developed a language embedded in C++ that can express stencil computations concisely and can be compiled automatically into highly efficient algorithmic code for multicore processors and other platforms. The Pochoir stencil compiler compiles stencil computations that exhibit

* complex boundary conditions, such as periodic, constant, Dirichlet, Neumann, mirrored, and phase factors; and
* irregularities, including macroscopic and microscopic inhomogeneities, as well as irregular shapes.

To achieve these goals, the researchers

* developed provably good algorithms for complex stencil computations;
* explored how domain-specific compiler technology can achieve speedups from efficient cache management, processor-pipeline scheduling, chromatic scheduling, and parallel computation;
* investigated how to run stencils efficiently on a wide variety of architectures such as multicore, distributed-memory clusters, graphics processing units, FPGAs, and future exascale machines; and
* demonstrated the effectiveness of their research by developing a production-quality stencil compiler.

Intellectual merit: Real stencil applications often exhibit complex irregularities and dependencies, which makes it difficult for programmers to produce efficient multicore code for them or to migrate them to other modern hardware platforms. Even simple stencils are hard to code for performance. This research attacked the difficult problem of generating high-efficiency cache-oblivious code for stencil computations that makes good use of the memory hierarchy and processor pipelines, starting with simple-to-write linguistic specifications. This effort required cross-domain technical expertise, including an understanding of multicore programming, strong theoretical skills to develop efficient parallel algorithms and data structures, systems experience to build and tune a compiler and runtime system, knowledge of the real applications this technology will benefit, and a sense of aesthetics for language design.

Broad impact: This research enables scientific researchers and others to easily produce highly efficient codes for complex stencil computations. The codes make good use of the memory hierarchy and processor pipelines endemic to multicore processors and will run fast on a diverse set of hardware platforms. A wide variety of stencil-based applications, ranging across physics, biology, chemistry, energy, climate, mechanical and electrical engineering, finance, and other areas, will become easier to develop and maintain, benefiting these application areas, as well as society at large.


Last Modified: 06/12/2014
Modified by: Charles E Leiserson

Please report errors in award information by writing to: awardsearch@nsf.gov.
