
NSF Org: CCF Division of Computing and Communication Foundations
Recipient:
Initial Amendment Date: July 27, 2016
Latest Amendment Date: July 27, 2016
Award Number: 1564074
Award Instrument: Standard Grant
Program Manager: Anindya Banerjee, abanerje@nsf.gov, (703) 292-7885
  CCF Division of Computing and Communication Foundations
  CSE Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2016
End Date: August 31, 2021 (Estimated)
Total Intended Award Amount: $399,993.00
Total Awarded Amount to Date: $399,993.00
Funds Obligated to Date:
History of Investigator:
Recipient Sponsored Research Office: 201 PRESIDENTS CIR, SALT LAKE CITY, UT, US 84112-9049, (801) 581-6903
Sponsor Congressional District:
Primary Place of Performance: UT, US 84112-8930
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): CI REUSE, Software & Hardware Foundation
Primary Program Source:
Program Reference Code(s):
Program Element Code(s):
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
Computational science and engineering provides inexpensive exploration of physical phenomena and design spaces and helps direct experimentation and advise theory. Irregular applications such as molecular dynamics simulations, n-body simulations, finite element analysis, and big graph analytics constitute a critical and significant portion of scientific computing applications. An irregular application is characterized by having indirect memory accesses that cannot be determined when the application is being compiled, therefore severely limiting the applicability of the large body of work on parallelizing compiler technology. Consequently, irregular applications, which are so important in pushing forward the frontiers of science, place a very large burden on computational and domain scientists in developing high-performance implementations for the ever-changing landscape of parallel architectures. The intellectual merit of this project is to develop a compiler and runtime framework for irregular applications, particularly well suited for sparse matrix and graph computations that underlie critical problems in computational science and data science. The broader impact is to provide domain scientists a powerful tool for optimizing and porting performance-critical, irregular computations to current and future multi-core processors and many-core accelerators. The PIs will also continue efforts in outreach and diversity to increase the participation in STEM careers, particularly among women and underrepresented minorities.
The approach in this project is to extend the well-established inspector/executor paradigm, in which the computational dependence structure (based on the memory access pattern) is determined at runtime and that runtime information is passed to a compile-time generated executor. Specifically, an inspector examines the memory access patterns early in the computation at runtime, and an executor leverages this information to perform data and computation reordering and scheduling to effect memory hierarchy and parallelism optimizations. The project is developing a compiler and runtime framework with new abstractions for expressing and manipulating inspectors; these inspectors may then be integrated nearly seamlessly with each other and with existing compiler optimizations (e.g., loop tiling) to optimize executors. The project is also extending prior work that supports non-affine input code and mixes compile-time and runtime optimization. The resulting system increases the productivity of expert programmers in achieving both high performance and portability on a wide variety of irregular applications.
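A minimal sketch of the inspector/executor pattern may help make this concrete. All names here are illustrative, not CHiLL's actual API: for an indirect-update loop of the form y[idx[i]] += x[i], a runtime inspector examines the index array and partitions iterations into conflict-free groups, and the executor then performs the computation using that schedule.

```python
# Hypothetical inspector/executor sketch for the indirect-update
# loop  y[idx[i]] += x[i]  (names are illustrative assumptions).

def inspector(idx):
    # Runtime phase: examine the indirection array and build a
    # schedule. Here, iterations are grouped by the y element they
    # write, so each group is free of write conflicts and could be
    # handed to a parallel executor without synchronization.
    groups = {}
    for i, target in enumerate(idx):
        groups.setdefault(target, []).append(i)
    return groups

def executor(groups, x, y):
    # Executor phase: perform the actual computation using the
    # schedule produced by the inspector.
    for target, iters in groups.items():
        for i in iters:
            y[target] += x[i]
    return y

idx = [2, 0, 2, 1, 0]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = executor(inspector(idx), x, [0.0, 0.0, 0.0])
```

In a compiler-generated version, the inspector would run once per sparsity pattern and its schedule would be reused across many executions of the executor, amortizing the inspection cost.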
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Irregular computations involving sparse matrices, tensors, and graphs (referred to as sparse computations) are found in a variety of important applications such as molecular dynamics, particle simulation, and deep learning. A common aspect of sparse computations is that not all points in the represented space (for example, a 3-dimensional physical domain) have associated data; as a result, the data are compressed to describe only the nonzero values, and the data's organization, density, and nonzero patterns are known only when the data become available during program execution. Moreover, the data must be decompressed during computation, which leads to unpredictable memory access patterns. These aspects of sparse computations lead to very low arithmetic intensity, where performance is dominated by moving the sparse data through the memory system.
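As a concrete illustration (not drawn from the report itself), the widely used compressed sparse row (CSR) format stores only the nonzero values, and a matrix-vector multiply over it must read x[col[j]] through an index array whose contents are unknown until runtime:

```python
# Sparse matrix-vector multiply over CSR (compressed sparse row)
# storage, illustrating the indirect accesses typical of sparse
# computations.

def spmv_csr(rowptr, col, val, x):
    y = [0.0] * (len(rowptr) - 1)
    for i in range(len(y)):
        # Nonzeros of row i occupy val[rowptr[i]:rowptr[i+1]].
        for j in range(rowptr[i], rowptr[i + 1]):
            # col[j] is known only at runtime: the compiler cannot
            # predict which element of x this iteration reads.
            y[i] += val[j] * x[col[j]]
    return y

# CSR encoding of the 3x3 matrix [[5,0,0],[0,0,2],[1,0,3]].
rowptr = [0, 1, 2, 4]
col = [0, 2, 0, 2]
val = [5.0, 2.0, 1.0, 3.0]
result = spmv_csr(rowptr, col, val, [1.0, 1.0, 1.0])
```

Each nonzero is touched exactly once while an entire element of x is fetched per access, which is why such kernels have the low arithmetic intensity described above.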
The CHiLL-I/E project developed compiler optimization techniques to optimize sparse computations for modern parallel platforms including CPUs and GPUs. Recognizing that the memory access patterns are not fully available until runtime, CHiLL-I/E employs an inspector-executor approach, where the compiler generates inspector code at compile time that collects runtime information used in the optimized executor code that performs the computation. This work combines the existing CHiLL autotuning compiler and the IEGenLib for generating inspector-executor code. The resulting system employs sparse polyhedral compilation technology and a configurable schedule of polyhedral transformations to compose inspector-executor code generation with other code transformations, thus increasing parallelism and improving memory access patterns for sparse computations.
The CHiLL-I/E project focused on several critical issues associated with optimizing sparse computations using an inspector-executor approach:
- Automated dependence testing for sparse computations and wavefront parallelism: We developed techniques for testing data dependences in the presence of indirect memory accesses through index arrays, which map from sparse data layouts to associated dense locations. The compiler derives a set of constraints, and a SAT solver determines whether the constraints are unsatisfiable, indicating that there are no data dependences. Data dependences associated with the remaining satisfiable constraints are evaluated at runtime using an inspector. This approach has been used to optimize common sparse linear algebra computations via wavefront parallelism, including privatizing index arrays to reduce dependences.
- Optimizing inspectors: Inspectors may be generated to support multiple optimization goals, each potentially requiring a sweep over the sparse data. We have developed a dataflow representation of inspector computations, along with graph-based abstractions for fusing computations and reducing unnecessary intermediate storage.
- Generalizing sparse data layout conversion: Since the best-performing sparse data format depends on architectural features, sparsity patterns, and density, it is frequently necessary to transform a computation to work with different sparse data layouts. We have prototyped a general approach to layout conversion that is capable of transforming existing sparse tensor computations from the layout of their current specification to any new layout that can be expressed in a sparse polyhedral framework.
- Co-optimizing data layout and tensor computation: We have developed a code generator in CHiLL-I/E that employs sparse polyhedral technology and code synthesis to generate code for co-iteration across multiple sparse tensors, limited to multiplication and addition operations between tensors. Our approach composes the iteration space of the computation with an iteration space over the data layout for one of the sparse tensors. It then performs a lookup in the other sparse tensor to see if that element appears in both tensors. Our approach combines polyhedral scanning over the composed iteration space with code synthesis using an SMT solver to implement the find operation.
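The wavefront parallelism described in the first item above can be illustrated with a small runtime inspector for a sparse lower-triangular solve. This is a hypothetical sketch, not the project's generated code: row i of L depends on every earlier row j with a nonzero L[i][j], and rows assigned to the same level have no mutual dependences, so each level forms a wavefront that could be solved in parallel.

```python
# Runtime inspector that derives wavefront levels for a sparse
# lower-triangular solve L x = b, with L stored in CSR form
# (illustrative sketch; names are assumptions).

def wavefront_levels(rowptr, col):
    n = len(rowptr) - 1
    level = [0] * n
    for i in range(n):
        for j in range(rowptr[i], rowptr[i + 1]):
            if col[j] < i:
                # Row i depends on earlier row col[j], so it must be
                # placed at least one level later.
                level[i] = max(level[i], level[col[j]] + 1)
    return level

# Lower-triangular pattern: row 1 and row 2 each depend only on
# row 0; row 3 depends on rows 1 and 2.
rowptr = [0, 1, 3, 5, 8]
col = [0, 0, 1, 0, 2, 1, 2, 3]
levels = wavefront_levels(rowptr, col)
```

An executor would then process level 0's rows in parallel, barrier, process level 1's rows, and so on.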
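Layout conversion of the kind described above can be written by hand for one common case, COO (coordinate) to CSR. The project's framework derives such conversions automatically; this illustrative sketch assumes the COO entries are already sorted by row, so only the row coordinates need to be compressed into row pointers:

```python
# Hand-written COO-to-CSR layout conversion (illustrative sketch;
# assumes the COO triples are sorted by row index).

def coo_to_csr(nrows, rows, cols, vals):
    rowptr = [0] * (nrows + 1)
    # Count the nonzeros in each row.
    for r in rows:
        rowptr[r + 1] += 1
    # Prefix-sum the counts into row start offsets.
    for i in range(nrows):
        rowptr[i + 1] += rowptr[i]
    # With row-sorted input, the column and value arrays carry over.
    return rowptr, cols, vals

# COO for the 3x3 matrix [[5,0,0],[0,0,0],[1,0,3]].
csr = coo_to_csr(3, [0, 2, 2], [0, 0, 2], [5.0, 1.0, 3.0])
```

A general converter must also handle unsorted input and formats with different dimension orderings, which is where expressing layouts in a sparse polyhedral framework pays off.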
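The co-iteration strategy in the last item, scanning the stored nonzeros of one tensor and performing a find in the other, can be sketched for elementwise multiplication of two sparse vectors with sorted coordinates. This is illustrative code, not the synthesized output; the find operation is played here by a binary search rather than an SMT-derived lookup:

```python
# Co-iteration for elementwise multiply of two sparse vectors,
# each stored as sorted coordinates plus values (illustrative
# sketch; names are assumptions).
import bisect

def sparse_multiply(a_idx, a_val, b_idx, b_val):
    out_idx, out_val = [], []
    # Iterate the stored nonzeros of a (the driving data layout).
    for i, ia in enumerate(a_idx):
        # Find operation: does coordinate ia also appear in b?
        p = bisect.bisect_left(b_idx, ia)
        if p < len(b_idx) and b_idx[p] == ia:
            # A product is nonzero only where both inputs are.
            out_idx.append(ia)
            out_val.append(a_val[i] * b_val[p])
    return out_idx, out_val

prod = sparse_multiply([0, 2, 5], [1.0, 2.0, 3.0],
                       [2, 3, 5], [10.0, 20.0, 30.0])
```

For addition the executor must instead visit coordinates present in either tensor, which is why the generated co-iteration code differs between multiplication and addition.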
Last Modified: 01/24/2022
Modified by: Mary Hall