
NSF Org: CCF Division of Computing and Communication Foundations
Recipient:
Initial Amendment Date: July 27, 2016
Latest Amendment Date: July 27, 2016
Award Number: 1564074
Award Instrument: Standard Grant
Program Manager: Anindya Banerjee, abanerje@nsf.gov, (703) 292-7885
  CCF Division of Computing and Communication Foundations
  CSE Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2016
End Date: August 31, 2021 (Estimated)
Total Intended Award Amount: $399,993.00
Total Awarded Amount to Date: $399,993.00
Funds Obligated to Date:
History of Investigator:
Recipient Sponsored Research Office: 201 PRESIDENTS CIR, SALT LAKE CITY, UT, US 84112-9049, (801) 581-6903
Sponsor Congressional District:
Primary Place of Performance: UT, US 84112-8930
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): CI REUSE, Software & Hardware Foundation
Primary Program Source:
Program Reference Code(s):
Program Element Code(s):
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
Computational science and engineering provides inexpensive exploration of physical phenomena and design spaces and helps direct experimentation and advise theory. Irregular applications such as molecular dynamics simulations, n-body simulations, finite element analysis, and big graph analytics constitute a critical and significant portion of scientific computing applications. An irregular application is characterized by having indirect memory accesses that cannot be determined when the application is being compiled, therefore severely limiting the applicability of the large body of work on parallelizing compiler technology. Consequently, irregular applications, which are so important in pushing forward the frontiers of science, place a very large burden on computational and domain scientists in developing high-performance implementations for the ever-changing landscape of parallel architectures. The intellectual merit of this project is to develop a compiler and runtime framework for irregular applications, particularly well suited for sparse matrix and graph computations that underlie critical problems in computational science and data science. The broader impact is to provide domain scientists a powerful tool for optimizing and porting performance-critical, irregular computations to current and future multi-core processors and many-core accelerators. The PIs will also continue efforts in outreach and diversity to increase the participation in STEM careers, particularly among women and underrepresented minorities.
The approach in this project is to extend the well-established inspector/executor paradigm, in which the computational dependence structure (based on the memory access pattern) is determined at runtime and that runtime information is passed to a compile-time generated executor. Specifically, an inspector examines the memory access patterns early in the computation at runtime, and an executor leverages this information to perform data and computation reordering and scheduling to effect memory hierarchy and parallelism optimizations. The project is developing a compiler and runtime framework with new abstractions for expressing and manipulating inspectors; these inspectors may then be integrated nearly seamlessly with each other and with existing compiler optimizations (e.g., loop tiling) to optimize executors. The project is also extending prior work that supports non-affine input code and mixes compile-time and runtime optimization. The resulting system increases the productivity of expert programmers in achieving both high performance and portability on a wide variety of irregular applications.
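A minimal sketch of the inspector/executor pattern may help make this concrete. All names here are illustrative, not CHiLL's actual API: for an indirect-update loop of the form y[idx[i]] += x[i], a runtime inspector examines the index array and partitions iterations into conflict-free groups, and the executor then performs the computation using that schedule.

```python
# Hypothetical inspector/executor sketch for the indirect-update
# loop  y[idx[i]] += x[i]  (names are illustrative assumptions).

def inspector(idx):
    # Runtime phase: examine the indirection array and build a
    # schedule. Here, iterations are grouped by the y element they
    # write, so each group is free of write conflicts and could be
    # handed to a parallel executor without synchronization.
    groups = {}
    for i, target in enumerate(idx):
        groups.setdefault(target, []).append(i)
    return groups

def executor(groups, x, y):
    # Executor phase: perform the actual computation using the
    # schedule produced by the inspector.
    for target, iters in groups.items():
        for i in iters:
            y[target] += x[i]
    return y

idx = [2, 0, 2, 1, 0]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = executor(inspector(idx), x, [0.0, 0.0, 0.0])
```

In a compiler-generated version, the inspector would run once per sparsity pattern and its schedule would be reused across many executions of the executor, amortizing the inspection cost.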
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Irregular computations involving sparse matrices, tensors, and graphs (referred to as sparse computations) are found in a variety of important applications such as molecular dynamics, particle simulation, and deep learning. A common aspect of sparse computations is that not all points in the represented space (for example, a 3-dimensional physical domain) have associated data; as a result, the data are compressed to describe only the nonzero values, and the data's organization, density, and nonzero patterns are known only when the data become available during program execution. Moreover, the data must be decompressed during computation, which leads to unpredictable memory access patterns. These aspects of sparse computations lead to very low arithmetic intensity, where performance is dominated by moving the sparse data through the memory system.
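As a concrete illustration (not drawn from the report itself), the widely used compressed sparse row (CSR) format stores only the nonzero values, and a matrix-vector multiply over it must read x[col[j]] through an index array whose contents are unknown until runtime:

```python
# Sparse matrix-vector multiply over CSR (compressed sparse row)
# storage, illustrating the indirect accesses typical of sparse
# computations.

def spmv_csr(rowptr, col, val, x):
    y = [0.0] * (len(rowptr) - 1)
    for i in range(len(y)):
        # Nonzeros of row i occupy val[rowptr[i]:rowptr[i+1]].
        for j in range(rowptr[i], rowptr[i + 1]):
            # col[j] is known only at runtime: the compiler cannot
            # predict which element of x this iteration reads.
            y[i] += val[j] * x[col[j]]
    return y

# CSR encoding of the 3x3 matrix [[5,0,0],[0,0,2],[1,0,3]].
rowptr = [0, 1, 2, 4]
col = [0, 2, 0, 2]
val = [5.0, 2.0, 1.0, 3.0]
result = spmv_csr(rowptr, col, val, [1.0, 1.0, 1.0])
```

Each nonzero is touched exactly once while an entire element of x is fetched per access, which is why such kernels have the low arithmetic intensity described above.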
The CHiLL-I/E project developed compiler optimization techniques to optimize sparse computations for modern parallel platforms including CPUs and GPUs. Recognizing that the memory access patterns are not fully available until runtime, CHiLL-I/E employs an inspector-executor approach, where the compiler generates inspector code at compile time that collects runtime information used in the optimized executor code that performs the computation. This work combines the existing CHiLL autotuning compiler and the IEGenLib for generating inspector-executor code. The resulting system employs sparse polyhedral compilation technology and a configurable schedule of polyhedral transformations to compose inspector-executor code generation with other code transformations, thus increasing parallelism and improving memory access patterns for sparse computations.
The CHiLL-I/E project focused on several critical issues associated with optimizing sparse computations using an inspector-executor approach:
- Automated dependence testing for sparse computations and wavefront parallelism: We developed techniques for testing data dependences in the presence of indirect memory accesses through index arrays, which map from sparse data layouts to associated dense locations. The compiler derives a set of constraints, and a SAT solver determines whether the constraints are unsatisfiable, indicating that there are no data dependences. Data dependences associated with the remaining satisfiable constraints are evaluated at runtime using an inspector. This approach has been used to optimize common sparse linear algebra computations via wavefront parallelism, including privatizing index arrays to reduce dependences.
- Optimizing inspectors: Inspectors may be generated to support multiple optimization goals, each potentially requiring a sweep over the sparse data. We have developed a dataflow representation of inspector computations, along with graph-based abstractions for fusing computations and reducing unnecessary intermediate storage.
- Generalizing sparse data layout conversion: Since the best-performing sparse data format depends on architectural features, sparsity patterns, and density, it is frequently necessary to transform a computation to work with different sparse data layouts. We have prototyped a general approach to layout conversion that is capable of transforming existing sparse tensor computations from the layout of their current specification to any new layout that can be expressed in a sparse polyhedral framework.
- Co-optimizing data layout and tensor computation: We have developed a code generator in CHiLL-I/E that employs sparse polyhedral technology and code synthesis to generate code for co-iteration across multiple sparse tensors, limited to multiplication and addition operations between tensors. Our approach composes the iteration space of the computation with an iteration space over the data layout for one of the sparse tensors. It then performs a lookup in the other sparse tensor to see if that element appears in both tensors. Our approach combines polyhedral scanning over the composed iteration space with code synthesis using an SMT solver to implement the find operation.
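The wavefront parallelism described in the first item above can be illustrated with a small runtime inspector for a sparse lower-triangular solve. This is a hypothetical sketch, not the project's generated code: row i of L depends on every earlier row j with a nonzero L[i][j], and rows assigned to the same level have no mutual dependences, so each level forms a wavefront that could be solved in parallel.

```python
# Runtime inspector that derives wavefront levels for a sparse
# lower-triangular solve L x = b, with L stored in CSR form
# (illustrative sketch; names are assumptions).

def wavefront_levels(rowptr, col):
    n = len(rowptr) - 1
    level = [0] * n
    for i in range(n):
        for j in range(rowptr[i], rowptr[i + 1]):
            if col[j] < i:
                # Row i depends on earlier row col[j], so it must be
                # placed at least one level later.
                level[i] = max(level[i], level[col[j]] + 1)
    return level

# Lower-triangular pattern: row 1 and row 2 each depend only on
# row 0; row 3 depends on rows 1 and 2.
rowptr = [0, 1, 3, 5, 8]
col = [0, 0, 1, 0, 2, 1, 2, 3]
levels = wavefront_levels(rowptr, col)
```

An executor would then process level 0's rows in parallel, barrier, process level 1's rows, and so on.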
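Layout conversion of the kind described above can be written by hand for one common case, COO (coordinate) to CSR. The project's framework derives such conversions automatically; this illustrative sketch assumes the COO entries are already sorted by row, so only the row coordinates need to be compressed into row pointers:

```python
# Hand-written COO-to-CSR layout conversion (illustrative sketch;
# assumes the COO triples are sorted by row index).

def coo_to_csr(nrows, rows, cols, vals):
    rowptr = [0] * (nrows + 1)
    # Count the nonzeros in each row.
    for r in rows:
        rowptr[r + 1] += 1
    # Prefix-sum the counts into row start offsets.
    for i in range(nrows):
        rowptr[i + 1] += rowptr[i]
    # With row-sorted input, the column and value arrays carry over.
    return rowptr, cols, vals

# COO for the 3x3 matrix [[5,0,0],[0,0,0],[1,0,3]].
csr = coo_to_csr(3, [0, 2, 2], [0, 0, 2], [5.0, 1.0, 3.0])
```

A general converter must also handle unsorted input and formats with different dimension orderings, which is where expressing layouts in a sparse polyhedral framework pays off.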
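The co-iteration strategy in the last item, scanning the stored nonzeros of one tensor and performing a find in the other, can be sketched for elementwise multiplication of two sparse vectors with sorted coordinates. This is illustrative code, not the synthesized output; the find operation is played here by a binary search rather than an SMT-derived lookup:

```python
# Co-iteration for elementwise multiply of two sparse vectors,
# each stored as sorted coordinates plus values (illustrative
# sketch; names are assumptions).
import bisect

def sparse_multiply(a_idx, a_val, b_idx, b_val):
    out_idx, out_val = [], []
    # Iterate the stored nonzeros of a (the driving data layout).
    for i, ia in enumerate(a_idx):
        # Find operation: does coordinate ia also appear in b?
        p = bisect.bisect_left(b_idx, ia)
        if p < len(b_idx) and b_idx[p] == ia:
            # A product is nonzero only where both inputs are.
            out_idx.append(ia)
            out_val.append(a_val[i] * b_val[p])
    return out_idx, out_val

prod = sparse_multiply([0, 2, 5], [1.0, 2.0, 3.0],
                       [2, 3, 5], [10.0, 20.0, 30.0])
```

For addition the executor must instead visit coordinates present in either tensor, which is why the generated co-iteration code differs between multiplication and addition.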
Last Modified: 01/24/2022
Modified by: Mary Hall