Award Abstract # 1526386
SHF: Small: Techniques and Frameworks for Exploiting Recent SIMD Architectural Advances

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: OHIO STATE UNIVERSITY, THE
Initial Amendment Date: June 30, 2015
Latest Amendment Date: March 24, 2020
Award Number: 1526386
Award Instrument: Standard Grant
Program Manager: Almadena Chtchelkanova
 achtchel@nsf.gov
 (703) 292-7498
CCF Division of Computing and Communication Foundations
CSE Directorate for Computer and Information Science and Engineering
Start Date: July 1, 2015
End Date: June 30, 2020 (Estimated)
Total Intended Award Amount: $449,999.00
Total Awarded Amount to Date: $449,999.00
Funds Obligated to Date: FY 2015 = $449,999.00
History of Investigator:
  • Rajiv Ramnath (Principal Investigator)
    ramnath.6@osu.edu
  • Gagan Agrawal (Former Principal Investigator)
Recipient Sponsored Research Office: Ohio State University
1960 KENNY RD
COLUMBUS
OH  US  43210-1016
(614)688-8735
Sponsor Congressional District: 03
Primary Place of Performance: Ohio State University
OH  US  43210-1277
Primary Place of Performance Congressional District: 03
Unique Entity Identifier (UEI): DLWBSLWAJWR1
Parent UEI: MN4MDDMN8529
NSF Program(s): Software & Hardware Foundation
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 7923, 7942
Program Element Code(s): 779800
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Single Instruction Multiple Data (SIMD) parallelism has been available in common processors for several decades and has been widely exploited for dense matrix and other regular problems. Recent architectural trends, such as the increasing width of SIMD lanes coupled with new instruction sets, provide significantly enhanced functionality. There is a need to use these new architectural features effectively for dynamic or irregular applications, which have not been easy to parallelize on SIMD processors in the past.

The PI proposes a comprehensive research program on mapping various classes of unstructured and/or irregular applications to the novel architectural features of modern SIMD processors, and on developing optimized compiler transformations for programs written in CUDA and OpenCL. This research addresses the challenge of developing and executing unstructured and/or irregular applications using novel SIMD processor instructions such as scatter and gather. The PI plans to design compiler transformation methods for CUDA and OpenCL programs that increase the number of contiguous accesses and decrease the number of cache lines from which data needs to be accessed. These transformation methods target applications such as unstructured grid kernels, sparse matrix computations, and graph and tree traversals. The PI also plans to continue development of an operator-overloading-based library.
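As a hedged illustration of the kind of loop these transformations target, the C sketch below vectorizes the indirect access x[col[j]] in a sparse matrix-vector product using the AVX2 gather instruction. This is not the project's code: the function and array names (csr_spmv, row_ptr, col, val) are hypothetical, and it assumes a CSR-format matrix and a CPU with AVX2 and FMA support.

/* Hypothetical sketch: CSR sparse matrix-vector product whose indirect
 * access x[col[j]] is vectorized with the AVX2 gather instruction. */
#include <immintrin.h>

void csr_spmv(int n, const int *row_ptr, const int *col,
              const float *val, const float *x, float *y)
{
    for (int i = 0; i < n; i++) {
        __m256 acc = _mm256_setzero_ps();
        int j = row_ptr[i];
        /* Eight nonzeros per iteration; one gather replaces eight
         * scattered scalar loads of x. */
        for (; j + 8 <= row_ptr[i + 1]; j += 8) {
            __m256i idx = _mm256_loadu_si256((const __m256i *)&col[j]);
            __m256  xv  = _mm256_i32gather_ps(x, idx, 4); /* x[col[j..j+7]] */
            __m256  vv  = _mm256_loadu_ps(&val[j]);
            acc = _mm256_fmadd_ps(vv, xv, acc);           /* acc += val*x */
        }
        /* Horizontal sum of the vector accumulator. */
        float tmp[8], sum = 0.0f;
        _mm256_storeu_ps(tmp, acc);
        for (int k = 0; k < 8; k++) sum += tmp[k];
        /* Scalar tail for rows whose nonzero count is not a multiple of 8. */
        for (; j < row_ptr[i + 1]; j++) sum += val[j] * x[col[j]];
        y[i] = sum;
    }
}

Even with hardware gather, each distinct index can touch a separate cache line, which is exactly the cost the proposed transformations aim to reduce by increasing the contiguity of the accessed data.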

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Linchuan Chen, Peng Jiang, and Gagan Agrawal. "Exploiting recent SIMD architectural advances for irregular applications." Proceedings of the International Symposium on Code Generation and Optimization (CGO), 2016.
Peng Jiang and Gagan Agrawal. "Combining SIMD and Many/Multi-core Parallelism for Finite State Machines with Enumerative Speculation." Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2017), Austin, TX, USA, February 4-8, 2017.
Peng Jiang and Gagan Agrawal. "Efficient SIMD and MIMD parallelization of hash-based aggregation by conflict mitigation." Proceedings of the International Conference on Supercomputing (ICS 2017), Chicago, IL, USA, June 14-16, 2017.
Peng Jiang, Changwan Hong, and Gagan Agrawal. "A novel data transformation and execution strategy for accelerating sparse matrix multiplication on GPUs." Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2020), Association for Computing Machinery, New York, NY, USA, 2020, pp. 376-388. https://doi.org/10.1145/3332466.3374546
Peng Jiang, Linchuan Chen, and Gagan Agrawal. "Reusing Data Reorganization for Efficient SIMD Parallelization of Adaptive Irregular Applications." Proceedings of the International Conference on Supercomputing (ICS), 2016.
Ruoming Jin, Zhen Peng, Wendell Wu, Feodor Dragan, Gagan Agrawal, and Bin Ren. "Parallelizing pruned landmark labeling: dealing with dependencies in graph algorithms." Proceedings of the 34th ACM International Conference on Supercomputing (ICS 2020), Association for Computing Machinery, New York, NY, USA, 2020, p. 1. https://doi.org/10.1145/3392717.3392745
Yang Xia, Peng Jiang, and Gagan Agrawal. "Scaling out speculative execution of finite-state machines with parallel merge." Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2020), Association for Computing Machinery, New York, NY, USA, 2020, pp. 160-172. https://doi.org/10.1145/3332466.3374524

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project researched how Single Instruction Multiple Data (SIMD) parallelism can be applied successfully to dynamic or "irregular" applications (such as multiplying matrices that mix sparse and dense components) by using new and emerging system architectures with specialized components, such as GPUs, that provide a range of capabilities. These applications have not been easy to parallelize well on older SIMD architectures built only from standard components (such as CPUs, perhaps with multiple cores) because their computational needs are not uniform. Major activities have been:

1) development of a sampling-and-reconstruction method to parallelize loops with dependencies, while using SIMD parallelism to gain efficiency (a minimal sketch of one related enumerate-then-stitch scheme appears after this list),
2) extension of this method to execute efficiently on GPGPUs,
3) development of sampling methods that help predict the execution time of irregular applications with SIMD parallelism,
4) development of compression techniques to make GPU calculations more efficient,
5) an out-of-core implementation of sparse matrix multiplication, and
6) optimization of large-scale graph learning algorithms.

Significant intellectual merit contributions include new methods to:

(a) merge partial sparse matrix multiplications (on samples) in parallel on GPUs, achieving almost linear scalability,
(b) parallelize sequential loops with linear scalability across multiple cores, specifically types of loops that current state-of-the-art techniques could not parallelize,
(c) improve data locality for, and thereby scale, multiplications of sparse and dense matrices, and
(d) reduce data movement, without loss of accuracy, by summarizing (and thereby compressing) data as well as by localizing computations.
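The finite-state machine (FSM) papers listed above parallelize inherently sequential state-machine loops using enumerative speculation. The plain-C sketch below shows only the basic enumerate-then-stitch idea under stated assumptions: the names (run_fsm_parallel, run_chunk, delta) and the fixed state and chunk counts are hypothetical, and the published methods add speculation and SIMD so that not every state has to be enumerated.

/* Hypothetical sketch: parallelize an FSM by splitting the input into
 * chunks, running each chunk from EVERY possible start state, and then
 * stitching the per-chunk results together.  Compile with -fopenmp. */
#include <stddef.h>

enum { NSTATES = 4, NCHUNKS = 8 };

/* Run one chunk from every start state; delta[s][c] is the transition
 * table, end_state[s] records where the chunk ends if it began in s. */
static void run_chunk(const int delta[NSTATES][256],
                      const unsigned char *in, size_t len,
                      int end_state[NSTATES])
{
    for (int s = 0; s < NSTATES; s++) {
        int st = s;
        for (size_t i = 0; i < len; i++)
            st = delta[st][in[i]];
        end_state[s] = st;
    }
}

int run_fsm_parallel(const int delta[NSTATES][256],
                     const unsigned char *in, size_t n, int start)
{
    int ends[NCHUNKS][NSTATES];
    size_t chunk = (n + NCHUNKS - 1) / NCHUNKS;

    /* Chunks are independent because all start states are enumerated. */
    #pragma omp parallel for
    for (int c = 0; c < NCHUNKS; c++) {
        size_t lo = (size_t)c * chunk;
        if (lo > n) lo = n;
        size_t hi = (lo + chunk < n) ? lo + chunk : n;
        run_chunk(delta, in + lo, hi - lo, ends[c]);
    }

    /* Sequential stitch: feed each chunk's end state into the next. */
    int st = start;
    for (int c = 0; c < NCHUNKS; c++)
        st = ends[c][st];
    return st;
}

Because all start states are enumerated, each chunk can run on its own core with no dependence on its predecessor, and the final stitch costs one table lookup per chunk; this is why near-linear scaling is possible when chunks are large, at the price of NSTATES times the sequential work per chunk, which speculation exists to reduce.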

The indirect broader impacts of this work are in the improved efficiency of important types of computations that are very common in scientific applications, thus increasing the scalability of these applications. The direct broader impacts have been in the education and research training of ten graduate students, two of whom have gone on to academic careers.


Last Modified: 10/20/2020
Modified by: Rajiv Ramnath
