
NSF Org: CCF Division of Computing and Communication Foundations
Initial Amendment Date: June 30, 2015
Latest Amendment Date: March 24, 2020
Award Number: 1526386
Award Instrument: Standard Grant
Program Manager: Almadena Chtchelkanova, achtchel@nsf.gov, (703) 292-7498, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering
Start Date: July 1, 2015
End Date: June 30, 2020 (Estimated)
Total Intended Award Amount: $449,999.00
Total Awarded Amount to Date: $449,999.00
Recipient Sponsored Research Office: 1960 Kenny Rd, Columbus, OH 43210-1016, US, (614) 688-8735
Primary Place of Performance: OH, US 43210-1277
NSF Program(s): Software & Hardware Foundation
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
Single Instruction Multiple Data (SIMD) parallelism has been available in common processors for several decades now, and has been widely exploited for dense matrix and other regular problems. Recent architectural trends, such as the increasing width of SIMD lanes coupled with richer instruction sets, provide significantly enhanced functionality. There is a need to effectively use these new architectural features for dynamic or irregular applications, which have not been easy to parallelize on SIMD processors in the past.
The PI proposes a comprehensive research program on mapping various classes of unstructured and/or irregular applications to the novel architectural features of modern SIMD processors, and on developing optimized compiler transformations for programs written in CUDA and OpenCL. The research addresses the challenge of developing and executing unstructured and/or irregular applications using novel SIMD instructions such as scatter and gather. The PI plans to design compiler transformation methods for CUDA and OpenCL programs that increase the number of contiguous accesses and decrease the number of cache lines from which data needs to be accessed. These transformation methods target applications such as unstructured grid kernels, sparse matrix computations, and graph and tree traversals. The PI also plans to continue development of an operator-overloading-based library.
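To illustrate the kind of transformation described above (increasing contiguous accesses so that a warp reads from fewer cache lines), here is a minimal CUDA sketch contrasting an array-of-structs access pattern with its struct-of-arrays rewrite. This is a hand-written illustration of the general idea, not output of the proposed compiler; the Particle type and both kernels are hypothetical examples.

// Illustrative sketch only: AoS vs. SoA layout and its effect on coalescing.
#include <cstdio>
#include <cuda_runtime.h>

// AoS layout: consecutive threads' reads of field x are 16 bytes apart.
struct Particle { float x, y, z, w; };

__global__ void scale_x_aos(Particle* p, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i].x *= s;      // strided: a warp's reads span 4 cache lines
}

// SoA layout: consecutive threads read consecutive floats.
__global__ void scale_x_soa(float* x, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;        // contiguous: a warp's reads span 1 cache line
}

int main() {
    const int n = 1 << 20;
    Particle* p = nullptr;
    float* x = nullptr;
    cudaMalloc(&p, n * sizeof(Particle));
    cudaMalloc(&x, n * sizeof(float));
    cudaMemset(p, 0, n * sizeof(Particle));   // give the buffers defined contents
    cudaMemset(x, 0, n * sizeof(float));

    int threads = 256, blocks = (n + threads - 1) / threads;
    scale_x_aos<<<blocks, threads>>>(p, 2.0f, n);
    scale_x_soa<<<blocks, threads>>>(x, 2.0f, n);
    cudaDeviceSynchronize();

    printf("kernels finished: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(p);
    cudaFree(x);
    return 0;
}

In the AoS kernel, a 32-thread warp reading the x field touches a 512-byte span (four 128-byte cache lines); after the SoA rewrite the same warp reads a single contiguous 128-byte span, which is exactly the reduction in accessed cache lines the proposed transformations aim for.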
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This project researched how Single Instruction Multiple Data (SIMD) parallelism can be applied successfully to dynamic or "irregular" applications (such as multiplying matrices that mix sparse and dense components) by using new and emerging system architectures with specialized components, such as GPUs, that provide a range of capabilities. These dynamic or irregular applications have not been easy to parallelize well on older SIMD architectures built only from standard components, such as CPUs with perhaps multiple cores, because these applications are not uniform in their computational needs.

Major activities have been:
1) development of a sampling-and-reconstruction method to parallelize loops with dependencies, while using SIMD parallelism to gain efficiency (a simplified sketch of this idea appears after the lists below);
2) extension of this method to execute efficiently on GPGPUs;
3) development of sampling methods that can help predict the execution time of irregular applications with SIMD parallelism;
4) development of compression techniques to make GPU calculations more efficient;
5) an out-of-core implementation of sparse matrix multiplication; and
6) optimization of large-scale graph learning algorithms.

Significant intellectual merit contributions include new methods to:
(a) merge partial sparse matrix multiplications (on samples) in parallel on GPUs, thereby achieving almost linear scalability;
(b) parallelize sequential loops, specifically types of loops that current state-of-the-art techniques could not successfully parallelize, and achieve linear scalability across multiple cores;
(c) improve data locality for, and scale, multiplications of sparse and dense matrices; and
(d) reduce data movement by summarizing and thereby compressing data, as well as by localizing computations, all without loss of accuracy.
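To make the sampling-and-reconstruction idea in activity 1 concrete, here is a minimal CUDA sketch of one well-known instance of the general split-summarize-reconstruct pattern: the seemingly sequential recurrence y[i] = a[i]*y[i-1] + b[i] is parallelized by summarizing each chunk as an affine map in parallel, stitching the chunk summaries together in a short sequential pass, and then reconstructing the full result in parallel. This is an illustrative sketch under that assumption, not the project's actual algorithm or code; all names in it are hypothetical.

// Illustrative sketch only: chunked parallelization of y[i] = a[i]*y[i-1] + b[i].
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Pass 1 (parallel): each thread summarizes its chunk of the recurrence
// as a single affine map y_out = A*y_in + B.
__global__ void summarize(const float* a, const float* b,
                          float* A, float* B, int chunk, int n) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    int lo = c * chunk, hi = min(lo + chunk, n);
    if (lo >= n) return;
    float Ac = 1.0f, Bc = 0.0f;            // start from the identity map
    for (int i = lo; i < hi; ++i) {        // compose y -> a[i]*y + b[i]
        Ac = a[i] * Ac;
        Bc = a[i] * Bc + b[i];
    }
    A[c] = Ac;
    B[c] = Bc;
}

// Pass 2 (parallel): re-expand each chunk from its incoming value yin[c].
__global__ void reconstruct(const float* a, const float* b,
                            const float* yin, float* y, int chunk, int n) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    int lo = c * chunk, hi = min(lo + chunk, n);
    if (lo >= n) return;
    float v = yin[c];
    for (int i = lo; i < hi; ++i) {
        v = a[i] * v + b[i];
        y[i] = v;
    }
}

int main() {
    const int n = 1 << 16, chunk = 256, nc = (n + chunk - 1) / chunk;
    std::vector<float> ha(n, 0.999f), hb(n, 0.001f);   // sample coefficients

    float *a, *b, *A, *B, *yin, *y;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));
    cudaMalloc(&A, nc * sizeof(float));
    cudaMalloc(&B, nc * sizeof(float));
    cudaMalloc(&yin, nc * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMemcpy(a, ha.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(b, hb.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int t = 128, g = (nc + t - 1) / t;
    summarize<<<g, t>>>(a, b, A, B, chunk, n);

    // Short sequential pass over the nc chunk summaries (not n elements):
    // propagate the incoming value of y across chunk boundaries.
    std::vector<float> hA(nc), hB(nc), hyin(nc);
    cudaMemcpy(hA.data(), A, nc * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(hB.data(), B, nc * sizeof(float), cudaMemcpyDeviceToHost);
    float v = 0.0f;                        // assume y[-1] = 0
    for (int c = 0; c < nc; ++c) { hyin[c] = v; v = hA[c] * v + hB[c]; }
    cudaMemcpy(yin, hyin.data(), nc * sizeof(float), cudaMemcpyHostToDevice);

    reconstruct<<<g, t>>>(a, b, yin, y, chunk, n);

    float last;
    cudaMemcpy(&last, y + n - 1, sizeof(float), cudaMemcpyDeviceToHost);
    printf("y[n-1] = %f\n", last);
    return 0;
}

The sequential work shrinks from n iterations to one short pass over the chunk summaries, which is what allows the bulk of the computation to run in parallel on SIMD lanes or GPU threads.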
The indirect broader impacts of this work are in the improved efficiency of important types of computations that are very common in scientific applications, thus increasing the scalability of these applications. The direct broader impacts have been in the education and research training of ten graduate students, two of whom have gone on to academic careers.
Last Modified: 10/20/2020
Modified by: Rajiv Ramnath