Award Abstract # 1406304
CSR: Medium: Collaborative Research: Programming Abstractions and Systems Support for GPU-Based Acceleration of Irregular Applications

NSF Org: CNS
Division Of Computer and Network Systems
Recipient: TEXAS STATE UNIVERSITY
Initial Amendment Date: September 10, 2014
Latest Amendment Date: July 26, 2016
Award Number: 1406304
Award Instrument: Standard Grant
Program Manager: Marilyn McClure
mmcclure@nsf.gov
 (703)292-5197
CNS
 Division Of Computer and Network Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2014
End Date: September 30, 2019 (Estimated)
Total Intended Award Amount: $360,001.00
Total Awarded Amount to Date: $376,001.00
Funds Obligated to Date: FY 2014 = $360,001.00
FY 2016 = $16,000.00
History of Investigator:
  • Martin Burtscher (Principal Investigator)
    burtscher@txstate.edu
Recipient Sponsored Research Office: Texas State University - San Marcos
601 UNIVERSITY DR
SAN MARCOS
TX  US  78666-4684
(512)245-2314
Sponsor Congressional District: 15
Primary Place of Performance: Texas State University
601 University Dr
San Marcos
TX  US  78666-4684
Primary Place of Performance Congressional District: 15
Unique Entity Identifier (UEI): HS5HWWK1AAU5
Parent UEI:
NSF Program(s): CSR-Computer Systems Research
Primary Program Source: 01001415DB NSF RESEARCH & RELATED ACTIVITIES
01001617DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 9251, 7924
Program Element Code(s): 735400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

There is growing interest in using Graphics Processing Units (GPUs) to increase the performance and the energy efficiency of applications outside the graphics domain. GPUs are particularly suited to run regular programs that perform operations similar to pixel processing, and they can offer a large advantage over multicore CPUs in terms of performance, price, and energy efficiency in this domain. Not surprisingly, GPUs are increasingly appearing in devices ranging from handhelds to supercomputers.

Although regular algorithms are very important, new problem domains such as computational biology, data mining, and social networks necessitate very different algorithmic foundations: they require building, computing with, and updating large graphs. Unfortunately, relatively little is understood about how to implement irregular applications efficiently on current GPU architectures. Features such as lockstep operation and the need to minimize thread divergence and maximize memory coalescing pose particular challenges to the efficient implementation of irregular algorithms. Nevertheless, recent successes in hand-porting irregular codes suggest that the difficulties lie not in the GPU hardware but in the immature state of the art of writing and tuning GPU code, which stems from a lack of general, well-understood optimization techniques.
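
To make the divergence and coalescing issues concrete, the following minimal CUDA sketch (illustrative only, not taken from the project; the CSR arrays and kernel name are assumptions) assigns one thread per vertex of a graph stored in compressed sparse row form. The per-vertex loop has a data-dependent trip count, so threads in the same warp diverge, and the scattered neighbor reads are difficult to coalesce, which is exactly the behavior that optimization techniques for irregular GPU codes must address.

// Illustrative CUDA sketch: one thread per vertex of a CSR graph relaxes its
// label to the minimum label among its neighbors. Variable neighbor counts
// cause warp divergence, and the reads of dst[] and label[] are scattered,
// uncoalesced accesses.
__global__ void relax_labels(const int* __restrict__ rowptr,
                             const int* __restrict__ dst,
                             int* label, int nverts, int* changed)
{
  int v = blockIdx.x * blockDim.x + threadIdx.x;
  if (v >= nverts) return;
  int best = label[v];
  for (int e = rowptr[v]; e < rowptr[v + 1]; e++) {  // trip count differs per thread
    int nb = label[dst[e]];                          // data-dependent, scattered load
    if (nb < best) best = nb;
  }
  if (best < label[v]) {
    label[v] = best;
    *changed = 1;                                    // request another relaxation round
  }
}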

This work will develop programming notations, compiler optimizations, and runtime system support that will enable programmers to express their algorithms at a high level of abstraction but still yield good performance. Projected tasks include producing highly optimized handwritten GPU implementations of important irregular algorithms and adding them to the LonestarGPU benchmark suite, identifying common patterns of optimizations and runtime systems support needed for efficient GPU implementations, developing a programming notation to permit the software developer to specify irregular algorithms at a high level of abstraction, implementing a synthesis system that automatically generates high-performance GPU code from these high-level specifications, and developing course material for teaching GPU programming of irregular codes.

The higher performance and better energy efficiency of GPUs relative to multicore CPUs offer significant societal benefits. This work helps realize those benefits by facilitating simpler and more widespread utilization of GPUs and by incorporating more effective practices into future compilers and GPU hardware.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Jayadharini Jaiganesh and Martin Burtscher, "A High-Performance Connected Components Implementation for GPUs," Proceedings of the 2018 ACM International Symposium on High-Performance Parallel and Distributed Computing, 2018.
Martin Burtscher, Sindhu Devale, Sahar Azimi, Jayadharini Jaiganesh, and Evan Powers, "A High-Quality and Fast Maximal Independent Set Implementation for GPUs," ACM Transactions on Parallel Computing, v. 5, 2018.
Sepideh Maleki and Martin Burtscher, "Automatic Hierarchical Parallelization of Linear Recurrences," Proceedings of the 23rd ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2018.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Graphics Processing Units (GPUs) are now widely used in devices ranging from supercomputers to handhelds to perform computations outside the graphics domain. Naturally, they are particularly suited to run regular programs that execute operations similar to pixel processing, and they can offer a large advantage over multicore CPUs in terms of performance, price, and energy efficiency on these types of codes. Although regular algorithms are very important, new domains such as computational biology, data mining, and social network analysis necessitate very different algorithmic foundations: they require building, computing with, and updating large graphs. Such irregular codes are unlike pixel processing, more difficult to parallelize, and not a natural fit for GPUs.

The goal of this project was to develop new techniques and optimizations to make the advantages provided by GPUs also available to complex and irregular codes. To this end, we have devised some of the fastest GPU programs for computing prefix sums, digital filters, maximal independent sets, connected components, and graph coloring. These algorithms are essential steps in many important applications, including cancer and tumor detection, drug discovery, and image processing. In addition to the publications describing our parallelization and optimization techniques, the source code for all these programs is available to anyone from the PI’s home page for direct use, study, modification, and inclusion in other applications.
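
As one concrete illustration of the kind of primitive mentioned above, the following minimal CUDA sketch computes an inclusive prefix sum within a single warp using shuffle intrinsics. It is a simple textbook-style example under stated assumptions (a 32-element input processed by one warp), not the project's implementation, whose published prefix-sum codes are far more general and heavily tuned.

// Illustrative CUDA sketch: inclusive prefix sum of 32 values within one warp.
#include <cstdio>

__global__ void warp_inclusive_scan(const int* in, int* out, int n)
{
  int lane = threadIdx.x;            // assumes a single warp (blockDim.x == 32)
  int val = (lane < n) ? in[lane] : 0;
  for (int d = 1; d < 32; d *= 2) {
    int up = __shfl_up_sync(0xffffffff, val, d);
    if (lane >= d) val += up;        // add the value from d lanes to the left
  }
  if (lane < n) out[lane] = val;
}

int main()
{
  int h_in[32], h_out[32];
  for (int i = 0; i < 32; i++) h_in[i] = 1;          // all ones: scan yields 1..32
  int *d_in, *d_out;
  cudaMalloc(&d_in, sizeof(h_in));
  cudaMalloc(&d_out, sizeof(h_out));
  cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
  warp_inclusive_scan<<<1, 32>>>(d_in, d_out, 32);
  cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  printf("last element: %d\n", h_out[31]);           // expect 32
  cudaFree(d_in); cudaFree(d_out);
  return 0;
}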

This project has resulted in the training of multiple researchers and has yielded new teaching material that has been and will continue to be taught to hundreds of students at Texas State. The codes and strategies devised in this work can be used directly by programmers and may be incorporated into future compilers to accelerate countless applications. Increased utilization of GPU computing has several societal benefits due to the higher performance and better energy efficiency of GPUs relative to multicore CPUs, including making important and possibly life-saving computations much faster.


Last Modified: 01/13/2020
Modified by: Martin Burtscher
