Award Abstract # 1533828
XPS: Full: FP: Collaborative Research: Sphinx: Combining Data and Instruction Level Parallelism through Demand Driven Execution of Imperative Programs

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: MICHIGAN TECHNOLOGICAL UNIVERSITY
Initial Amendment Date: July 20, 2015
Latest Amendment Date: August 18, 2016
Award Number: 1533828
Award Instrument: Standard Grant
Program Manager: Anindya Banerjee
abanerje@nsf.gov
 (703)292-7885
CCF
 Division of Computing and Communication Foundations
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: August 1, 2015
End Date: July 31, 2020 (Estimated)
Total Intended Award Amount: $560,000.00
Total Awarded Amount to Date: $575,876.00
Funds Obligated to Date: FY 2015 = $560,000.00
FY 2016 = $15,876.00
History of Investigator:
  • Soner Onder (Principal Investigator)
    soner@mtu.edu
Recipient Sponsored Research Office: Michigan Technological University
1400 TOWNSEND DR
HOUGHTON
MI  US  49931-1200
(906)487-1885
Sponsor Congressional District: 01
Primary Place of Performance: Michigan Technological University
1400 Townsend Drive
Houghton
MI  US  49931-1295
Primary Place of Performance Congressional District: 01
Unique Entity Identifier (UEI): GKMSN3DA6P91
Parent UEI: GKMSN3DA6P91
NSF Program(s): Software & Hardware Foundation,
Exploiting Parallel&Scalabilty
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVIT
01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7943, 9251
Program Element Code(s): 779800, 828300
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Title: XPS: Full: FP: Collaborative Research: Sphinx: Combining Data and Instruction Level Parallelism through Demand Driven Execution of Imperative Programs

It has become increasingly difficult to improve processor performance enough to meet the demands of existing and emerging workloads. Recent emphasis has been on enhancing performance through multi-core processors and Graphics Processing Units (GPUs). However, these processors remain difficult to program and inflexible in adapting to dynamic changes in the parallelism available in a given program. Although the computer architecture and programming language communities continue to innovate and make important gains towards better programmability and better designs, parallel programming remains inherently costly and error-prone, and automatic parallelization of programs is not always feasible or effective. The intellectual merits of this project are the development of a new program execution paradigm and the establishment of the critical compiler and micro-architecture mechanisms needed to design processors that can be easily programmed using existing programming languages while surpassing the performance of existing parallel computers. The project's broader significance and importance are widespread: the deployment of such processors will push the limits of computation in every field of science and commerce.

The execution paradigm under consideration is a previously unexplored execution model: the demand-driven execution of imperative programs (DDE). The DDE paradigm rests on a solid theoretical framework and promises to efficiently deliver very high levels of fine-grain parallelism. This parallelism is extracted from a program written in an imperative language such as C, and it is realized through an effective compiler-architecture collaboration mechanism that uses a common, single-assignment form as the program representation. DDE processors can extract instruction-level parallelism (ILP) much more efficiently than existing superscalar processors because the paradigm does not require dynamic dependency checking. Such processors can fetch, buffer, and execute many more instructions in parallel than current superscalar processors. Owing to its dependence-driven instruction fetching and execution, the paradigm leads to extremely scalable designs: communication is naturally localized, and synchronization is inherent in the model. Conventional thread-level parallelism (TLP) is orthogonal to DDE, so DDE designs can exploit both ILP and TLP. DDE architectures thus represent promising building blocks for extreme-scale machines.
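The demand-driven idea can be illustrated with a toy sketch. All names and the graph format below are illustrative assumptions, not the project's actual program representation, compiler, or hardware: evaluation starts from a result node and recursively demands its operands, so only instructions reachable from the output are executed, and the single-assignment property lets each value be computed once and reused.

```python
# Toy sketch of demand-driven evaluation over a single-assignment
# dataflow graph (illustrative only; not the Sphinx/DDE design).
# Each node names an operation and its operand nodes; evaluation
# starts from the "output" node and demands operands as needed.

def demand_eval(graph, node, cache=None):
    """Evaluate `node` on demand, computing each node at most once."""
    if cache is None:
        cache = {}
    if node in cache:                      # already computed: reuse
        return cache[node]
    op, *deps = graph[node]
    if op == "const":                      # leaf: a program input/constant
        val = deps[0]
    else:                                  # demand operands, then apply op
        args = [demand_eval(graph, d, cache) for d in deps]
        if op == "add":
            val = args[0] + args[1]
        elif op == "mul":
            val = args[0] * args[1]
        else:
            raise ValueError(f"unknown op: {op}")
    cache[node] = val                      # single assignment: write once
    return val

# y = (a + b) * c; the node "dead" is never demanded, so it is never run
graph = {
    "a": ("const", 2),
    "b": ("const", 3),
    "c": ("const", 4),
    "t": ("add", "a", "b"),
    "dead": ("mul", "a", "a"),             # unreachable from "y": skipped
    "y": ("mul", "t", "c"),
}
print(demand_eval(graph, "y"))             # 20
```

Note how demanding only `"y"` skips the `"dead"` node entirely: computation proceeds from outputs toward inputs, performing only the work the result actually requires.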

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Jin, Zhaoxiang and Onder, Soner. "A Two-Phase Recovery Mechanism." Proceedings of the 32nd ACM International Conference on Supercomputing (ICS '18), 2018. 10.1145/3205289.3205300
Jin, Zhaoxiang and Onder, Soner. "Dynamic Memory Dependence Predication." ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA '18), 2018, p. 235. 10.1109/ISCA.2018.00029
Stokes, Michael; Baird, Ryan; Jin, Zhaoxiang; Whalley, David and Onder, Soner. "Decoupling Address Generation from Loads and Stores to Improve Data Access Energy Efficiency." Proceedings of the 19th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), 2018. 10.1145/3211332.3211340
Stokes, Michael; Baird, Ryan; Jin, Zhaoxiang; Whalley, David and Onder, Soner. "Improving Energy Efficiency by Memoizing Data Access Information." 2019 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), 2019. 10.1109/ISLPED.2019.8824951

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Traditional computing relies on the sequential processing of machine instructions, which forms the basis for computing in every aspect of our lives. Speeding up program execution further requires parallelism, and parallel execution under the sequential model demands extensive effort to develop and tune parallel programs, which are more prone to bugs and failures.

The goal of the project is to develop an alternative execution model called demand-driven execution of imperative programs. In this model, programs are automatically translated into an internal representation that permits executing them starting from their outputs and progressing toward their inputs, computing only what is necessary, automatically in parallel. Our project has developed the model and the compiler technology to translate programs written in a conventional imperative programming language, such as C or C++, into this representation. We have also developed processor designs that can efficiently execute the transformed programs.

Since such a drastic change in the program execution model requires the entire software stack to be redeveloped, we cannot claim immediate and widespread use of this technology at this point. However, our project has demonstrated that programs can be automatically transformed into this new paradigm and that processors can be designed to execute them efficiently. Further work on this approach may provide significant speed-ups over conventional computing. The attached graphs show the performance of our approach on a limited set of Livermore kernels that our compiler can compile.


Last Modified: 11/29/2020
Modified by: Soner Onder

