Award Abstract # 1337145
XPS: CLCCA (XPS: DSD) Future Extreme Scale Frameworks using DSL and ERTS

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: UNIVERSITY OF UTAH
Initial Amendment Date: September 11, 2013
Latest Amendment Date: September 11, 2013
Award Number: 1337145
Award Instrument: Standard Grant
Program Manager: Anindya Banerjee
  abanerje@nsf.gov
  (703) 292-7885
CCF: Division of Computing and Communication Foundations
CSE: Directorate for Computer and Information Science and Engineering
Start Date: September 15, 2013
End Date: August 31, 2017 (Estimated)
Total Intended Award Amount: $700,000.00
Total Awarded Amount to Date: $700,000.00
Funds Obligated to Date: FY 2013 = $700,000.00
History of Investigator:
  • Martin Berzins (Principal Investigator)
    mb@cs.utah.edu
  • Matthew Might (Co-Principal Investigator)
  • James Sutherland (Co-Principal Investigator)
Recipient Sponsored Research Office: University of Utah
201 PRESIDENTS CIR
SALT LAKE CITY
UT  US  84112-9049
(801)581-6903
Sponsor Congressional District: 01
Primary Place of Performance: University of Utah
UT  US  84112-9023
Primary Place of Performance Congressional District: 01
Unique Entity Identifier (UEI): LL8GLEVH6MG3
Parent UEI:
NSF Program(s): Exploiting Parallel&Scalabilty
Primary Program Source: 01001314DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 9150
Program Element Code(s): 828300
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

This project will lay the foundations for solving large multi-scale, multi-physics engineering problems using new computational frameworks on the next decade's exascale computers. These frameworks will anticipate trends in proposed future computer hardware by being adaptive, asynchronous, fault-tolerant, and energy-aware, and will address the possible billion-way parallelism of exascale systems both by generating efficient specialized code through a domain-specific language (DSL) and by efficiently scheduling that code through an Exascale Run-Time System (ERTS).

The project's prototype computational framework will use Directed Acyclic Graphs (DAGs) both to generate DSL code specialized for multicore nodes and/or accelerators and to schedule work at the runtime-system level in the ERTS. At the level of a multicore node or accelerator, the DSL will be a declarative, high-level, type-safe domain-specific language for multi-scale, multi-physics simulations that improves productivity by automating the writing of optimized code executed by the ERTS. To ensure that these activities produce relevant solutions for high-impact engineering applications, such as the design of new batteries, fuel cells, or clean-coal boilers, the project will employ a multi-disciplinary approach that couples computer-science advances to the solution of such meaningful engineering problems.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Michael Adams, Celeste Hollenbeck, and Matthew Might. "On the Complexity and Performance of Parsing with Derivatives." Proceedings of the 37th Annual Conference on Programming Language Design and Implementation (PLDI 2016), 2016, p. 224. DOI: 10.1145/2908080.2908128
T. Saad and J. C. Sutherland. "Comment on 'Diffusion by a Random Velocity Field'." Physics of Fluids, v.28, 2016. DOI: 10.1063/1.4968528
N. Yonkee and J. C. Sutherland. "PoKiTT: Exposing Task and Data Parallelism on Heterogeneous Architectures for Detailed Chemical Kinetics, Transport, and Thermodynamics Calculations." SIAM Journal on Scientific Computing, v.38, 2016, pp. S264-S281. DOI: 10.1137/15M1026237

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This research has addressed three aspects of the challenges that arise from the need to solve complex science and engineering problems on parallel computer architectures. These aspects reflect the expertise of the three investigators in software that makes it possible for scientists to solve problems at a much higher and more abstract level on larger and more complex parallel computers.


The high-level specification of such applications is done through domain-specific languages (DSLs) that allow a problem to be specified in terms close to the application area; both are areas of expertise of Dr. James Sutherland. The design of DSLs raises fundamental programming-language questions, which were addressed by Dr. Matt Might. Once a problem has been specified through a DSL, its tasks must be executed in an efficient and fail-safe manner by a runtime system on parallel computers. The design of these runtime systems is the research area of Dr. Martin Berzins.


Dr. Might’s group used template meta-programming, which allows the C++ language to generate code at compile time. They examined the theoretical and practical limits of template meta-programming in C++ and found that it resembles a first-class term-rewriting system. This means such programs can be expressed much more naturally in terms of higher-order term-rewriting systems, which in turn makes it easier to encode the core logic needed for template meta-programming and for DSL design and implementation.
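The term-rewriting flavor of template meta-programming can be seen in even a tiny example (illustrative only, not project code): each template specialization acts as a rewrite rule, and the compiler's instantiation engine performs the rewriting at compile time.

```cpp
#include <cassert>

// Compile-time factorial via template meta-programming. The primary
// template is the recursive rewrite rule; the specialization for 0 is
// the terminating rule. The compiler "rewrites" Factorial<5>::value
// down to a constant before the program ever runs.
template <unsigned N>
struct Factorial {
    static constexpr unsigned value = N * Factorial<N - 1>::value;
};

// Base case: the terminating rewrite rule.
template <>
struct Factorial<0> {
    static constexpr unsigned value = 1;
};

// The result is available at compile time.
static_assert(Factorial<5>::value == 120, "evaluated entirely at compile time");
```

Viewing each specialization as a rule in a rewriting system is exactly the perspective that makes higher-order term-rewriting a natural way to describe such programs.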

This research used Honeycomb, which extends lexical scoping and nesting in C++ template meta-programming, for Nebo/Fulmar, the project's main C++ template meta-programming DSL. The research showed that Honeycomb could generate stencil physics code, and it also included fundamental work on the complexity of the parsing techniques used for DSLs. Together these results provided a much clearer understanding of how to construct good domain-specific languages for stencil operations using C++ template meta-programming.
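A minimal sketch of the expression-template style on which such stencil DSLs rest (all names here are hypothetical; Nebo's actual API differs): the stencil expression is built up as a type, and assignment fuses the whole expression tree into a single loop with no intermediate temporaries.

```cpp
#include <array>
#include <cassert>

constexpr int N = 8;
using Field = std::array<double, N>;

// Leaf expression: wraps a field by reference.
struct FieldExpr {
    const Field& f;
    double operator()(int i) const { return f[i]; }
};

// Stencil expression: central difference of an inner expression.
template <class E>
struct DiffExpr {
    E e;
    double operator()(int i) const { return 0.5 * (e(i + 1) - e(i - 1)); }
};

template <class E>
DiffExpr<E> diff(E e) { return {e}; }

// Assignment fuses the expression tree into one loop over interior
// points -- the "generated code" a stencil DSL aims to produce.
template <class E>
void assign_interior(Field& out, E expr) {
    for (int i = 1; i < N - 1; ++i) out[i] = expr(i);
}
```

For a quadratic field f[i] = i*i, the fused loop reproduces the exact derivative 2i at interior points, since the central difference is exact for quadratics.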


These results in turn influenced the development of PoKiTT, a library for Portable Kinetics, Thermodynamics and Transport calculations. PoKiTT replaces much of the functionality provided by Cantera, an open-source chemical-kinetics library, but with kernels that perform better in serial and that add multicore and GPU kernels for reacting-flow calculations. Speedups were approximately 10x on 16 cores and 40x on a GPU relative to a single core. The dense linear-algebra kernels provide speedups on a K20 GPU of 8-15x for linear solves and 15-30x for eigenvalue decomposition, relative to single-core execution, for reacting-flow calculations.


In the area of runtime systems, decomposing software into a programming model that generates a task graph for execution by a runtime system makes portability and performance possible, but it is challenging in key areas such as support for data structures, tasks on heterogeneous architectures, performance portability, power management, and designing for resilience. On heterogeneous machines, accelerator tasks pose significant challenges beyond their CPU counterparts, particularly when tasks require communicated values and must complete their computation within a few milliseconds. Current and emerging heterogeneous architectures necessitate addressing these challenges within Uintah. The principal result was to identify and address inefficiencies that arise when mapping tasks onto the GPU, to implement new schemes that reduce runtime-system overhead, to introduce new features that allow more tasks to leverage on-node accelerators, and to demonstrate the nodal performance gains from these improvements.
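The task-graph decomposition described above can be sketched as a toy executor (illustrative only; Uintah's actual runtime additionally manages data warehouses, MPI communication, and GPU streams): each task records how many dependencies remain, and completing a task may release its dependents for execution.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Minimal DAG task executor: tasks become ready once every task they
// depend on has completed, which is the core ordering discipline a
// runtime system imposes on DSL-generated kernels.
struct TaskGraph {
    std::vector<std::function<void()>> work;
    std::vector<std::vector<int>> dependents;  // task -> tasks it unblocks
    std::vector<int> pending;                  // unmet dependency counts

    int add(std::function<void()> fn) {
        work.push_back(std::move(fn));
        dependents.emplace_back();
        pending.push_back(0);
        return static_cast<int>(work.size()) - 1;
    }
    void depend(int before, int after) {
        dependents[before].push_back(after);
        ++pending[after];
    }
    void run() {
        std::queue<int> ready;
        for (int t = 0; t < static_cast<int>(work.size()); ++t)
            if (pending[t] == 0) ready.push(t);
        while (!ready.empty()) {
            int t = ready.front();
            ready.pop();
            work[t]();  // execute the task's kernel
            for (int d : dependents[t])
                if (--pending[d] == 0) ready.push(d);
        }
    }
};
```

In a real runtime the ready queue is drained by many worker threads (and GPU streams) concurrently, which is where the scheduling and overhead challenges discussed above arise.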


Finally, in the area of making such task-based calculations more resilient, the research used coarse replication of fine-mesh patches on a different compute node. This adds only about 12.5% overhead for three-dimensional mesh blocks, since a copy coarsened by a factor of two in each dimension stores one eighth of the data. However, to avoid introducing an unacceptable degree of error, the interpolation process that recreates the fine mesh must preserve physical solution properties such as the positivity of chemical concentrations. Such interpolants were not available and so had to be developed and implemented, together with the use of existing fault-tolerant tools for message passing. As a result there is now a prototype approach for dealing with resilience issues at exascale. The project successfully demonstrated these approaches on two very recent and very different computers: the new NSF Stampede2 system at TACC in Austin, and the Sunway TaihuLight at Wuxi, China, then the fastest machine in the world.
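A deliberately simplified illustration of why positivity preservation matters when recovering fine-mesh data (the project's actual interpolants are more sophisticated than this clamp): a quadratic fit through non-negative coarse values can still undershoot below zero at a fine-mesh midpoint, so the recovered concentration must be limited.

```cpp
#include <algorithm>
#include <cassert>

// Quadratic through the points (0, c0), (1, c1), (2, c2), evaluated
// at x = 0.5 (a fine-mesh midpoint between the first two coarse
// points). Lagrange weights there are 3/8, 6/8, and -1/8.
double quadratic_midpoint(double c0, double c1, double c2) {
    return (3.0 * c0 + 6.0 * c1 - c2) / 8.0;
}

// Positivity-preserving variant: clip the undershoot so a recovered
// chemical concentration can never become negative.
double positive_midpoint(double c0, double c1, double c2) {
    return std::max(0.0, quadratic_midpoint(c0, c1, c2));
}
```

For example, the non-negative data c0 = 0, c1 = 0, c2 = 1 gives a plain quadratic value of -1/8 at the midpoint; the limited version returns 0 instead, keeping the recovered solution physical.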

Of equal importance to these technical results is the training and development of those who worked on the project. The research that undergraduate and graduate students undertook has helped prepare them for their future careers. In particular, 4 MS students undertook research before graduating to work in industry, and 7 students undertook research that was part of, or led to, a Ph.D. This activity resulted in one undergraduate thesis and seven published journal or conference papers, with two more submitted or in preparation.


Last Modified: 11/03/2017
Modified by: Martin Berzins
