Award Abstract # 2028851
Collaborative Research: PPoSS: Planning: Unifying Software and Hardware to Achieve Performant and Scalable Zero-cost Parallelism in the Heterogeneous Future

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: NORTHWESTERN UNIVERSITY
Initial Amendment Date: August 19, 2020
Latest Amendment Date: July 8, 2022
Award Number: 2028851
Award Instrument: Standard Grant
Program Manager: Damian Dechev
CCF Division of Computing and Communication Foundations
CSE Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2020
End Date: September 30, 2023 (Estimated)
Total Intended Award Amount: $128,343.00
Total Awarded Amount to Date: $144,343.00
Funds Obligated to Date: FY 2020 = $128,343.00
FY 2021 = $16,000.00
History of Investigator:
  • Peter Dinda (Principal Investigator)
    pdinda@northwestern.edu
  • Nikos Hardavellas (Co-Principal Investigator)
  • Simone Campanoni (Former Co-Principal Investigator)
Recipient Sponsored Research Office: Northwestern University
633 CLARK ST
EVANSTON
IL  US  60208-0001
(312)503-7955
Sponsor Congressional District: 09
Primary Place of Performance: Northwestern University
2233 Tech Drive
Evanston
IL  US  60208-0001
Primary Place of Performance Congressional District: 09
Unique Entity Identifier (UEI): EXZVPWZBLUE8
Parent UEI:
NSF Program(s): PPoSS-PP of Scalable Systems,
Software & Hardware Foundation
Primary Program Source: 01002021DB NSF RESEARCH & RELATED ACTIVIT
01002122DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 026Z, 9251
Program Element Code(s): 042Y00, 779800
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Exploiting parallelism is essential to making full use of computer systems, and thus is intrinsic to most applications. Building parallel programs that truly achieve the performance the hardware is capable of is extremely challenging, even for experts. It requires a firm grasp of concepts ranging from the very highest level to the very lowest, and that range is rapidly expanding. This project approaches the challenge along two lines, "theory down" and "architecture up". The first line strives to simplify parallel programming through languages and algorithms; the second strives to accelerate parallel programs through compilers, operating systems, and hardware. The project's novelty is in bridging these two lines, which the research community usually treats quite separately. The unified team of researchers is addressing a specific subproblem, scheduling, and then determining how to expand out from it. The project's impact is in making it possible for ordinary programmers to program future parallel systems in a very high-level way, yet achieve the performance possible on the machine.

The project studies an "intermediate representation out" approach to making high-level parallel abstractions implementable at zero cost. A core idea is to expand the compiler's intermediate representation so that it captures both high-level parallel concepts and low-level machine and operating system structures, thus allowing full-stack optimization. This planning project will flesh out the concept and set the stage for a larger-scale effort in the future.
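
As a purely illustrative sketch of this idea (all names and fields below are invented for illustration and do not come from the project's actual compiler), one way to picture such an expanded intermediate representation is a loop node that carries both the high-level parallel semantics known to the language front end and the low-level placement and scheduling detail relevant to the operating system and hardware:

    /* Hypothetical sketch in C of an IR node in the spirit of the "IR-out"
       idea described above. Invented for illustration only. */
    #include <stdint.h>
    #include <stddef.h>

    typedef struct ir_parallel_loop {
        /* High-level semantics, visible to language- and algorithm-level passes */
        size_t   begin, end;               /* iteration space                      */
        int      iterations_independent;   /* nonzero if provably dependence-free  */
        double   est_work_per_iteration;   /* cost-model estimate, abstract units  */

        /* Low-level structure, visible to machine- and OS-level passes */
        uint64_t cpu_affinity_mask;        /* where the runtime may place workers  */
        uint32_t heartbeat_interval_us;    /* how often latent parallelism may be
                                              promoted into real tasks             */
        int      numa_node_hint;           /* preferred memory placement, -1 = any */
    } ir_parallel_loop_t;

The point of keeping both halves in one representation is that a single optimization pass can, for example, trade a wider iteration space against a tighter affinity constraint, rather than each layer of the stack optimizing blindly against the others.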

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Deiana, Enrico Armenio and Suchy, Brian and Wilkins, Michael and Homerding, Brian and McMichen, Tommy and Dunajewski, Katarzyna and Dinda, Peter and Hardavellas, Nikos and Campanoni, Simone "Program State Element Characterization" International Symposium on Code Generation and Optimization, 2023 https://doi.org/10.1145/3579990.3580011
Hale, Kyle C. "Coalescent Computing" Proceedings of the ACM Asia-Pacific Workshop on Systems (APSys 2021), 2021 https://doi.org/10.1145/3476886.3477503
Kandiah, Vijay and Lustig, Daniel and Villa, Oreste and Nellans, David and Hardavellas, Nikos "Parsimony: Enabling SIMD/Vector Programming in Standard Compiler Flows" Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, 2023 https://doi.org/10.1145/3579990.3580019
Matni, Angelo and Deiana, Enrico Armenio and Su, Yian and Gross, Lukas and Ghosh, Souradip and Apostolakis, Sotiris and Xu, Ziyang and Tan, Zujun and Chaturvedi, Ishita and Homerding, Brian and McMichen, Tommy and August, David I. and Campanoni, Simone "NOELLE Offers Empowering LLVM Extensions" 2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2022 https://doi.org/10.1109/CGO53902.2022.9741276
Rainey, Mike and Newton, Ryan R. and Hale, Kyle and Hardavellas, Nikos and Campanoni, Simone and Dinda, Peter and Acar, Umut A. "Task parallel assembly language for uncompromising parallelism" PLDI 2021: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2021 https://doi.org/10.1145/3453483.3460969
Wilkins, Michael and Westrick, Sam and Kandiah, Vijay and Bernat, Alex and Suchy, Brian and Deiana, Enrico Armenio and Campanoni, Simone and Acar, Umut A. and Dinda, Peter and Hardavellas, Nikos "WARDen: Specializing Cache Coherence for High-Level Parallel Languages" Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, 2023 https://doi.org/10.1145/3579990.3580013
Zhang, Xiaochun and Jones, Timothy M. and Campanoni, Simone "Quantifying the Semantic Gap Between Serial and Parallel Programming" 2021 IEEE International Symposium on Workload Characterization (IISWC), 2021 https://doi.org/10.1109/IISWC53511.2021.00024

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Project Outcomes Report

Parallel computing is indispensable to achieving the high computing performance necessary for science and engineering, as well as, more recently, machine learning/AI. Unfortunately, achieving high performance and efficiency can be extremely challenging and typically requires specialists. This is only expected to get worse with the advent of heterogeneous architectures. The main purpose of this planning project was to begin to address this problem.

 

This project provided a framework for integrating teams from Northwestern, Illinois Institute of Technology, and Carnegie Mellon to develop a large-scale proposal to NSF’s PPoSS program. The overall concept was to combine the “theory down” and “architecture up” approaches to parallelism exemplified by the teams, and to explore what it would mean to develop an “IR-out” (intermediate representation out) approach centered around a compilation framework. This approach would target future heterogeneous parallel machines and attempt to democratize the programming of such machines while still achieving expert-level performance.

 

In addition to enabling the teams to plan together, including through several workshops, the project supported the collaborative team’s ability to begin joint work on several technical challenges. The most visible of these is heartbeat scheduling, an approach to modulating the amount of available parallelism in a program that comes with strong theoretical bounds. In the joint work, the team developed (a) two different compilation technologies (one based on assembly-level transforms, the other on IR-level transforms) to enable heartbeat scheduling with minimal programmer effort, and (b) special kernel support (both a custom kernel and a Linux kernel module) to deliver the necessary runtime heartbeat signals effectively.
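
As a rough, purely user-level sketch of the general heartbeat idea (the project’s actual work used compiler transforms plus kernel support rather than this code, and the interval below is an arbitrary illustrative value): a periodic timer sets a flag, and compiler-inserted promotion points create new parallel work only when the flag is set, which bounds how often the relatively costly task-creation path can run.

    /* Sketch only: heartbeat-style promotion using a POSIX interval timer.
       A real implementation would live in the compiler and runtime/kernel. */
    #include <signal.h>
    #include <stdio.h>
    #include <sys/time.h>

    static volatile sig_atomic_t heartbeat_arrived = 0;

    static void on_heartbeat(int sig) {
        (void)sig;
        heartbeat_arrived = 1;   /* the next promotion point may spawn a task */
    }

    static void maybe_promote(void) {
        if (heartbeat_arrived) {
            heartbeat_arrived = 0;
            /* A real runtime would split off remaining work here as a task
               for another worker; the heartbeat interval bounds how often
               this relatively expensive step can occur. */
        }
    }

    int main(void) {
        struct sigaction sa;
        sa.sa_handler = on_heartbeat;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = SA_RESTART;
        sigaction(SIGALRM, &sa, NULL);

        /* Deliver a "heartbeat" every 100 microseconds (illustrative value). */
        struct itimerval it = { { 0, 100 }, { 0, 100 } };
        setitimer(ITIMER_REAL, &it, NULL);

        long sum = 0;
        for (long i = 0; i < 100000000L; i++) {
            sum += i;
            maybe_promote();     /* where a compiler would insert a promotion point */
        }
        printf("sum = %ld\n", sum);
        return 0;
    }

The kernel support mentioned above addresses the fact that user-level timers like the one in this sketch are comparatively coarse and costly, whereas the project needed heartbeat signals delivered effectively at runtime.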

 

Significant training opportunities resulted from the project. Seven Ph.D. students (4 NU, 2 IIT, 1 CMU), one MS student (IIT), four REU students (2 NU, 2 IIT), and one other undergraduate (IIT) were involved in the project. Several of the MS and undergraduate students went on to join Ph.D. programs. The project’s three workshops each drew about 30 attendees, providing greater exposure for students at every site, particularly Northwestern, which hosted each workshop and thus could easily send a broad range of students at all levels. Northwestern’s undergraduate operating systems course was entirely redesigned, leveraging the Nautilus kernel framework to create its labs. A Northwestern graduate course in kernel and other low-level software development was created and has so far trained over 100 students in this esoteric topic. A Northwestern course on compiler analysis and transformation was also created, and Northwestern’s compiler sequence was considerably enhanced.

  

The results of this project were very promising, including multiple joint publications and software artifacts, much of which is publicly available. The collaborative team also succeeded in having the planned large-scale project supported by NSF (with an additional element funded by DOE), and the resulting Constellation Project is now highly active.


Last Modified: 11/26/2023
Modified by: Peter A Dinda
