NSF Award Search: Award # 1600669

Award Abstract # 1600669

CNS: CSR: Small: Runtime System, Architecture, and Technology Codesign Approach for Heterogeneous Many-Core Processors and Clusters

NSF Org:	CNS Division Of Computer and Network Systems
Recipient:	UNIVERSITY OF ILLINOIS
Initial Amendment Date:	October 14, 2015
Latest Amendment Date:	October 14, 2015
Award Number:	1600669
Award Instrument:	Standard Grant
Program Manager:	Marilyn McClure mmcclure@nsf.gov (703)292-5197 CNS Division Of Computer and Network Systems CSE Directorate for Computer and Information Science and Engineering
Start Date:	August 21, 2015
End Date:	August 31, 2016 (Estimated)
Total Intended Award Amount:	$222,526.00
Total Awarded Amount to Date:	$222,526.00
Funds Obligated to Date:	FY 2012 = $222,526.00
History of Investigator:	Nam Sung Kim (Principal Investigator) nskim@illinois.edu
Recipient Sponsored Research Office:	University of Illinois at Urbana-Champaign 506 S WRIGHT ST URBANA IL US 61801-3620 (217)333-2187
Sponsor Congressional District:	13
Primary Place of Performance:	University of Illinois at Urbana-Champaign 1901 S. First St. Suite A Champaign IL US 61820-7473
Primary Place of Performance Congressional District:	13
Unique Entity Identifier (UEI):	Y8CWNJRCNN91
Parent UEI:	V2PHZ2CSCH63
NSF Program(s):	CSR-Computer Systems Research
Primary Program Source:	01001213DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	7923
Program Element Code(s):	735400
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.070

ABSTRACT

The performance of computers has improved tremendously in the past four decades, which has enabled innumerable applications that have major roles in our daily lives. However, without dramatic innovations in the performance and power efficiency of computing, continued semiconductor device scaling alone will fail to provide the computing capabilities needed for future applications. One of the main performance bottlenecks of traditional computing systems has been the high cost of communication between the central processing unit (CPU) and the graphics processing unit (GPU). The on-chip integration of CPU and GPU dramatically reduces the cost of communication, but it also worsens power, thermal, and bandwidth issues for chip design. Nonetheless, it allows new approaches to be explored that previously were not practical. Given the potential and challenges of on-chip integrated CPU+GPU processors, this project undertakes a multidisciplinary effort to improve the performance and power efficiency of computers. Specifically, the project aims to (i) develop runtime algorithms for scheduling workload and memory accesses under power, thermal, and bandwidth constraints; (ii) explore microarchitectures for improving memory system performance under bandwidth constraints; and (iii) optimize heterogeneous technology choices for integrated CPU+GPU processors.

This project is expected to have significant impact on the technology, circuit, architecture, and runtime system communities, and it is leading to state-of-the-art research infrastructure. The project also contributes state-of-the-art workforce training. The outcomes of this project benefit economic growth through technology advances that will provide increased computing capability at a lower cost.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Daniel Wong, Nam Sung Kim, and Murali Annavaram "Approximating warps with intra-warp operand value similarity" IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2016

Hadi Asghari-Moghaddam, Young Hoon Son, Jung Ho Ahn, and Nam Sung Kim "Chameleon: Versatile and practical near-DRAM acceleration architecture for large memory systems" IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016

Hao Wang, Jie Zhang, Sharmila Shridhar, Gieseo Park, Myoungsoo Jung, and Nam Sung Kim "DUANG: Lightweight page migration and adaptive asymmetry in memory systems" IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2016

Paula Aguilera, Dong Ping Zhang, Nuwan Jayasena, and Nam Sung Kim "Fine-grained task migration for graph algorithms using processing in memory" Workshop on Advances in Parallel and Distributed Computational Models (APDCM) in conjunction with IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016

Wayne Burleson, Shomit Das, Yasuko Eckert, and Nam Sung Kim "Heterogeneous computing - a path to post-Moore supercomputing: architecture, circuits, and process" Post-Moore's Era Supercomputing (PMES) Workshop in conjunction with International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2016

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The computing industry is at a crossroads today because the scaling of silicon technology, which has been a major driving force behind high-performance computing, is rapidly approaching its fundamental limits. As Gordon Moore noted, “No exponential is forever, but can be only delayed.” On the other hand, the potential of emerging technologies has yet to be fully explored, and these technologies are not mature enough to be used economically for mass production of computing devices. From an architecture point of view, a heterogeneous computing system composed of different processors such as CPUs, GPUs, and accelerators has emerged as a plausible, practical solution to meet increasing performance demand under power, thermal, and bandwidth constraints.

Faced with such challenges and opportunities, in this project we aimed to dramatically improve the performance and energy efficiency of heterogeneous computing systems with techniques that cut across multiple levels of the computing stack (i.e., device, circuit, architecture, and runtime algorithm). Toward this goal, we first developed an architectural simulator to evaluate the performance of heterogeneous computing systems and released the simulator to the public (http://cpu-gpu-sim.ece.wisc.edu). Second, with two collaborators from the University of Texas and the University of British Columbia, we developed a model that evaluates the energy consumption of GPUs, which are an integral component of heterogeneous computing systems, and released it to the public (http://www.gpgpu-sim.org/gpuwattch). This energy model is currently the de facto model for evaluating the energy consumption of GPUs and is widely used by researchers around the world. According to Google Scholar, more than 200 research papers have used this model. Third, based on this simulator and model, we developed many architectures and runtime algorithms to improve the performance and energy efficiency of heterogeneous computing systems. For example, we developed a runtime algorithm that jointly adapts the operating voltage/frequency, the number of cores, and the workload allocated to the CPU and the GPU in a heterogeneous computing system. This algorithm allows us to maximize performance under a given power constraint. We also developed various architectures that exploit the similarity of the values processed by GPUs, together with some unique characteristics of emerging applications running on heterogeneous computing systems, to further improve performance and energy efficiency.

Moreover, we explored a practical but innovative heterogeneous computing architecture that moves computation near the memory to address the bandwidth constraint and improve energy efficiency, as it is becoming more expensive to move data from memory to processors than to process it. This recent work, which significantly improves performance and energy consumption compared to traditional heterogeneous computing systems, has been widely cited by recently published research papers in a very short time period.
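
To illustrate the idea of jointly choosing voltage/frequency, core count, and CPU/GPU workload split under a power constraint, here is a minimal sketch in Python. It is not the project's algorithm: it simply enumerates a small set of configurations and keeps the fastest one that fits the power budget. All operating points, the capacitance and throughput constants, the power model (P = units x C x V^2 x f), and the assumption that an idle device draws no power are hypothetical values chosen for illustration.

```python
from itertools import product

# Hypothetical operating points: (frequency in GHz, voltage in V).
CPU_POINTS = [(1.0, 0.8), (2.0, 1.0), (3.0, 1.2)]
GPU_POINTS = [(0.4, 0.8), (0.7, 1.0), (1.0, 1.2)]
CPU_CORES = [1, 2, 4]          # hypothetical active CPU core counts
GPU_UNITS = [2, 4, 8]          # hypothetical active GPU compute units
CPU_SHARES = [0.0, 0.25, 0.5, 0.75, 1.0]  # fraction of work given to the CPU


def power(freq, volt, units, cap):
    """Toy dynamic-power model: P = units * C * V^2 * f (arbitrary units)."""
    return units * cap * volt ** 2 * freq


def finish_time(work, freq, units, rate):
    """Time to process `work` items, assuming perfectly parallel scaling."""
    if work == 0:
        return 0.0
    return work / (freq * units * rate)


def best_config(total_work, power_cap):
    """Exhaustively search for the fastest configuration under power_cap.

    Returns a dict describing the best configuration, or None if no
    configuration fits within the power budget.
    """
    best = None
    for (cf, cv), cores, (gf, gv), units, share in product(
        CPU_POINTS, CPU_CORES, GPU_POINTS, GPU_UNITS, CPU_SHARES
    ):
        # Assumption: a device that receives no work is power-gated.
        cpu_power = power(cf, cv, cores, cap=2.0) if share > 0.0 else 0.0
        gpu_power = power(gf, gv, units, cap=3.0) if share < 1.0 else 0.0
        total_power = cpu_power + gpu_power
        if total_power > power_cap:
            continue
        # CPU and GPU run concurrently, so the slower side sets the time.
        cpu_time = finish_time(share * total_work, cf, cores, rate=1.0)
        gpu_time = finish_time((1.0 - share) * total_work, gf, units, rate=2.0)
        elapsed = max(cpu_time, gpu_time)
        if best is None or elapsed < best["time"]:
            best = {
                "cpu": (cf, cv, cores),
                "gpu": (gf, gv, units),
                "cpu_share": share,
                "power": total_power,
                "time": elapsed,
            }
    return best


if __name__ == "__main__":
    print(best_config(total_work=100.0, power_cap=10.0))
```

A real runtime would replace the exhaustive search and toy models with measured power and performance data and would also have to respect thermal and bandwidth limits, which this sketch ignores.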

Lastly, this project allowed us to support and train graduate students, including students from under-represented groups. They are now working for large companies in the U.S.A. and contributing to the development of next-generation, state-of-the-art computers that will help the U.S.A. maintain its leading position in computing and the economy. In addition, the products of this project allowed us to enhance the content of undergraduate and graduate-level computer architecture courses so that students can learn state-of-the-art computing technologies and be better prepared for jobs in industry and academia.

Last Modified: 11/29/2016
Modified by: Nam Sung Kim

Please report errors in award information by writing to: awardsearch@nsf.gov.
