Award Abstract # 1657976
Workshop on Architecture and Software for Emerging Applications (WASEA)

NSF Org: CNS - Division of Computer and Network Systems
Recipient: TEXAS A&M ENGINEERING EXPERIMENT STATION
Initial Amendment Date: October 25, 2016
Latest Amendment Date: May 16, 2019
Award Number: 1657976
Award Instrument: Standard Grant
Program Manager: Matt Mutka
CNS - Division of Computer and Network Systems
CSE - Directorate for Computer and Information Science and Engineering
Start Date: November 1, 2016
End Date: October 31, 2019 (Estimated)
Total Intended Award Amount: $49,900.00
Total Awarded Amount to Date: $49,900.00
Funds Obligated to Date: FY 2017 = $49,900.00
History of Investigator:
  • Lawrence Rauchwerger (Principal Investigator)
    rwerger@illinois.edu
  • Nancy Amato (Former Co-Principal Investigator)
Recipient Sponsored Research Office: Texas A&M Engineering Experiment Station
3124 TAMU
COLLEGE STATION
TX  US  77843-3124
(979)862-6777
Sponsor Congressional District: 10
Primary Place of Performance: Texas A&M Engineering Experiment Station
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI): QD1MX6N5YTN4
Parent UEI: QD1MX6N5YTN4
NSF Program(s): CSR-Computer Systems Research
Primary Program Source: 01001718DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7556
Program Element Code(s): 735400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

High-valued domain applications in areas such as medicine,
biology, physics, engineering, and social phenomena demand
both fast innovation and high execution speed and require
productive development environments for domain experts who
may not be computer science experts. This workshop brings together leading researchers in architecture,
compilers and programming languages, and domain experts to discuss
and debate potential approaches to accelerating progress in such
high-valued domains with an emphasis on developing strategies for
exploiting machine learning, including strategies for accelerating
learning algorithms through parallelism. The goal is to stimulate
an in-depth discussion of the potential benefits of joint architecture
and compiler approaches. The workshop will promote broadening
participation by including speakers from groups underrepresented
in computing and early career researchers.

The workshop will produce a report providing recommendations on:
joint compiler/language and architecture approaches, compiler/language
support enabling more aggressive hardware capabilities, architecture
support enabling more effective compilers, and applications whose
development process could benefit from these advances. The report
will identify research opportunities in the interaction between
developers, and architecture and language/compiler researchers to
enable productive domain application development and highly efficient
and scalable implementation on heterogeneous computing systems. The
report will outline promising approaches and the research required for
these approaches to become usable by the domain application developers.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

 

Intellectual Merit

Bringing together some of the best researchers to discuss domain-specific applications and their software and architecture environments may lead to new approaches to this difficult problem. The development of faster, more capable machine learning environments, as well as better and faster graph processing capabilities, may greatly increase our ability to study and optimize problems across a wide range of fields, from physics and engineering to social phenomena.

Broader Impact

The development of better tools for applying machine learning and graph algorithms can affect areas such as medicine, biology, and the social sciences. It can grant us access to a wealth of data that is being collected but not yet used.

Outcome

High-valued domain applications such as image recognition demand both fast innovation and high execution speed. For example, the methods for image recognition using deep neural networks evolve very quickly and require productive development environments for domain experts who may not be computer science experts. Geoffrey Hinton’s team won the 2012 ImageNet competition by training a deep neural network on 1.2 million images. Since then, many algorithmic innovations have been proposed that significantly improve on that original network. Furthermore, the success in applying deep neural networks to image recognition has ignited a great deal of research on applying them to other areas, such as speech recognition and natural language processing, where traditional approaches had made little progress for decades.

 

At the same time, the research process for these methods requires fast turnaround of experiments that involve training deep neural networks on millions of images. Training deep convolutional neural networks involves fine-grained parallel computation within each layer but is constrained by data dependencies from one layer to the next; this is why traditional cluster-based scale-out schemes have not been successful. The best hardware for training has been tightly coupled multi-GPU systems that support extremely fast synchronization among a very large number of cooperating fine-grained threads. Even with these systems, each such experiment can take weeks to complete, and creating an effective algorithm can take many rounds of experiments. Dramatically more efficient implementations of candidate algorithms, and thus much faster experiment turnaround, could significantly accelerate progress in the field.
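
An illustrative sketch (not taken from the report) of this structure, in plain Python with NumPy: each layer is a large, data-parallel matrix operation, but the loop over layers is inherently sequential.

    # Toy fully connected forward pass showing the parallelism structure of
    # DNN training: fine-grained parallel work within each layer, sequential
    # data dependencies between layers. Sizes and the network are hypothetical.
    import numpy as np

    rng = np.random.default_rng(0)
    weights = [rng.standard_normal((256, 256)) for _ in range(3)]  # three layers

    def forward(batch):
        activations = batch
        for w in weights:  # layer i+1 cannot start until layer i finishes
            # Each step is a large matrix multiply plus ReLU: highly data-parallel
            # work that maps well onto tightly coupled multi-GPU hardware.
            activations = np.maximum(activations @ w, 0.0)
        return activations

    out = forward(rng.standard_normal((64, 256)))  # a batch of 64 inputs
    print(out.shape)  # (64, 256)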

 

The state of the art in developing and implementing high-valued domain applications is based on application frameworks. In machine learning, for example, developers typically build on library frameworks such as Caffe, Torch, and TensorFlow, whose library functions are implemented for CPUs, GPUs, and other accelerators. In current practice, however, it takes a tremendous amount of effort and time to bring up the implementation of an existing training method on a new compute device, and even more to introduce a new method that must be implemented for the existing devices. Individual kernels in each method need to be hand-optimized for a new architecture, and arranging the execution of kernels relative to one another within a method, so as to exploit the memory hierarchy and interconnect capabilities, requires even more effort.
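
A minimal sketch (hypothetical names, not any framework's actual API) of why this porting cost grows with the number of operations: a framework effectively maintains a table of hand-tuned kernels keyed by (operation, device), and a new device is usable only once every entry has been filled in.

    # Hypothetical kernel registry: each (operation, device) pair needs its own
    # hand-optimized implementation before the framework can run on that device.
    KERNELS = {}

    def register(op, device):
        def wrap(fn):
            KERNELS[(op, device)] = fn
            return fn
        return wrap

    @register("conv2d", "cpu")
    def conv2d_cpu(x, w):
        ...  # CPU implementation, tuned for caches and SIMD

    @register("conv2d", "gpu")
    def conv2d_gpu(x, w):
        ...  # GPU implementation, tuned for shared memory and warp scheduling

    def dispatch(op, device, *args):
        try:
            return KERNELS[(op, device)](*args)
        except KeyError:
            raise NotImplementedError(
                f"{op} has no kernel for {device!r}; "
                "porting to a new device means writing one per operation")

    # dispatch("conv2d", "new_accelerator", x, w) -> NotImplementedError until ported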

 

Experience with CUDA and the Heterogeneous System Architecture (HSA) shows that architecture support can significantly reduce the cost of implementing application functions on heterogeneous parallel devices. A unified address space, user-level command queues, re-optimizable intermediate representations, and coherent shared memory are among the most frequently cited system architecture features that lower the barrier to implementation. However, little has been done in the architecture of the compute devices themselves to explicitly lower this barrier.

 

On the compiler side, much progress has been made in C++ and OpenCL compilation to provide efficient code optimization, scheduling, and generation for a given algorithm. However, there has been little work on support for specifying alternative algorithms that achieve the same application-level results with different levels of efficiency on different hardware types and memory hierarchies. Some recent work on domain-specific languages (DSLs) has shown promise in image processing, but it places the burden of learning multiple languages on developers. A more generic extension to an existing language such as C++ or Python that enables more compiler support may be a more realistic long-term solution.
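
One existing example of this direction is Numba, which layers compiler support on ordinary Python through a decorator rather than a separate DSL. The sketch below is a generic illustration (it assumes the Numba package is installed) and is not drawn from the workshop report.

    # Compiler support added to plain Python via a decorator: the loop nest is
    # JIT-compiled to native code without switching to a different language.
    import numpy as np
    from numba import njit

    @njit
    def saxpy(a, x, y):
        out = np.empty_like(x)
        for i in range(x.shape[0]):
            out[i] = a * x[i] + y[i]
        return out

    x = np.arange(1_000_000, dtype=np.float64)
    y = np.ones_like(x)
    print(saxpy(2.0, x, y)[:3])  # [1. 3. 5.]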

 

The idea of the workshop is to bring together architecture and compiler researchers to discuss and debate potential approaches to accelerating innovation in high-valued domain applications. The goal is to stimulate an in-depth discussion of what each community can achieve on its own and how much more joint architecture and compiler approaches could accomplish. The product of the workshop is a report with recommendations for each approach, based on in-depth discussions among leading researchers in compilers, architecture, and important application domains.

  


Last Modified: 03/27/2020
Modified by: Lawrence Rauchwerger
