Award Abstract # 1337177
XPS:CLCCA: Optimizing Heterogeneous Platforms for Unstructured Parallelism

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: GEORGIA TECH RESEARCH CORP
Initial Amendment Date: September 11, 2013
Latest Amendment Date: September 11, 2013
Award Number: 1337177
Award Instrument: Standard Grant
Program Manager: Anindya Banerjee (abanerje@nsf.gov, (703) 292-7885)
CCF, Division of Computing and Communication Foundations
CSE, Directorate for Computer and Information Science and Engineering
Start Date: September 15, 2013
End Date: August 31, 2017 (Estimated)
Total Intended Award Amount: $735,055.00
Total Awarded Amount to Date: $735,055.00
Funds Obligated to Date: FY 2013 = $735,055.00
History of Investigator:
  • Sudhakar Yalamanchili (Principal Investigator)
    sudha@ece.gatech.edu
  • Richard Vuduc (Co-Principal Investigator)
  • Hyesoon Kim (Co-Principal Investigator)
Recipient Sponsored Research Office: Georgia Tech Research Corporation
926 DALNEY ST NW
ATLANTA
GA  US  30318-6395
(404)894-4819
Sponsor Congressional District: 05
Primary Place of Performance: Georgia Institute of Technology
225 North Avenue, NW
Atlanta
GA  US  30332-0002
Primary Place of Performance Congressional District: 05
Unique Entity Identifier (UEI): EMW9FC8J3HN4
Parent UEI: EMW9FC8J3HN4
NSF Program(s): Exploiting Parallelism & Scalability
Primary Program Source: 01001314DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s):
Program Element Code(s): 828300
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Major social and economic change is being driven by the emergence of "big data." In all sectors of the economy, businesses increasingly rely on the ability to extract useful intelligence from massive relational data sets. Emergent applications are characterized by data-intensive computation in which massive parallelism is increasingly unstructured, hierarchical, workload dependent, and time varying. At the same time, energy and power considerations are driving computer architecture toward massively parallel heterogeneous organizations, such as multithreaded CPUs tightly integrated with bulk synchronous parallel (BSP) architectures such as general-purpose graphics processing units (GPUs). This evolution, driven by energy-efficiency concerns, has had a disruptive impact on modern software stacks, challenging our ability to extract the performance necessary to deal with big data. We need computing technologies that can harness the throughput potential of energy-efficient heterogeneous architectures for emergent applications processing massive relational data sets.

Realizing the potential of massively parallel heterogeneous architectures is inhibited by the unstructured dynamic parallelism exhibited by applications in these domains. This research develops a suite of coordinated algorithm, compiler, and microarchitecture technologies that effectively exploits dynamic parallelism. The suite of techniques enables effective navigation of the tradeoffs among parallelism, locality, and data movement to realize optimized, high-performance implementations. First, the proposed program uses the language of sparse linear algebra to formulate algorithms that expose massive unstructured parallelism (sketched in the code example below). Second, this formulation drives new compiler and run-time system optimizations tailored to the computational characteristics of these emergent applications and heterogeneous hardware. Third, at the microarchitecture level, we propose new memory hierarchy management techniques tailored to exploiting dynamic parallelism. The integrated solutions (algorithm, compiler/run-time, and microarchitecture) are demonstrated on commodity platforms and delivered as an open source software stack to support and enable community-wide research efforts. For U.S. businesses to exploit the new capabilities of heterogeneous architectures and systems for emerging applications, it is essential both to create new technology and to train employees with the skills to use it. Technology transfer and workforce impact will be promoted through the NSF Industry-University Cooperative Research Center on Experimental Research in Computer Systems (CERCS, www.cercs.gatech.edu) at Georgia Tech, whose members include Intel, IBM, HP, and AMD, as well as application-oriented companies such as LogicBlox and Intercontinental Commodity Exchange (ICE), and Department of Energy national laboratories such as Sandia and Oak Ridge. Similar impacts are expected through the NVIDIA Center of Excellence at Georgia Tech.
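The sparse linear algebra formulation referred to above can be made concrete with a small sketch. The code below expresses breadth-first search as iterated sparse matrix-vector products over an adjacency matrix, the standard formulation popularized by GraphBLAS-style systems. It illustrates the general approach only; it is not code from this project, and the scipy-based implementation and function name are assumptions made for the example.

```python
# Illustrative sketch (not project code): breadth-first search expressed as
# sparse matrix-vector products, i.e., the "language of sparse linear algebra."
# A GPU version would map each product onto one bulk synchronous kernel launch.
import numpy as np
import scipy.sparse as sp

def bfs_levels(adj: sp.csr_matrix, source: int) -> np.ndarray:
    """Return the BFS level of every vertex reachable from `source`.

    adj[i, j] != 0 means there is a directed edge i -> j.
    Unreachable vertices keep level -1.
    """
    n = adj.shape[0]
    levels = np.full(n, -1, dtype=np.int64)
    frontier = np.zeros(n, dtype=bool)
    frontier[source] = True
    levels[source] = 0
    depth = 0
    while frontier.any():
        depth += 1
        # One sparse matrix-vector product advances the entire frontier:
        # reached[j] = OR over i of (frontier[i] AND adj[i, j] != 0).
        reached = adj.T.dot(frontier.astype(np.int8)) > 0
        next_frontier = reached & (levels < 0)   # keep only unvisited vertices
        levels[next_frontier] = depth
        frontier = next_frontier
    return levels

# Example: a small directed path graph 0 -> 1 -> 2.
A = sp.csr_matrix(np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]]))
print(bfs_levels(A, 0))  # [0 1 2]
```

Note that the number of active vertices in the frontier changes from one product to the next, which is exactly the time-varying, data-dependent dynamic parallelism this program targets.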

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Lifeng Nai, Ramyad Hadidi, Jaewoong Sim, Hyojong Kim, Pranith Kumar, and Hyesoon Kim, "GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks," 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2017. doi:10.1109/HPCA.2017.54

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Societies are defined by their dependencies: social, economic, governmental, and personal. Data-intensive applications that focus on relationships across these dependencies are distinct from scientific computations. Further, within the sciences, the processing of relationships represented as graph structures is becoming an important subset of scientific computation. Improving the performance of relational computations over massive data sets has the potential to change the way we do business or chart scientific discovery in those instances where the limiting factor is the sheer volume and diversity of data, and where the lack of computational throughput for relational computations limits our ability to aggregate information or discover relationships across the data sets. The information we extract from such relationships is becoming a central component of the way we operate businesses, pursue research, and manage our day-to-day lives. The form of these computations (e.g., relational, unstructured) and the massive scale of these data sets introduce new algorithmic, intellectual, and engineering challenges. This research advances new algorithmic and engineering solutions to the problem of analyzing relationships over massive data sets using advanced, massively parallel architectures. In particular, we address heterogeneous architectures in which high-throughput processors (general-purpose graphics processing units, or GPGPUs) are coupled with mainstream homogeneous multicore processors.

This program produced several major intellectual outcomes, all centered on relational data sets structured as graphs, a dominant data structure in modern relational computation. Graphs are popular for representing social networks, biological networks, transport networks, and the like, but they also give rise to unstructured computation and memory referencing behavior that leads to low performance. First, we performed a comprehensive analysis of these emergent applications to understand how their computations are structured and how they use memory. The characteristic behavior is dynamic parallelism, in which the magnitude of concurrent computation is time-varying and data dependent. The resulting insights led to a new computational model for executing these applications and to modifications of modern high-performance GPGPU accelerators that enable their implementation on commodity heterogeneous processors. Second, massive data sets stress the memory system. We focused on the implications of massive data sets and their applications for memory system management. The unique properties of these applications led to new models for managing memory (consistency models) and for efficiently exploiting time-varying parallelism over data sets (synchronization models). Third, we addressed the algorithmic challenges presented by relational data sets structured as graphs. This program developed new algorithms for partitioning such structures on modern high-performance parallel machines. At its core is a new approach to partitioning streaming graph data structures across hundreds to thousands of parallel processors, for example, machines used in large-scale scientific computation; an illustrative sketch of the streaming partitioning setting follows below. These contributions represent a cross-cutting effort from algorithms, through execution models, to architecture implementations.
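To make the streaming partitioning setting concrete, the sketch below implements the well-known linear deterministic greedy (LDG) heuristic, which assigns each arriving vertex to the partition holding the most of its already-placed neighbors, discounted by how full that partition is. This is a standard heuristic from the streaming-partitioning literature, shown only to illustrate the problem setting; it is not the algorithm developed under this award, and all names in the code are chosen for the example.

```python
# Illustrative sketch (not the award's algorithm): one-pass streaming graph
# partitioning with the linear deterministic greedy (LDG) heuristic.
# Vertices arrive one at a time with their adjacency lists; each is assigned
# to the partition holding the most already-placed neighbors, scaled by a
# penalty for how full that partition is.
from typing import Dict, Iterable, List, Tuple

def ldg_partition(stream: Iterable[Tuple[int, List[int]]],
                  k: int, capacity: int) -> Dict[int, int]:
    assignment: Dict[int, int] = {}   # vertex -> partition id
    loads = [0] * k                   # vertices currently in each partition
    for vertex, neighbors in stream:
        scores = []
        for p in range(k):
            placed = sum(1 for u in neighbors if assignment.get(u) == p)
            # The (1 - load/capacity) factor keeps partitions balanced.
            scores.append(placed * (1.0 - loads[p] / capacity))
        # Break score ties toward the less loaded partition.
        best = max(range(k), key=lambda p: (scores[p], -loads[p]))
        assignment[vertex] = best
        loads[best] += 1
    return assignment

# Example: two triangles (0-1-2 and 3-4-5) streamed in vertex order,
# split across 2 partitions. LDG keeps each triangle together.
edges = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(ldg_partition(edges.items(), k=2, capacity=3))
# {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```

A production partitioner for hundreds to thousands of processors would additionally track edge cuts, rebalance as the graph mutates, and distribute the assignment state, but the one-pass, per-vertex decision structure shown here is the essence of the streaming setting.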

The engineering contributions of this program translated the preceding intellectual contributions into open source software artifacts that benefit the larger research and development community and enable further work. These include (i) two new application benchmark suites for graph processing and dynamic parallelism, (ii) new simulation models for processor architectures that support dynamic parallelism, and (iii) software for partitioning and processing large-scale graph data on modern high-performance computing machines.

Collectively, the preceding intellectual and engineering contributions advance the state of the art and provide scaffolding on which to continue building.

Last Modified: 12/21/2017
Modified by: Sudhakar Yalamanchili

