
NSF Org: CCF Division of Computing and Communication Foundations
Recipient:
Initial Amendment Date: August 10, 2011
Latest Amendment Date: July 2, 2014
Award Number: 1111888
Award Instrument: Continuing Grant
Program Manager: Almadena Chtchelkanova, achtchel@nsf.gov, (703) 292-7498, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering
Start Date: August 1, 2011
End Date: July 31, 2016 (Estimated)
Total Intended Award Amount: $1,100,000.00
Total Awarded Amount to Date: $1,100,000.00
Funds Obligated to Date: FY 2014 = $288,059.00
History of Investigator:
Recipient Sponsored Research Office: 107 S INDIANA AVE, BLOOMINGTON, IN, US 47405-7000, (317) 278-3473
Sponsor Congressional District:
Primary Place of Performance: 107 S INDIANA AVE, BLOOMINGTON, IN, US 47405-7000
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): Software & Hardware Foundation
Primary Program Source: 01001415DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s):
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
The most powerful computing systems in the world have historically been dedicated to solving scientific problems. Until recently, the computations performed by these systems have typically been simulations of various physical phenomena. However, a new paradigm for scientific discovery has been steadily rising in importance, namely, data-intensive science, which focuses sophisticated analysis techniques on the enormous (and ever-increasing) amounts of data being produced in scientific, commercial, and social endeavors. Important research based on data-intensive science includes areas as diverse as knowledge discovery, bioinformatics, proteomics and genomics, data mining and search, electronic design automation, computer vision, and Internet routing. Unfortunately, the computational approaches needed for data-intensive science differ markedly from those that have been so effective for simulation-based supercomputing. To enable and facilitate efficient execution of data-intensive scientific problems, this project will develop a comprehensive hardware and software supercomputing system for data-intensive science.
Graph algorithms and data structures are fundamental to data-intensive computations and, consequently, this project focuses on providing fundamental new understanding of large-scale graph processing and of how to build scalable systems that efficiently solve large-scale graph problems. In particular, this work will characterize processing overheads and the limits of graph processing scalability, develop performance models that properly capture graph algorithms, define the (co-design) process for developing graph-specific hardware, and experimentally verify our approach with a prototype execution environment. Key capabilities of our system include: a novel fine-grained parallel programming model, a scalable library of graph algorithms and data structures, a graph-optimized core architecture, and a scalable graph execution platform. The project will also address the programming challenges involved in constructing scalable and reliable software for data-intensive problems.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Modeling and simulation, an important tool for discovery and computation, has long been recognized as the “third pillar” of scientific research. More recently, data analytic computing has emerged as a distinct form of computing and a distinct form of research likely to be as important as, if not more important than, modeling and simulation.
The focus of traditional scientific computing has been on solving systems of PDEs (and the corresponding linear algebra problems that they induce). Hardware architectures, computer systems, and software platforms have evolved together to efficiently support solving these kinds of problems. Similar attention has not been devoted to efficiently solving data analytics problems. However, the executive order that created the National Strategic Computing Initiative (NSCI) specifically calls out data analytics in one of its five objectives:
Increasing coherence between the technology base used for modeling and simulation and that used for data analytic computing.
Within the broad domain of data analytics, the graph abstraction is a powerful conceptual tool that describes the relationships between discrete objects. Graphs are used in areas such as social network analysis, machine learning, compilers, electronic design automation, planning, and operations research. Other areas of scientific computing use graphs as well, in the guise of sparse matrices (often derived from structured or unstructured meshes).
In this project, we have advanced the knowledge and understanding of graph computation in several ways. We developed implementations of graph algorithms in the modern distributed runtimes AM++ and HPX-5, resulting in the PBGL 2 and PXGL libraries. We investigated a new asynchronous mode of graph computation called Distributed Control, which avoids underutilization of resources by eliminating global barriers and global data structures, reducing the global interaction between threads of computation to termination detection. To achieve performance, this approach requires rethinking and redesigning existing algorithms, and it requires the right choice of the order in which work is executed. Furthermore, asynchronous algorithms introduce interaction with the underlying runtime as an additional concern. We have proposed that the runtime is an integral part of a graph processing system, and that new research results should fully acknowledge this fact by reporting more thoroughly on runtime-related aspects of performance. In the context of this project, we thoroughly analyzed the interaction between the AM++ and HPX runtimes and our algorithms.
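The core idea of Distributed Control can be illustrated with a minimal single-process sketch (not the actual AM++/HPX-5 implementation; the function name and data layout here are illustrative): shortest-path tasks are processed from a work pool with no barriers between rounds, stale tasks are simply discarded, and the computation ends when the pool drains, which stands in for distributed termination detection.

```python
import heapq

def sssp_distributed_control(adj, source):
    """Asynchronous single-source shortest paths in the spirit of
    Distributed Control: (distance, vertex) tasks are drawn from a
    work pool with no global barriers; a stale task (one whose
    distance has already been beaten) is discarded on arrival.

    adj: dict mapping vertex -> list of (neighbor, edge_weight).
    """
    dist = {source: 0}
    work = [(0, source)]   # ordering work by distance reduces wasted relaxations
    while work:            # an empty pool stands in for termination detection
        d, u = heapq.heappop(work)
        if d > dist.get(u, float("inf")):
            continue       # stale task: a better distance already arrived
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(work, (nd, v))  # emit a new task; no barrier
    return dist
```

In a distributed setting each rank would hold its own pool and push tasks to the owners of remote vertices, so the only global coordination left is detecting that all pools (and in-flight messages) are empty.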
We have also considered theoretical developments for graph computation. We proposed Abstract Graph Machines (AGMs) as a high-level framework for describing data-driven graph algorithms. AGMs capture graph computation as independent tasks that can be ordered in ways that affect both the semantics and the performance of the algorithms they describe. An Extended AGM (EAGM) expands the notion of ordering onto an abstraction of the underlying architecture, allowing the description and analysis of performance improvements made possible by placing non-semantic orderings at different levels of the hardware.
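The AGM separation of "what the tasks are" from "how they are ordered" can be sketched as follows (a toy rendering under our own naming, not the AGM formalism itself): the same task-processing loop is parameterized by an ordering, and swapping a strict priority order for a FIFO order changes the amount of redundant work but not the result.

```python
import heapq
from collections import deque

def agm_sssp(adj, source, ordering="priority"):
    """Toy AGM-style SSSP: a fixed set of (distance, vertex) tasks
    plus a pluggable *ordering*. "priority" processes tasks in
    distance order; "fifo" is a chaotic-relaxation-style order.
    Both yield the same distances; only the work done differs.
    """
    dist = {source: 0}
    if ordering == "priority":
        work = [(0, source)]
        pop = lambda: heapq.heappop(work)
        push = lambda t: heapq.heappush(work, t)
    else:  # "fifo"
        work = deque([(0, source)])
        pop = work.popleft
        push = work.append
    relaxations = 0            # a simple proxy for performance
    while work:
        d, u = pop()
        if d > dist.get(u, float("inf")):
            continue           # stale task under this ordering
        for v, w in adj.get(u, []):
            relaxations += 1
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                push((nd, v))
    return dist, relaxations
```

An EAGM would go further and attach such orderings to levels of the machine (for example, priority order within a node but unordered delivery between nodes), which this single-level sketch does not model.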
We also considered new abstractions for graph libraries. As part of this project, we have participated in the development and specification of GraphBLAS, a set of functions that extend the ideas from linear algebra to graphs. In the GraphBLAS approach, a graph is viewed as a matrix, most often the adjacency matrix, and it is processed through a series of BLAS-like operations. GraphBLAS selects a small set of useful BLAS operations and extends them with parameterization by the underlying semiring. This extension allows customization of the traditional BLAS methods to the tasks of a particular graph algorithm. We have developed a prototype library called the GraphBLAS Template Library, and we have investigated seamless mapping of the GraphBLAS interface to different backends such as CPUs and GPUs.
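The graphs-as-matrices idea is easiest to see in breadth-first search: each BFS level is one matrix-vector product over the Boolean semiring (OR playing the role of plus, AND the role of times). The sketch below uses plain dense lists and a hypothetical helper name; a real GraphBLAS implementation would use sparse storage, masks, and a user-chosen semiring through the standard API.

```python
def graphblas_style_bfs(adjacency, source):
    """BFS levels via repeated matrix-vector products over the
    Boolean semiring (OR as +, AND as *), the style of computation
    GraphBLAS standardizes. `adjacency` is a dense 0/1 matrix given
    as a list of rows; returns -1 for unreachable vertices.
    """
    n = len(adjacency)
    frontier = [v == source for v in range(n)]
    visited = [False] * n
    levels = [-1] * n
    level = 0
    while any(frontier):
        for v in range(n):
            if frontier[v]:
                visited[v] = True
                levels[v] = level
        # next frontier = (A^T "times" frontier), masked by not-visited,
        # with OR/AND standing in for the semiring's plus/times
        frontier = [
            (not visited[v])
            and any(adjacency[u][v] and frontier[u] for u in range(n))
            for v in range(n)
        ]
        level += 1
    return levels
```

The appeal of the semiring parameterization is that swapping (OR, AND) for, say, (min, +) turns the same loop structure into shortest-path relaxation, so one library kernel serves many graph algorithms.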
Several software artifacts were developed in this project. These artifacts are released as open source and are available to the public under non-restrictive licenses. Examples of the projects developed are the PXGL library, the PBGL 2 library, the GBTL library, and the AM++ and HPX-5 runtimes. This project also contributed to standardization efforts (GraphBLAS). We have produced several publications on different aspects of graph computation, and we have participated in the standardization and development of new ideas and software interfaces. Last but not least, this project directly supported four PhD students over its course, who completed some, if not the bulk, of their graduate work under this project.
Last Modified: 11/02/2016
Modified by: Andrew Lumsdaine