
NSF Org: CCF Division of Computing and Communication Foundations
Recipient:
Initial Amendment Date: November 6, 2013
Latest Amendment Date: November 6, 2013
Award Number: 1361053
Award Instrument: Standard Grant
Program Manager: Anindya Banerjee, abanerje@nsf.gov, (703) 292-7885, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering
Start Date: September 10, 2013
End Date: August 31, 2018 (Estimated)
Total Intended Award Amount: $299,999.00
Total Awarded Amount to Date: $299,999.00
Funds Obligated to Date:
History of Investigator:
Recipient Sponsored Research Office: 926 Dalney St NW, Atlanta, GA 30318-6395, US, (404) 894-4819
Sponsor Congressional District:
Primary Place of Performance: 225 North Avenue, Atlanta, GA 30332-0002, US
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): Information Technology Research, Exploiting Parallelism & Scalability
Primary Program Source:
Program Reference Code(s):
Program Element Code(s):
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
Ongoing technology trends are accelerating scientific discovery by allowing researchers to generate enormous quantities of data, in domains ranging from computational biology to social networks. There is an urgent need to make it easy and fast to extract useful content from this data using appropriate abstractions and parallel runtimes. Work conducted under this project aims to make "big data" computing more readily available to applications with dynamic structure and irregular dependencies, thereby enabling advances in scientific computing in general and computational biology in particular.
This project extends the state of the art in scientific computing by developing programming abstractions to expose -- and run-time optimizations to exploit -- the parallelism available in large, irregular applications. Parallelism is essential for the extraction of useful information from ever-increasing volumes of scientific data, but the irregularity of data structure and access in many problem domains makes efficient parallelization difficult. At the level of the programming model, the project addresses the challenge of irregularity by identifying design patterns for important new classes of applications -- in particular, those that use trees and graphs for data representation and access but demonstrate some structure in the traversal. At the level of the run-time system, it is developing computational engines that support and exploit the new patterns, leveraging the structure exposed to automatically and dynamically map computational tasks to hardware nodes.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This project aimed to extend the state of the art in scientific computing by developing programming abstractions to expose -- and run-time optimizations to exploit -- the parallelism available in large, irregular applications. Parallelism is essential for the extraction of useful information from ever-increasing volumes of scientific data, but the irregularity of data structure and access in many problem domains makes efficient parallelization difficult. At the level of the programming model, the project addressed the challenge of irregularity by identifying design patterns for important new classes of applications -- in particular, those that use trees and graphs for data representation and access but demonstrate some structure in the traversal. At the level of the run-time system, the project developed computational engines that support and exploit the new patterns, leveraging the structure exposed to automatically and dynamically map computational tasks to hardware nodes.
Project activities focused on three major areas: 1) fast point tree data structures and algorithms, 2) fast de Bruijn graph data structures and algorithms for sequence analysis and assembly, and 3) efficient parallel run-time systems.
The Georgia Institute of Technology team led the development of fast point tree data structures and algorithms to support tree-based applications, including hierarchical methods in scientific computing, k-nearest neighbors, and the Fast Multipole Method. A common framework was developed, comprising three sub-libraries: the tree compute framework, the parallel tree library model/abstraction, and a library of distributed tree structure implementations (such as octrees and kd-trees). The first two sub-libraries provide a common abstraction for constructing, accessing, and computing with different distributed tree structures, while the last contains tree-specific, state-of-the-art algorithms. These abstractions allow an application developer to perform computations in parallel with minimal knowledge of parallelism or of the underlying distributed computing resources. The framework was demonstrated by implementing communication-efficient kd-trees, showing a two- to three-fold performance improvement over existing methods.
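To illustrate the kind of pruned traversal these libraries abstract, the following is a minimal, purely illustrative C++ sketch of a sequential kd-tree nearest-neighbor query. All names here are hypothetical, and the code is sequential; the project's libraries distribute this pattern across many compute nodes.

// Minimal sketch of the traversal pattern the tree framework abstracts:
// a sequential kd-tree nearest-neighbor query that prunes subtrees by the
// distance to the splitting plane. Illustrative only; not the project's API.
#include <algorithm>
#include <array>
#include <cmath>
#include <memory>
#include <vector>

constexpr int DIM = 3;
using Point = std::array<double, DIM>;

struct KdNode {
    Point pt;
    int axis;                        // splitting dimension at this node
    std::unique_ptr<KdNode> left, right;
};

double sqDist(const Point& a, const Point& b) {
    double d = 0;
    for (int i = 0; i < DIM; ++i) d += (a[i] - b[i]) * (a[i] - b[i]);
    return d;
}

// Build by recursively median-splitting on a rotating axis.
std::unique_ptr<KdNode> build(std::vector<Point>& pts, int lo, int hi, int axis) {
    if (lo >= hi) return nullptr;
    int mid = (lo + hi) / 2;
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
                     [axis](const Point& a, const Point& b) { return a[axis] < b[axis]; });
    auto node = std::make_unique<KdNode>();
    node->pt = pts[mid];
    node->axis = axis;
    node->left = build(pts, lo, mid, (axis + 1) % DIM);
    node->right = build(pts, mid + 1, hi, (axis + 1) % DIM);
    return node;
}

// Visit the near subtree first; descend into the far subtree only if the
// splitting plane is closer than the best candidate found so far.
// Usage: const KdNode* best = nullptr; double d = INFINITY;
//        nearest(root.get(), query, best, d);
void nearest(const KdNode* n, const Point& q, const KdNode*& best, double& bestD) {
    if (!n) return;
    double d = sqDist(n->pt, q);
    if (d < bestD) { bestD = d; best = n; }
    double delta = q[n->axis] - n->pt[n->axis];
    const KdNode* near = delta < 0 ? n->left.get() : n->right.get();
    const KdNode* far  = delta < 0 ? n->right.get() : n->left.get();
    nearest(near, q, best, bestD);
    if (delta * delta < bestD) nearest(far, q, best, bestD);  // prune otherwise
}

The pruning test (delta * delta < bestD) is what makes the traversal input-dependent and irregular, which is exactly the irregularity the project's parallel abstractions and run-time work target.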
The Georgia Institute of Technology team also led the development of efficient data structures and algorithms to support genome assembly as well as other sequence analysis tasks such as error correction, alignment, and variant calling. Best-in-class algorithms were developed for distributed-memory k-mer indices (Kmerind and Kmerhash) and for parallel directed and bi-directed de Bruijn graphs (Bruno), with fast construction, error detection and removal, linear chain compaction, and cycle detection. The k-mer counter outperformed the previous state-of-the-art shared-memory software by 1.7 times and counted a 1 TB sequence data set in under 12 seconds using 4096 cores on NERSC's Cori supercomputer. The de Bruijn graph software constructed and compacted a 695 GB data set in 31 seconds using 7680 cores on NERSC's Edison supercomputer, 3.7 times faster than the previous state of the art and 1.4 times faster than the previous state of the art in shared memory environments. These algorithms and implementations benefit not only computational biology and bioinformatics but also the broader set of applications that depend on distributed and sequential hash tables.
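As a concrete illustration of the basic building block, here is a hedged, sequential C++ sketch of k-mer counting using the standard 2-bit nucleotide packing. Libraries such as Kmerind build on this idea but partition k-mers across MPI ranks by hash and use much faster custom hash tables; nothing below is the project's actual API.

// Illustrative sketch only: a sequential k-mer counter with 2-bit packing.
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>

// Pack A/C/G/T into 2 bits each; returns false on an ambiguous base (e.g. N).
bool encodeBase(char c, uint64_t& code) {
    switch (c) {
        case 'A': code = 0; return true;
        case 'C': code = 1; return true;
        case 'G': code = 2; return true;
        case 'T': code = 3; return true;
        default:  return false;
    }
}

// Slide a window of length k over the read, maintaining the packed k-mer
// incrementally: shift in 2 bits per base instead of re-encoding the window.
void countKmers(const std::string& read, int k,
                std::unordered_map<uint64_t, uint32_t>& counts) {
    const uint64_t mask = (k < 32) ? ((1ULL << (2 * k)) - 1) : ~0ULL;
    uint64_t kmer = 0;
    int valid = 0;  // number of consecutive unambiguous bases seen
    for (char c : read) {
        uint64_t code;
        if (!encodeBase(c, code)) { valid = 0; kmer = 0; continue; }
        kmer = ((kmer << 2) | code) & mask;
        if (++valid >= k) ++counts[kmer];
    }
}

int main() {
    std::unordered_map<uint64_t, uint32_t> counts;
    countKmers("ACGTACGTNACGT", 4, counts);
    std::cout << counts.size() << " distinct 4-mers\n";  // prints 4
}

In a distributed index, each packed k-mer would be hashed to select an owning rank, so that counting and queries become bulk all-to-all exchanges; the packing above is what makes those exchanges compact.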
The University of Rochester team led the development of efficient parallel run-time systems. To make the most effective use of modern multiprocessors and clusters, techniques were developed to co-locate application tasks that share significant amounts of data, thereby minimizing communication costs. Related techniques were developed to automatically control the degree of parallelism in each application. Both mechanisms leverage the performance monitoring features of modern processors to effect their adaptations without programmer or user intervention. For systems that share hardware resources across independent applications, the project also developed driver software that guarantees fair access to extremely fast parallel I/O devices.
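The C++ sketch below illustrates the general idea of automatic parallelism control by hill-climbing on measured throughput. It is a stand-in only: it uses wall-clock time and a synthetic workload, whereas the project's runtime adapts online using hardware performance counters.

// Hedged sketch of concurrency throttling: measure throughput at different
// thread counts and hill-climb toward the best setting. Not the project's code.
#include <atomic>
#include <chrono>
#include <cmath>
#include <iostream>
#include <thread>
#include <vector>

// Synthetic task: burn some CPU so the thread count actually matters.
void task() {
    volatile double x = 1.0;
    for (int i = 0; i < 200000; ++i) x = std::sqrt(x + i);
}

// Run nTasks tasks on nThreads threads; return tasks completed per second.
double throughput(int nThreads, int nTasks) {
    std::atomic<int> next{0};
    auto t0 = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int t = 0; t < nThreads; ++t)
        pool.emplace_back([&] {
            while (next.fetch_add(1) < nTasks) task();
        });
    for (auto& th : pool) th.join();
    std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    return nTasks / dt.count();
}

int main() {
    int best = 1, maxT = std::thread::hardware_concurrency();
    double bestTput = throughput(best, 256);
    // Hill-climb: keep doubling the thread count while throughput improves.
    for (int n = 2; n <= maxT; n *= 2) {
        double tput = throughput(n, 256);
        if (tput <= bestTput * 1.05) break;   // <5% gain: stop scaling up
        best = n; bestTput = tput;
    }
    std::cout << "chosen thread count: " << best << "\n";
}

A counter-based controller can make the same decision without timing full batches, for example by watching cache-miss or stall-cycle rates as concurrency rises, which is what makes adaptation without programmer intervention practical.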
In the area of speculative execution, techniques were developed to maximize the effectiveness of hardware transactional memory (HTM), which allows groups of instructions to be executed as a single indivisible operation. One such technique uses HTM as a prefetching mechanism that allows an application to maximize performance when an operation for which it is waiting finally completes; another technique minimizes data structure indirection, thereby reducing the odds of spurious HTM failures. Several new concurrent data structures were also developed, together with techniques that leverage emerging memory technologies to maintain the consistency of structured data across power outages and other system failures.
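For context, the sketch below shows the standard HTM lock-elision pattern on which such techniques build, using Intel's RTM intrinsics (_xbegin/_xend/_xabort from immintrin.h, compiled with -mrtm on supporting hardware). This is generic textbook code, not the project's prefetching or indirection-reduction techniques themselves.

// Run a critical section as a hardware transaction, falling back to a
// lock after repeated aborts. Generic pattern; illustrative only.
#include <immintrin.h>
#include <atomic>

std::atomic<bool> fallbackHeld{false};

template <typename F>
void atomically(F&& criticalSection) {
    for (int attempt = 0; attempt < 3; ++attempt) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            // Put the fallback flag in the transaction's read set: if a
            // lock holder is active now or acquires the lock later, this
            // transaction aborts rather than racing with it.
            if (fallbackHeld.load(std::memory_order_relaxed)) _xabort(0xff);
            criticalSection();
            _xend();                          // commit atomically
            return;
        }
        // Aborted (conflict, capacity overflow, ...); retry a few times.
    }
    while (fallbackHeld.exchange(true, std::memory_order_acquire))
        ;                                     // simple spin-lock fallback
    criticalSection();
    fallbackHeld.store(false, std::memory_order_release);
}

The key detail is that every transaction reads the fallback flag, so a thread that takes the lock aborts all in-flight transactions and mutual exclusion is preserved; keeping data structures flat (less indirection) shrinks transaction read/write sets and thus reduces spurious aborts.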
The work performed under this project has broad impact both as a whole and through its individual components. Tree abstractions, kd-trees, and octrees are commonly used in scientific computing and computer science. De Bruijn graphs and k-mer indices are useful for genome assembly and sequence analysis, while the underlying distributed and sequential hash tables, vectorized hash functions, and primitives for overlapping communication and computation have broad applicability. The improvements to core OS functionalities such as CPU scheduling, address translation, data persistence, and hardware transactional memory support impact the OS as well as sequential and parallel application performance. Multiple software products from the research have been made available as open source.
Last Modified: 02/27/2019
Modified by: Srinivas Aluru