Award Abstract # 1361053
Collaborative Research:XPS:CLCCA: Performance Portable Abstractions for Large-Scale Irregular Computations

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: GEORGIA TECH RESEARCH CORP
Initial Amendment Date: November 6, 2013
Latest Amendment Date: November 6, 2013
Award Number: 1361053
Award Instrument: Standard Grant
Program Manager: Anindya Banerjee
abanerje@nsf.gov
 (703)292-7885
CCF
 Division of Computing and Communication Foundations
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: September 10, 2013
End Date: August 31, 2018 (Estimated)
Total Intended Award Amount: $299,999.00
Total Awarded Amount to Date: $299,999.00
Funds Obligated to Date: FY 2013 = $299,999.00
History of Investigator:
  • Srinivas Aluru (Principal Investigator)
    aluru@cc.gatech.edu
Recipient Sponsored Research Office: Georgia Tech Research Corporation
926 DALNEY ST NW
ATLANTA
GA  US  30318-6395
(404)894-4819
Sponsor Congressional District: 05
Primary Place of Performance: Georgia Institute of Technology
225 North Avenue
Atlanta
GA  US  30332-0002
Primary Place of Performance Congressional District: 05
Unique Entity Identifier (UEI): EMW9FC8J3HN4
Parent UEI: EMW9FC8J3HN4
NSF Program(s): Information Technology Research,
Exploiting Parallelism & Scalability
Primary Program Source: 01001314DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s):
Program Element Code(s): 164000, 828300
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Ongoing technology trends are accelerating scientific discovery by allowing researchers to generate enormous quantities of data, in domains ranging from computational biology to social networks. There is an urgent need to make it easy and fast to extract useful content from this data using appropriate abstractions and parallel runtimes. Work conducted under this project aims to make "big data" computing more readily available to applications with dynamic structure and irregular dependencies, thereby enabling advances in scientific computing in general and computational biology in particular.

This project extends the state of the art in scientific computing by developing programming abstractions to expose -- and run-time optimizations to exploit -- the parallelism available in large, irregular applications. Parallelism is essential for the extraction of useful information from ever increasing volumes of scientific data, but the irregularity of data structure and access in many problem domains makes efficient parallelization difficult. At the level of the programming model, the project addresses the challenge of irregularity by identifying design patterns for important new classes of applications -- in particular, those that use trees and graphs for data representation and access but demonstrate some structure in the traversal. At the level of the run-time system, it is developing computational engines that support and exploit the new patterns, leveraging the structure exposed to automatically and dynamically map computational tasks to hardware nodes.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Joseph Izraelevitz, Hammurabi Mendes, and Michael L. Scott, "Preserving Happens-before in Persistent Memory," 28th ACM Symposium on Parallelism in Algorithms and Architectures, 2016. 10.1145/2935764.2935810
Matthew Graichen, Joseph Izraelevitz, and Michael L. Scott, "An Unbounded Nonblocking Double-ended Queue," International Conference on Parallel Processing, 2016. 10.1109/ICPP.2016.32
Sharanyan Srikanthan, Sandhya Dwarkadas, and Kai Shen, "Coherence Stalls or Latency Tolerance: Informed CPU Scheduling for Socket and Core Sharing," USENIX Annual Technical Conference (USENIX ATC), 2016.
Tony C. Pan, Sanchit Misra, and Srinivas Aluru, "Optimizing High Performance Distributed Memory Parallel Hash Tables for DNA k-mer Counting," The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), 2018.
Tony Pan, Patrick Flick, Chirag Jain, Yongchao Liu, and Srinivas Aluru, "Kmerind: A Flexible Parallel Library for K-mer Indexing of Biological Sequences on Distributed Memory Systems," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2017. 10.1109/TCBB.2017.2760829
Tony Pan, Patrick Flick, Chirag Jain, Yongchao Liu, and Srinivas Aluru, "Kmerind: A Flexible Parallel Library for K-mer Indexing of Biological Sequences on Distributed Memory Systems," The 7th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, 2016, pp. 422-433. 10.1145/2975167.2975211
Tony Pan, Rahul Nihalani, and Srinivas Aluru, "Fast De Bruijn Graph Compaction in Distributed Memory Environments," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2018. 10.1109/TCBB.2018.2858797
Xiaowan Dong, Sandhya Dwarkadas, and Alan L. Cox, "Shared Address Translation Revisited," The European Conference on Computer Systems (EuroSys), 2016.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project aimed to extend the state-of-the-art in scientific computing by developing programming abstractions to expose -- and run-time optimizations to exploit -- the parallelism available in large, irregular applications. Parallelism is essential for the extraction of useful information from ever increasing volumes of scientific data, but the irregularity of data structure and access in many problem domains makes efficient parallelization difficult. At the level of the programming model, the project addressed the challenge of irregularity by identifying design patterns for important new classes of applications -- in particular, those that use trees and graphs for data representation and access but demonstrate some structure in the traversal. At the level of the run-time system, the project developed computational engines that support and exploit the new patterns, leveraging the structure exposed to automatically and dynamically map computational tasks to hardware nodes.

Project activities were focused in three major areas: 1) fast point tree data structures and algorithms, 2) fast de Bruijn graph and algorithms for sequence analysis and assembly, and 3) efficient parallel run-time systems.

The Georgia Institute of Technology team led the development of fast point-tree data structures and algorithms to support tree-based applications, including hierarchical methods in scientific computing, k-nearest-neighbor search, and the Fast Multipole Method. A common framework was developed, comprising three sub-libraries: the tree compute framework, the parallel tree library model/abstraction, and a library of distributed tree structure implementations (such as octrees and kd-trees). The first two sub-libraries provide a common abstraction for constructing, accessing, and computing with different distributed tree structures, while the last contains tree-specific, state-of-the-art algorithms. These abstractions allow an application developer to perform computations in parallel with minimal knowledge of parallelism or of the underlying distributed computing resources. The framework was demonstrated by implementing communication-efficient kd-trees, showing a two- to three-fold performance improvement over existing methods.
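As an illustration of the kind of traversal these tree abstractions generalize and distribute, the following is a minimal, single-node sketch of a kd-tree nearest-neighbor query. The names used here (KDNode, build, nearest) are illustrative only and are not the project's actual library API; the project's framework operates on distributed trees across many nodes.

    // Minimal single-node sketch of a kd-tree nearest-neighbor query.
    // Illustrative only; not the project's library API.
    #include <algorithm>
    #include <iostream>
    #include <memory>
    #include <vector>

    struct Point { double x, y; };

    struct KDNode {
        Point pt;
        int axis;                          // 0 = split on x, 1 = split on y
        std::unique_ptr<KDNode> left, right;
    };

    // Recursively build a balanced kd-tree by splitting on the median.
    std::unique_ptr<KDNode> build(std::vector<Point> pts, int depth = 0) {
        if (pts.empty()) return nullptr;
        int axis = depth % 2;
        auto mid = pts.begin() + pts.size() / 2;
        std::nth_element(pts.begin(), mid, pts.end(),
                         [axis](const Point& a, const Point& b) {
                             return axis == 0 ? a.x < b.x : a.y < b.y;
                         });
        auto node = std::make_unique<KDNode>();
        node->pt = *mid;
        node->axis = axis;
        node->left  = build({pts.begin(), mid}, depth + 1);
        node->right = build({mid + 1, pts.end()}, depth + 1);
        return node;
    }

    double dist2(const Point& a, const Point& b) {
        return (a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y);
    }

    // Standard prune-and-descend nearest-neighbor search.
    void nearest(const KDNode* node, const Point& q, const Point*& best, double& bestD2) {
        if (!node) return;
        double d2 = dist2(node->pt, q);
        if (d2 < bestD2) { bestD2 = d2; best = &node->pt; }
        double diff = node->axis == 0 ? q.x - node->pt.x : q.y - node->pt.y;
        const KDNode* nearSide = diff < 0 ? node->left.get()  : node->right.get();
        const KDNode* farSide  = diff < 0 ? node->right.get() : node->left.get();
        nearest(nearSide, q, best, bestD2);
        if (diff * diff < bestD2) nearest(farSide, q, best, bestD2);  // prune far subtree if possible
    }

    int main() {
        auto root = build({{2,3},{5,4},{9,6},{4,7},{8,1},{7,2}});
        const Point* best = nullptr;
        double bestD2 = 1e300;
        nearest(root.get(), {9,2}, best, bestD2);
        std::cout << "nearest: (" << best->x << "," << best->y << ")\n";
    }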

The Georgia Institute of Technology team also led the development of efficient data structures and algorithms to support genome assembly as well as other sequence analysis tasks such as error correction, alignment, and variant calling. Best-in-class algorithms were developed for distributed-memory k-mer indices (Kmerind and Kmerhash) and for parallel directed and bi-directed de Bruijn graphs (Bruno), with fast construction, error detection and removal, linear chain compaction, and cycle detection. The k-mer counter outperformed the previous state-of-the-art shared-memory software by 1.7 times and counted a 1 TB sequence data set in under 12 seconds using 4096 cores on NERSC's Cori supercomputer. The de Bruijn graph software constructed and compacted a 695 GB data set in 31 seconds using 7680 cores on NERSC's Edison supercomputer, 3.7 times faster than the previous state of the art and 1.4 times faster than the previous state of the art in shared-memory environments. These algorithms and implementations benefit not only computational biology and bioinformatics, but also the broader set of applications that depend on distributed and sequential hash tables.
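For readers unfamiliar with k-mer indexing, the sketch below shows only the sequential core of the idea: packing each k-mer of a DNA sequence into a 64-bit integer (2 bits per base) and counting occurrences in a hash table. Kmerind and Kmerhash partition k-mers across MPI ranks by hash and use far more engineered hash tables; this sketch is not the project's code, and all names in it are illustrative.

    // Sequential sketch of the core of k-mer counting: pack each k-mer into a
    // 64-bit integer (2 bits per base) and count occurrences in a hash table.
    // Illustrative only; not the project's (distributed, MPI-based) code.
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <unordered_map>

    // Map A/C/G/T to 2-bit codes; returns 4 for any other character (e.g., N).
    inline uint64_t base_code(char c) {
        switch (c) {
            case 'A': return 0; case 'C': return 1;
            case 'G': return 2; case 'T': return 3;
            default:  return 4;
        }
    }

    std::unordered_map<uint64_t, uint32_t> count_kmers(const std::string& seq, unsigned k) {
        std::unordered_map<uint64_t, uint32_t> counts;
        const uint64_t mask = (k < 32) ? ((uint64_t{1} << (2 * k)) - 1) : ~uint64_t{0};
        uint64_t kmer = 0;
        unsigned valid = 0;                      // consecutive valid bases seen so far
        for (char c : seq) {
            uint64_t code = base_code(c);
            if (code == 4) { valid = 0; kmer = 0; continue; }  // reset at ambiguous bases
            kmer = ((kmer << 2) | code) & mask;  // slide the k-mer window by one base
            if (++valid >= k) ++counts[kmer];
        }
        return counts;
    }

    int main() {
        auto counts = count_kmers("ACGTACGTNACGT", 4);
        std::cout << "distinct 4-mers: " << counts.size() << "\n";
    }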

The University of Rochester team led the development of efficient parallel run-time systems. To make the most effective use of modern multiprocessors and clusters, techniques were developed to co-locate application tasks that share significant amounts of data, thereby minimizing communication costs. Related techniques were developed to automatically control the level of parallelism in each application. Both of these mechanisms leverage the performance-monitoring features of modern processors to effect their adaptations without programmer or user intervention. For systems that share hardware resources across independent applications, the project also developed driver software that guarantees fair access to extremely fast parallel I/O devices.
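The sketch below illustrates only the final mechanism such a runtime would invoke once it has decided, for example from hardware performance counters, that two threads share data heavily: pinning them to nearby cores so the shared data stays in a common cache. It is Linux-specific, the assumption that CPUs 0 and 1 sit on the same socket is purely illustrative, and it does not reproduce the project's sharing-detection or scheduling logic.

    // Linux-specific sketch of the co-location mechanism: pin two threads that
    // share data to (assumed) same-socket cores. Detection of sharing via
    // performance counters, as in the project, is not shown here.
    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE                       // for pthread_setaffinity_np on glibc
    #endif
    #include <pthread.h>
    #include <sched.h>
    #include <cstdio>
    #include <thread>
    #include <vector>

    // Pin the given thread to one CPU.
    void pin_to_cpu(std::thread& t, int cpu) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(t.native_handle(), sizeof(set), &set);
    }

    int main() {
        std::vector<int> shared(1 << 20, 1);  // data both workers read repeatedly
        long sums[2] = {0, 0};

        auto worker = [&](int id) {
            long local = 0;
            for (int iter = 0; iter < 100; ++iter)
                for (int v : shared) local += v;
            sums[id] = local;
        };

        std::thread a(worker, 0), b(worker, 1);
        // Hypothetical placement decision: CPUs 0 and 1 are assumed to share a
        // socket, so co-locating the two sharers there reduces coherence traffic.
        pin_to_cpu(a, 0);
        pin_to_cpu(b, 1);
        a.join(); b.join();
        std::printf("sums: %ld %ld\n", sums[0], sums[1]);
    }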

In the area of speculative execution, techniques were developed to maximize the effectiveness of hardware transactional memory (HTM), which allows groups of instructions to be executed as a single indivisible operation. One such technique uses HTM as a prefetching mechanism that allows an application to maximize performance when an operation for which it is waiting finally completes; another technique minimizes data structure indirection, thereby reducing the odds of spurious HTM failures. Several new concurrent data structures were also developed, together with techniques that leverage emerging memory technologies to maintain the consistency of structured data across power outages and other system failures.
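As background for these HTM techniques, the following is a generic sketch of the basic Intel RTM usage pattern, lock elision with a fallback lock, on which such work typically builds. It is not the project's prefetching or indirection-reduction code; it requires a CPU with TSX support and compilation with -mrtm, and the retry count and abort code are arbitrary choices.

    // Generic lock-elision sketch using Intel RTM: try the critical section as a
    // hardware transaction, fall back to a spinlock after repeated aborts.
    // Requires TSX hardware and -mrtm; not the project's code.
    #include <immintrin.h>    // _xbegin / _xend / _xabort / _XBEGIN_STARTED
    #include <atomic>
    #include <cstdio>

    std::atomic<bool> fallback_lock{false};

    // Run fn() atomically: a few RTM attempts, then take the fallback spinlock.
    template <typename F>
    void atomic_region(F fn) {
        for (int attempt = 0; attempt < 3; ++attempt) {
            unsigned status = _xbegin();
            if (status == _XBEGIN_STARTED) {
                if (fallback_lock.load(std::memory_order_relaxed))
                    _xabort(0xff);        // a lock holder is active; abort and retry
                fn();
                _xend();
                return;
            }
            // Transaction aborted (conflict, capacity, ...): retry or fall through.
        }
        while (fallback_lock.exchange(true, std::memory_order_acquire))
            ;                             // spin until the fallback lock is free
        fn();
        fallback_lock.store(false, std::memory_order_release);
    }

    int main() {
        long counter = 0;
        for (int i = 0; i < 1000; ++i)
            atomic_region([&] { ++counter; });
        std::printf("counter = %ld\n", counter);
    }

Reading the fallback lock inside the transaction adds it to the transaction's read set, so a thread that later acquires the lock automatically aborts any concurrent transactions, preserving mutual exclusion between the elided and non-elided paths.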

The work performed under this project has broad impact, both as a whole and through its individual components. Tree abstractions, as well as kd-trees and octrees, are commonly used in scientific computing and computer science. De Bruijn graphs and k-mer indices are useful for genome assembly and sequence analysis, while the underlying distributed and sequential hash tables, vectorized hash functions, and primitives for overlapping communication and computation have broad applicability. The improvements to core OS functionalities such as CPU scheduling, address translation, data persistence, and hardware transactional memory support impact the OS as well as sequential and parallel application performance. Multiple software products from the research have been made available as open source.


Last Modified: 02/27/2019
Modified by: Srinivas Aluru

