Award Abstract # 0910735
Keeneland: National Institute for Experimental Computing

NSF Org: OAC (Office of Advanced Cyberinfrastructure)
Recipient: GEORGIA TECH RESEARCH CORP
Initial Amendment Date: September 21, 2009
Latest Amendment Date: February 11, 2014
Award Number: 0910735
Award Instrument: Cooperative Agreement
Program Manager: Robert Chadduck
rchadduc@nsf.gov
(703) 292-2247
OAC: Office of Advanced Cyberinfrastructure
CSE: Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2009
End Date: April 30, 2015 (Estimated)
Total Intended Award Amount: $12,000,000.00
Total Awarded Amount to Date: $12,000,000.00
Funds Obligated to Date: FY 2009 = $4,967,118.00
FY 2011 = $7,032,882.00
History of Investigator:
  • Jeffrey Vetter (Principal Investigator)
    vetter@tennessee.edu
  • Jack Dongarra (Co-Principal Investigator)
  • Karsten Schwan (Co-Principal Investigator)
  • Richard Fujimoto (Co-Principal Investigator)
  • Thomas Schulthess (Co-Principal Investigator)
Recipient Sponsored Research Office: Georgia Tech Research Corporation
926 DALNEY ST NW
ATLANTA
GA  US  30318-6395
(404)894-4819
Sponsor Congressional District: 05
Primary Place of Performance: Georgia Institute of Technology
225 NORTH AVE NW
ATLANTA
GA  US  30332-0002
Primary Place of Performance Congressional District: 05
Unique Entity Identifier (UEI): EMW9FC8J3HN4
Parent UEI: EMW9FC8J3HN4
NSF Program(s): Innovative HPC, CESER-Cyberinfrastructure for Emerging Science and Engineering Research
Primary Program Source: 01000910DB NSF RESEARCH & RELATED ACTIVIT
01001112DB NSF RESEARCH & RELATED ACTIVIT
01001314DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7619, 9215, HPCC
Program Element Code(s): 761900, 768400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Many-core processor architectures are rapidly emerging in many computing environments. One of their attractions is the ability to significantly speed up computation for certain classes of algorithms. In addition, they typically offer lower energy consumption per unit of computation. Several recent studies have identified the development of effective methods for efficiently programming many-core architectures as a major challenge. This project will make available, as experimental platforms, two systems in which one form of many-core processor with very high memory bandwidth, the graphics processing unit (GPU), is deployed at scale for use as an accelerator for high-performance parallel computing.

The Georgia Institute of Technology (Georgia Tech) and its partners, the University of Tennessee at Knoxville and Oak Ridge National Laboratory, will initially acquire and deploy a small, experimental, high-performance computing system consisting of an HP system with NVIDIA Tesla accelerators attached. This system will be integrated into the TeraGrid. The project team will use it to develop scientific libraries and programming tools that facilitate the development of science and engineering research applications. The team will also provide consulting support to researchers who wish to develop applications for the system using OpenCL or to port existing applications to it.
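
To give a concrete sense of the programming model this consulting support targeted, here is a minimal sketch of an OpenCL host program in C that offloads a vector addition to a GPU. It is illustrative only, not project code: the kernel name vadd and the problem size are arbitrary choices, and error checking is abbreviated for brevity.

/* Illustrative OpenCL vector addition (not project code). */
#include <stdio.h>
#include <CL/cl.h>

static const char *kSrc =
    "__kernel void vadd(__global const float *a,\n"
    "                   __global const float *b,\n"
    "                   __global float *c) {\n"
    "    size_t i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

int main(void) {
    enum { N = 1024 };
    float a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) { a[i] = (float)i; b[i] = 2.0f * i; }

    /* Grab the first platform and the first GPU device on it. */
    cl_platform_id plat; cl_device_id dev; cl_int err;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, &err);

    /* Compile the kernel source at run time, as OpenCL requires. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, NULL, &err);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vadd", &err);

    /* Device buffers; inputs are copied from host memory at creation. */
    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof a, a, &err);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof b, b, &err);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, &err);

    clSetKernelArg(k, 0, sizeof da, &da);
    clSetKernelArg(k, 1, sizeof db, &db);
    clSetKernelArg(k, 2, sizeof dc, &dc);

    /* One work-item per element; the blocking read waits for completion. */
    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);

    printf("c[42] = %.1f (expected %.1f)\n", c[42], 3.0f * 42);

    clReleaseMemObject(da); clReleaseMemObject(db); clReleaseMemObject(dc);
    clReleaseKernel(k); clReleaseProgram(prog);
    clReleaseCommandQueue(q); clReleaseContext(ctx);
    return 0;
}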

In 2012, the project will upgrade the heterogeneous system to a larger and more powerful system based on a next-generation platform and NVIDIA accelerators. It is anticipated that the final system will have a peak performance of roughly 2 petaflops. The project will operate the upgraded system as a TeraGrid resource for a further two years.

The final system has the potential to support many different science areas. Possible areas of impact include scientific domains in which GPU-based acceleration has already been demonstrated at smaller scale: for example, chemistry and biochemistry, materials science, atmospheric science, and combustion science.

In addition to providing infrastructure for science and engineering research and education, the project partners will educate and train the next generation of computational scientists on cutting-edge computing architectures and emerging programming environments, using the experimental computing resource as one example.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 262)
Agullo, Emmanuel and Augonnet, Cédric and Dongarra, Jack and Ltaief, Hatem and Namyst, Raymond and Thibault, Samuel and Tomov, Stanimire and others "Faster, cheaper, better – a hybridization methodology to develop linear algebra software for GPUs" GPU Computing Gems , v.2 , 2010
Ahmed, Lucky and Rasulev, Bakhtiyor and Turabekova, Malakhat and Leszczynska, Danuta and Leszczynski, Jerzy "Receptor- and ligand-based study of fullerene analogues: comprehensive computational approach including quantum-chemical, QSAR and molecular docking simulations" Organic & Biomolecular Chemistry , v.11 , 2013 , p.5798 10.1039/c3ob40878g
Amaro, Rommie E and Bansal, Manju "Editorial overview: Theory and simulation: Tools for solving the insolvable" Current Opinion in Structural Biology , v.25 , 2014 , p.iv–v 10.1016/j.sbi.2014.04.004
Danalis, Anthony and Marin, Gabriel and McCurdy, Collin and Meredith, Jeremy S. and Roth, Philip C. and Spafford, Kyle and Tipparaju, Vinod and Vetter, Jeffrey S. "The Scalable Heterogeneous Computing (SHOC) benchmark suite" 3rd Workshop on General-Purpose Computation on Graphics Processing Units , 2010 , p.63-74
Anzt, Hartwig and Tomov, Stanimire and Dongarra, Jack and Heuveline, Vincent "A block-asynchronous relaxation method for graphics processing units" Journal of Parallel and Distributed Computing , v.73 , 2013 , p.1613–162
Anzt, Hartwig and Tomov, Stanimire and Gates, Mark and Dongarra, Jack and Heuveline, Vincent "Block-asynchronous multigrid smoothers for GPU-accelerated systems" Procedia Computer Science , v.9 , 2012 , p.7–16
Baboulin, Marc and Donfack, Simplice and Dongarra, Jack and Grigori, Laura and Rémy, Adrien and Tomov, Stanimire "A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines" Procedia Computer Science , v.9 , 2012 , p.17–26
Bailey, Jon A. and Bazavov, A. and Bernard, C. and Bouchard, C. M. and DeTar, C. and Du, Daping and El-Khadra, A. X. and Foley, J. and Freeland, E. D. and Gámiz, E. and et al. "B_s→D_s/B→D semileptonic form-factor ratios and their application to BR(B_s^0→μ+μ−)" Physical Review D , v.85 , 2012 10.1103/PhysRevD.85.114502
Bailey, Jon A. and Bazavov, A. and Bernard, C. and Bouchard, C. M. and DeTar, C. and Du, Daping and El-Khadra, A. X. and Foley, J. and Freeland, E. D. and Gámiz, E. and et al. "Refining New-Physics Searches in B→Dτν with Lattice QCD" Physical Review Letters , v.109 , 2012 10.1103/PhysRevLett.109.071802
Bailey, Jon A. and Bazavov, A. and Bernard, C. and Bouchard, C. M. and DeTar, C. and Du, Daping and El-Khadra, A. X. and Foley, J. and Freeland, E. D. and Gámiz, E. and et al. "Update of |V_cb| from the B→D*ℓν form factor at zero recoil with three-flavor lattice QCD" Physical Review D , v.89 , 2014 10.1103/physrevd.89.114504
Chen, Eric and Swift, Robert V. and Alderson, Nazilla and Feher, Victoria A. and Feng, Gen-Sheng and Amaro, Rommie E. "Computation-Guided Discovery of Influenza Endonuclease Inhibitors" ACS Medicinal Chemistry Letters , v.5 , 2014 , p.61–64 10.1021/ml4003474

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The Keeneland Project was a five-year cooperative agreement awarded by the National Science Foundation (NSF) in 2009 to deploy an innovative high-performance computing system that would bring emerging architectures to the open science community. The Georgia Institute of Technology (Georgia Tech) and its partners, Oak Ridge National Laboratory, the University of Tennessee-Knoxville, and the National Institute for Computational Sciences, managed the facility, performed education and outreach activities for advanced architectures, developed and deployed software tools to ensure productivity on this emerging class of architectures, and teamed with early adopters to map their applications to Keeneland architectures.

In 2010, the Keeneland project procured and deployed its initial delivery system (KIDS): a 201-TFLOPS, 120-node HP SL390 system with 240 Intel Xeon CPUs and 360 NVIDIA Fermi GPUs, with the nodes connected by an InfiniBand QDR network. KIDS was used to develop programming tools and libraries that helped the project productively accelerate important scientific and engineering applications. The system was also available to a select group of users to port and tune their codes on a scalable GPU-accelerated system. In the spring of 2012, KIDS was upgraded from NVIDIA M2070 to M2090 GPUs, for a total peak performance of 255 TFLOPS.

In October of 2012, the Keeneland Full Scale (KFS) system was accepted by the NSF and went into production. KFS was a 264-node cluster based on HP SL250 servers. Each node had 32 GB of host memory, two Intel Sandy Bridge CPUs, three NVIDIA M2090 GPUs, and a Mellanox FDR InfiniBand interconnect. The total peak double-precision performance was 615 TFLOPS.
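
As a rough check on that figure (using commonly quoted peak rates that are not stated in this report: about 665 double-precision GFLOPS per M2090, and about 166 GFLOPS per eight-core 2.6 GHz Sandy Bridge Xeon): 264 nodes × 3 GPUs × 665 GFLOPS ≈ 527 TFLOPS, plus 264 nodes × 2 CPUs × 166 GFLOPS ≈ 88 TFLOPS, for a total of approximately 615 TFLOPS.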

During its lifetime, the Keeneland project:

  • Served 942 total users of KIDS and KFS for research, development, and educational purposes;
  • Contributed to at least 367 publications, presentations, and other software artifacts;
  • Delivered over 50 million CPU hours on KFS and provided over 10 million SUs to XSEDE (equivalent to ~50 million SUs on a CPU-centric system);
  • Averaged 81.2% GPU utilization over KFS's two-year production run;
  • Contributed to the development of many early software packages for GPU heterogeneous computing, including the Scalable Heterogeneous Computing (SHOC) benchmark suite, the MAGMA BLAS libraries, the Ocelot emulator, and many others;
  • Provided Advanced Application Support that increased GPU and system utilization, assisted with application development for BEAST/Beagle, improved batched matrix multiplication (a pattern sketched after this list), and contributed improvements to many large-scale applications; and
  • Employed 10 staff members, four faculty members, and nearly 40 students and postdoctoral research associates.
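
The "batched matrix multiplication" item above refers to computing many small, independent matrix products in one pass, a workload shape that GPU libraries such as MAGMA treat as a first-class operation. Below is a minimal CPU reference sketch of the pattern in C; it is illustrative only, not Keeneland project code, and the batch and matrix sizes are arbitrary.

/* Illustrative batched matrix multiply (not project code). */
#include <stdio.h>
#include <stdlib.h>

/* One n-by-n product, row-major: C = A * B. */
static void gemm(int n, const float *A, const float *B, float *C) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            float s = 0.0f;
            for (int k = 0; k < n; ++k)
                s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = s;
        }
}

int main(void) {
    enum { BATCH = 1000, N = 8 };   /* many small matrices */
    size_t elems = (size_t)BATCH * N * N;
    float *A = malloc(elems * sizeof *A);
    float *B = malloc(elems * sizeof *B);
    float *C = malloc(elems * sizeof *C);
    if (!A || !B || !C) return 1;
    for (size_t i = 0; i < elems; ++i) { A[i] = 1.0f; B[i] = 2.0f; }

    /* Each iteration is independent, so a GPU implementation can
       dispatch the whole batch as a single kernel launch rather
       than launching thousands of tiny kernels one at a time. */
    for (int b = 0; b < BATCH; ++b)
        gemm(N, A + (size_t)b * N * N, B + (size_t)b * N * N, C + (size_t)b * N * N);

    printf("first element of first product: %.1f (expected %.1f)\n", C[0], 2.0f * N);
    free(A); free(B); free(C);
    return 0;
}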

Last Modified: 06/25/2015
Modified by: Jeffrey S Vetter