Award Abstract # 0711134
A National Institute for Computational Sciences to Provide Leading-Edge Computational Support for Breakthrough Science and Engineering Research

NSF Org: OAC
Office of Advanced Cyberinfrastructure (OAC)
Recipient: UNIVERSITY OF TENNESSEE
Initial Amendment Date: September 28, 2007
Latest Amendment Date: September 20, 2016
Award Number: 0711134
Award Instrument: Cooperative Agreement
Program Manager: Edward Walker
edwalker@nsf.gov
(703)292-4863
OAC  Office of Advanced Cyberinfrastructure (OAC)
CSE  Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2007
End Date: June 30, 2017 (Estimated)
Total Intended Award Amount: $64,442,172.00
Total Awarded Amount to Date: $84,527,696.00
Funds Obligated to Date: FY 2007 = $38,131,952.00
FY 2008 = $7,788,584.00
FY 2009 = $18,528,956.00
FY 2011 = $6,497,365.00
FY 2012 = $6,725,518.00
FY 2014 = $3,874,530.00
FY 2015 = $2,980,790.00
History of Investigator:
  • Gregory Peterson (Principal Investigator)
    gdp@utk.edu
  • Thomas Zacharia (Former Principal Investigator)
Recipient Sponsored Research Office: University of Tennessee Knoxville
201 ANDY HOLT TOWER
KNOXVILLE
TN  US  37996-0001
(865)974-3466
Sponsor Congressional District: 02
Primary Place of Performance: University of Tennessee Knoxville
201 ANDY HOLT TOWER
KNOXVILLE
TN  US  37996-0001
Primary Place of Performance Congressional District: 02
Unique Entity Identifier (UEI): FN2YCS2YAUW3
Parent UEI: LXG4F9K8YZK5
NSF Program(s): Climate & Large-Scale Dynamics,
COMPUTATIONAL PHYSICS,
XD-Extreme Digital,
Innovative HPC,
Leadership-Class Computing
Primary Program Source: 0100999999 NSF RESEARCH & RELATED ACTIVIT
01000809DB NSF RESEARCH & RELATED ACTIVIT
01000910DB NSF RESEARCH & RELATED ACTIVIT
0100999999 NSF RESEARCH & RELATED ACTIVIT
01001011DB NSF RESEARCH & RELATED ACTIVIT
01001112DB NSF RESEARCH & RELATED ACTIVIT
01001213DB NSF RESEARCH & RELATED ACTIVIT
01001415DB NSF RESEARCH & RELATED ACTIVIT
01001516DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7476, 7619, 9150, 9215, HPCC
Program Element Code(s): 574000, 724400, 747600, 761900, 778100
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

0711134: University of Tennessee - Knoxville
PI: Thomas Zacharia

In this project, the University of Tennessee at Knoxville (UTK) will provide a significant new computing capability to the research community, enabling researchers to tackle large and complex research challenges in a wide range of areas. In partnership with the Oak Ridge National Laboratory (ORNL), UTK will acquire, deploy, and operate a sequence of large, well-balanced, high-performance computational resources on behalf of the science and engineering research community. Initially, a large teraflop/s Cray XT4 system will be deployed. This will subsequently be upgraded to a Cray Baker system with a peak performance of over one petaflop/s and a large amount of main memory. These systems will be sited at the Joint Institute for Computational Sciences, a center established by the University of Tennessee and ORNL and housed in a building constructed by the state of Tennessee on the ORNL campus. The new systems will form part of NSF's TeraGrid high-performance cyberinfrastructure, doubling the computational capacity of the TeraGrid within one year.

This award will permit investigators across the country to conduct innovative research in a number of areas. Examples of recent impacts of the TeraGrid's high-performance computing resources on research, taken from NSF's "Highlights" database, include:

  • The first atomic-level simulation of a life form. These simulations of the satellite tobacco mosaic virus will help scientists determine what factors are important to the virus's structural integrity and how those factors might influence assembly of the virus inside host cells.
  • Avalanches, oil spills, thunderstorm fronts, and the dust cloud following a building collapse all generate intrusions of heavier fluid into a lighter environment. Mathematical modeling and large-scale simulations give engineers the means to study these three-dimensional flows, which are frequently impossible to measure directly because of their destructive power.
  • Over the last year, scientists have used more than one million CPU hours on TeraGrid systems to optimize the process by which the signals generated by Higgs decay are separated from the potentially overwhelming background noise.
  • One of the best tools economists have to account for the vagaries of human decision-making as it affects economic forecasting is the life cycle model. Using a process called "backward induction," researchers were able for the first time to apply massively parallel computing to the life cycle model, and TeraGrid systems were used to solve the largest, most realistically specified versions of the model ever attempted.
  • Proteins are the building blocks of the body, and biologists have learned that the myriad ways they function, from fighting off infection and building new bones to storing a memory, depend on the precise details of their 3-D shapes. But determining the shapes of proteins has been a slow and exacting process. By dramatically accelerating this research, modern supercomputers are opening the door to medical advances such as rational drug design.

This project has broader impacts in a number of areas:
  • The project enhances the infrastructure for research and education by providing facilities for high-end computing integrated into the TeraGrid. Integration into the TeraGrid permits the facility to be used, relatively transparently, to provide back-end services to the education portals within the TeraGrid Science and Engineering Gateways.
  • The project partners at Oak Ridge National Laboratory, the Texas Advanced Computing Center, and the National Center for Atmospheric Research will collaborate to develop and offer advanced training in high-end computing topics, including scaling and performance optimization. This training program will include in-person training at remote institutions where there are at least ten attendees. Training sessions will also be provided at large science and engineering workshops and conferences.
  • Leveraging funding from a variety of sources, the University of Tennessee at Knoxville will launch a multidisciplinary Intercollegiate Graduate Program in Computational Science that will offer training in computational science to students in a wide range of disciplines.
  • Many of the users of the proposed system will be graduate students and post-doctoral researchers working in research groups that use high-end computing in their investigations. It is anticipated that the computational resources provided by this project will play a role in over one hundred graduate theses.
  • The project will work with the Oak Ridge Associated Universities' Council of Minority-Serving Institutions to recruit a diverse group of users from underrepresented groups.

One of the primary partners in this project, ORNL, operates a portfolio of existing programs aimed at introducing pre-college and college students to the science and engineering uses of high-end computing and at broadening the participation of underrepresented groups in science and engineering. The types of research conducted with the new system will be integrated into these programs.


PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Hartwig Anzt, Moritz Kreutzer, Eduardo Ponce, Gregory D. Peterson, Gerhard Wellein, and Jack Dongarra, "Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs," International Journal of High Performance Computing Applications, 2016. doi:10.1177/1094342016646844

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

In 2009, the National Institute for Computational Sciences (NICS) delivered the first academic petaflop computer to the NSF community, a Cray XT5 called Kraken. By the end of 2010, two Cray systems at NICS, Kraken and the 166 TF Cray XT4 Athena, were the primary providers of computer time to the TeraGrid, delivering more than 70% of all NSF compute cycles. In 2011 NICS decommissioned Athena, which had provided 99% availability and 93% system utilization. In 2014 NICS decommissioned Kraken and began providing the NSF community access to Darter, a 250 TF Cray XC30.

Kraken was a Cray XT5 consisting of 9,408 compute nodes, each containing two 6-core AMD Istanbul Opteron processors and 16 GB of on-node memory. The resulting 112,896 compute cores delivered 1.17 PF at peak performance with 147 TB of memory. Communications took place over the Cray SeaStar2+ interconnect. A parallel Lustre file system provided 3.3 PB (raw) of short-term data storage.

Athena was a Cray XT4 with 4,512 compute nodes interconnected by the Cray SeaStar network in a 3D torus topology. Each compute node had one four-core AMD Opteron, for a total of 18,048 cores, and 4 GB of memory (1 GB per core). A parallel Lustre file system provided 100 TB (raw) of short-term data storage.
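
These aggregate figures follow directly from the per-node counts. A quick back-of-the-envelope check in Python, assuming the clock rates commonly cited for these processor generations (2.6 GHz for Kraken's six-core Istanbul parts, 2.3 GHz for Athena's quad-core parts) and 4 double-precision floating-point operations per core per cycle; the clock rates are not stated in this report, but they reproduce the quoted peak numbers:

def peak_tflops(nodes, cores_per_node, ghz, flops_per_cycle=4):
    """Peak performance in teraflop/s for a homogeneous cluster (assumed clock and FPU width)."""
    return nodes * cores_per_node * ghz * flops_per_cycle / 1e3

# Kraken (Cray XT5): 9,408 nodes, two 6-core Opterons and 16 GB per node
kraken_cores = 9408 * 2 * 6                 # 112,896 cores
kraken_mem_tb = 9408 * 16 / 1024            # ~147 TB of aggregate memory
kraken_peak = peak_tflops(9408, 12, 2.6)    # ~1,174 TF, i.e. 1.17 PF

# Athena (Cray XT4): 4,512 nodes, one 4-core Opteron and 4 GB per node
athena_cores = 4512 * 4                     # 18,048 cores
athena_peak = peak_tflops(4512, 4, 2.3)     # ~166 TF

print(kraken_cores, round(kraken_mem_tb), round(kraken_peak))   # 112896 147 1174
print(athena_cores, round(athena_peak))                         # 18048 166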

The High Performance Storage System (HPSS), capable of archiving hundreds of petabytes of data and accessible from all major leadership computing platforms, provided long-term archival storage. Incoming data was written to disk and later migrated to tape for long-term archiving. NICS users had stored over 14 PB of data in HPSS by the end of the Kraken project.
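
As a rough illustration of the disk-then-tape flow just described (a generic hierarchical-storage sketch, not HPSS's actual policy engine or API; the class, threshold, and method names are hypothetical):

# Toy illustration of hierarchical storage: new data lands on a disk cache,
# ages out to tape, and reads fall back to tape when the cache no longer holds it.
import time

class Archive:
    def __init__(self, migrate_after_s=24 * 3600):
        self.disk = {}                      # name -> (data, time written)
        self.tape = {}                      # name -> data
        self.migrate_after_s = migrate_after_s

    def put(self, name, data):
        """Incoming data is written to the disk cache first."""
        self.disk[name] = (data, time.time())

    def migrate(self):
        """Move files that have aged out of the disk cache onto tape."""
        now = time.time()
        for name, (data, written) in list(self.disk.items()):
            if now - written >= self.migrate_after_s:
                self.tape[name] = data
                del self.disk[name]

    def get(self, name):
        """Serve reads from disk when possible; otherwise recall from tape."""
        if name in self.disk:
            return self.disk[name][0]
        return self.tape[name]              # in a real system this is a slow tape recall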

From April 2008 until its decommissioning, Kraken delivered more than 4 billion core-hours of computing to 3.8 million jobs and maintained an average uptime (availability) of 96 percent.

The U.S. was the world leader in computer simulations from the expansion of NSF's open-science computing capability in the mid-1980s until the early 2000s, when Japan fielded the Earth Simulator, a machine an order of magnitude more powerful than anything the U.S. had installed at the time. However, NSF's Kraken, in combination with the Department of Energy's former Jaguar supercomputer, helped restore U.S. preeminence in computer simulations. Kraken, together with Jaguar, then the world's fastest computer for open scientific research, made ORNL the most powerful computing complex on the planet, with more than two petaflops of computing power under one roof.

For a period of time, Kraken provided more than 60 percent of the allocated compute cycles in the TeraGrid portfolio of approximately a dozen resource providers. Even up to its retirement, Kraken remained one of NSF's most heavily used systems: from August 2008 through March 2014, it supplied an average of 43 percent of all allocated compute cycles for TeraGrid/XSEDE. Moreover, users were consistently able to run across the entire machine with high efficiency.

Before the deployment of the Blue Waters supercomputer at the University of Illinois, Kraken fulfilled the role of a capability computing resource. Capability users are those who effectively use a system up to its limits.

NICS, Kraken's managing organization, instituted an innovation to balance its operational mission of delivering the maximum number of compute cycles to the scientific community with the need to support full-machine runs for capability users. The innovation, called bimodal scheduling, entails a forced “draining” of the system on a weekly basis, followed by consecutive full-machine runs. Implementation of bimodal scheduling led to utilization of more than 90 percent, the equivalent of a 300-plus teraflop supercomputer, or several million dollars of compute time a year. Average utilization over the course of Kraken's life was an exceptional 92 percent. Bimodal scheduling was the brainchild of the late Phil Andrews, the first director of NICS.
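
A minimal sketch of the bimodal idea, under stated assumptions: for most of the week the scheduler starts any job that fits and will finish before a weekly drain deadline; during the drain window only full-machine (capability) jobs are started, one at a time, on the emptied system. The job fields, the 24-hour capability window, and the helper names are illustrative, not the actual NICS scheduler configuration.

# Illustrative bimodal scheduling policy: capacity mode with a drain deadline
# for most of the week, then a capability window for full-machine runs.
from dataclasses import dataclass

WEEK_H = 7 * 24  # length of one scheduling cycle, in hours

@dataclass
class Job:
    cores: int        # cores requested
    hours: float      # requested wall time
    capability: bool  # True if the job needs (nearly) the whole machine

def can_start(job, now_h, free_cores, total_cores):
    """Decide whether a job may start at hour `now_h` of the weekly cycle."""
    drain_start = WEEK_H - 24          # assumed: last 24 h reserved for drain + capability runs
    if now_h < drain_start:
        # Capacity mode: start anything that fits AND finishes before the drain.
        return (not job.capability
                and job.cores <= free_cores
                and now_h + job.hours <= drain_start)
    # Capability window: only full-machine jobs, launched on a fully drained system.
    return job.capability and free_cores == total_cores

# A 512-core, 3-hour job at hour 140 still finishes before the drain -> True
print(can_start(Job(512, 3, False), now_h=140, free_cores=90000, total_cores=112896))
# A full-machine job at hour 150, with the system drained -> True
print(can_start(Job(112896, 10, True), now_h=150, free_cores=112896, total_cores=112896))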

Kraken was a series of Cray XT systems that culminated in a 112,896-core XT5 system with a peak performance of 1.17 petaflops (1,174 teraflops), 147 terabytes of memory, and 2.4 petabytes of dedicated formatted disk space. Each evolutionary milestone was delivered on scope, schedule, and budget.

Kraken entered full production mode on Feb. 2, 2009, with a speed of 607 teraflops, or 607 trillion calculations per second. In the latter part of that year, Kraken became only the fourth open supercomputer ever to achieve a petaflop, or 1,000 trillion calculations per second.

Kraken held the distinction of being the world's most powerful computer managed by academia and was the third fastest on the Top500 list in November 2009. As of November 2013, it was still ranked number 35.

The first academic computer to break the petaflop barrier of more than a quadrillion floating-point operations per second, Kraken enabled researchers in myriad scientific and engineering domains, from physics to molecular biology, atmospheric sciences, climate, and many others, to achieve advances that prior academic computing resources lacked the power to support.


Last Modified: 12/06/2017
Modified by: Gregory D Peterson
