Award Abstract # 2005632
Category I: Anvil - A National Composable Advanced Computational Resource for the Future of Science and Engineering

NSF Org: OAC
Office of Advanced Cyberinfrastructure (OAC)
Recipient: PURDUE UNIVERSITY
Initial Amendment Date: May 29, 2020
Latest Amendment Date: June 7, 2024
Award Number: 2005632
Award Instrument: Cooperative Agreement
Program Manager: Robert Chadduck
rchadduc@nsf.gov
 (703)292-2247
OAC
 Office of Advanced Cyberinfrastructure (OAC)
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2020
End Date: September 30, 2027 (Estimated)
Total Intended Award Amount: $9,952,154.00
Total Awarded Amount to Date: $29,426,684.00
Funds Obligated to Date: FY 2020 = $11,942,583.00
FY 2021 = $10,449,761.00
FY 2022 = $2,038,429.00
FY 2023 = $48,000.00
FY 2024 = $4,947,910.00
History of Investigator:
  • Xiaohui Carol Song (Principal Investigator)
  • Preston Smith (Co-Principal Investigator)
  • Arman Pazouki (Co-Principal Investigator)
  • Rajesh Kalyanam (Co-Principal Investigator)
  • Xiao Zhu (Former Co-Principal Investigator)
Recipient Sponsored Research Office: Purdue University
2550 NORTHWESTERN AVE # 1100
WEST LAFAYETTE
IN  US  47906-1332
(765)494-1055
Sponsor Congressional District: 04
Primary Place of Performance: Purdue University
155 South Grant Street
West Lafayette
IN  US  47907-2114
Primary Place of Performance Congressional District: 04
Unique Entity Identifier (UEI): YRXVL4JYCEF5
Parent UEI: YRXVL4JYCEF5
NSF Program(s): Innovative HPC
Primary Program Source: 01002223DB NSF RESEARCH & RELATED ACTIVIT
01002324DB NSF RESEARCH & RELATED ACTIVIT
01002425DB NSF RESEARCH & RELATED ACTIVIT
01002021DB NSF RESEARCH & RELATED ACTIVIT
01002122DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7619, 9102, 9251
Program Element Code(s): 761900
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

As computing permeates nearly all fields of science and engineering, computing needs are growing exponentially, both in traditional computing-intensive domains and in emerging, more diverse fields of research. The rise of machine learning and artificial intelligence applications has accelerated and broadened the use of computational resources in research ranging from creating new, more environmentally friendly materials to improving medicine in the fight against deadly diseases. There are three main challenges to meeting these rapidly evolving national computational needs: a shortage of capacity, increasingly diverse applications, and computational literacy and training. This project aims to meet these challenges and transform the way computing is delivered by developing and deploying a composable advanced computing resource, Anvil, to the national research community, significantly increasing both computing capacity and accessibility. Anvil integrates a large-capacity high-performance computing (HPC) cluster with a comprehensive ecosystem of software, access interfaces, programming environments, and composable services to form a seamless environment able to support a broad range of current and future science and engineering applications. Through a carefully designed student training program and partnerships with regional and other universities, XSEDE, and Women in HPC programs, this project will develop computing competency in the next-generation workforce, and engage and train a broader audience, including underrepresented students at minority-serving and EPSCoR (Established Program to Stimulate Competitive Research) institutions.

Built with a forward-looking architecture with a high core count, and improved memory bandwidth and I/O, Anvil can effectively support traditional HPC with fast turnaround for high-throughput, mid-scale computation jobs. Anvil consists of 1000 128-core computing nodes based on the next-generation AMD Epyc "Milan" architecture that can deliver a total peak performance of 5.3 petaflops. Each node has 256 GB of memory, and a 100 gigabits/second bandwidth from the Mellanox HDR InfiniBand interconnect, allowing multiple jobs of up to 1024 cores to be run at full speed over the interconnect fabric. These nodes are complemented by 32 large-memory nodes with 1 TB of RAM each, and 16 Nvidia GPU nodes with 4 "Volta Next" GPUs per node. The GPU nodes are capable of 1.57 petaflops of single-precision performance to support machine learning and a wide range of current and future science and engineering applications. Anvil's multiple tiers of storage systems include a long-term archive, persistent file and campaign storage, a 10 PB scratch file system, a 3 PB flash burst buffer, and object storage to support a variety of workflows and storage needs.
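
The aggregate figures implied by the configuration above can be worked out with simple arithmetic. The following is a minimal sketch, not part of the award record: the constant and function names are hypothetical, it assumes the 5.3 petaflops peak refers to the 1000 CPU nodes only, and it uses decimal units (1 TB = 1000 GB). All inputs are the planned numbers quoted in the abstract, not measured values.

```python
# Planned figures quoted in the abstract (not measurements).
CPU_NODES = 1000
CORES_PER_NODE = 128
MEM_PER_NODE_GB = 256
CPU_PEAK_PFLOPS = 5.3          # assumed to cover the CPU nodes only
MAX_FULL_SPEED_JOB_CORES = 1024
LARGE_MEM_NODES = 32
LARGE_MEM_PER_NODE_TB = 1
GPU_NODES = 16
GPUS_PER_NODE = 4
GPU_PEAK_PFLOPS_SP = 1.57      # single precision, all GPU nodes combined


def derived_figures():
    """Return aggregate quantities derived from the quoted configuration."""
    total_cores = CPU_NODES * CORES_PER_NODE
    total_gpus = GPU_NODES * GPUS_PER_NODE
    return {
        "total_cpu_cores": total_cores,
        # Decimal units assumed: 1 TB = 1000 GB.
        "cpu_memory_tb": CPU_NODES * MEM_PER_NODE_GB / 1000,
        "memory_per_core_gb": MEM_PER_NODE_GB / CORES_PER_NODE,
        # Whole nodes occupied by the largest job quoted as full speed.
        "nodes_per_full_speed_job": MAX_FULL_SPEED_JOB_CORES // CORES_PER_NODE,
        "large_memory_total_tb": LARGE_MEM_NODES * LARGE_MEM_PER_NODE_TB,
        "total_gpus": total_gpus,
        # 1 petaflops = 1e6 gigaflops = 1e3 teraflops.
        "peak_gflops_per_core": CPU_PEAK_PFLOPS * 1e6 / total_cores,
        "peak_tflops_per_gpu_sp": GPU_PEAK_PFLOPS_SP * 1e3 / total_gpus,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value}")
```

Under these assumptions the system has 128,000 CPU cores and 64 GPUs, and a 1024-core job spans 8 nodes. The per-core and per-GPU peak values are simple averages of the quoted totals and say nothing about sustained application performance.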

Anvil will lower the barrier to entry to advanced computing cyberinfrastructure (CI) by providing interactive computing and desktop environments that ease the transition for users from diverse domains who are new to HPC. Feature-rich interactive environments such as Open OnDemand and ThinLinc let users rapidly become productive on Anvil through Linux and Windows desktops, or through familiar tools in their browser (e.g., Jupyter, RStudio). Complex scientific software environments and application stacks will be supported via containers orchestrated within a powerful composable subsystem. Anvil supports cloud-bursting of computational workloads as well as the use of public cloud machine learning platforms, including GPU and FPGA accelerators and software tools that automate hyperparameter tuning and algorithm selection for exploratory ML research. An existing production-quality science gateway at Purdue will help XSEDE researchers share their data and tools online and facilitate easy access to Anvil and other XSEDE resources in classroom instruction and training activities.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Chaudhry, Shafaq and Pazouki, Arman and Schmitz, Patrick and Hillery, Elizabett and Kee, Kerk "Understanding Factors that Influence Research Computing and Data Careers" Practice and Experience in Advanced Research Computing (PEARC22), 2022 https://doi.org/10.1145/3491418.3530292
Song, Carol X. and Merwade, Venkatesh and Wang, Shaowen and Witt, Michael and Kumar, Vipin and Irwin, Elena and Zhao, Lan and Walton, Amy "Cyberinfrastructure for sustainability sciences" Environmental Research Letters, v.18, 2023 https://doi.org/10.1088/1748-9326/acd9dd
Song, X. Carol and Smith, Preston and Kalyanam, Rajesh and Zhu, Xiao and Adams, Eric and Colby, Kevin and Finnegan, Patrick and Gough, Erik and Hillery, Elizabett and Irvine, Rick and Maji, Amiya and St. John, Jason "Anvil - System Architecture and Experiences from Deployment and Early User Operations" PEARC '22: Practice and Experience in Advanced Research Computing, 2022 https://doi.org/10.1145/3491418.3530766
Wu, Tsai-Wei and Lien Harrell, Stephen and Lentner, Geoffrey and Younts, Alex and Weekly, Sam and Mertes, Zoey and Maji, Amiya and Smith, Preston and Zhu, Xiao "Defining Performance of Scientific Application Workloads on the AMD Milan Platform" Practice and Experience in Advanced Research Computing, 2021 https://doi.org/10.1145/3437359.3465596
