Award Abstract # 1445604
High Performance Computing System Acquisition: Jetstream - A Self-Provisioned, Scalable Science and Engineering Cloud Environment

NSF Org: OAC
Office of Advanced Cyberinfrastructure (OAC)
Recipient: TRUSTEES OF INDIANA UNIVERSITY
Initial Amendment Date: November 20, 2014
Latest Amendment Date: October 15, 2020
Award Number: 1445604
Award Instrument: Cooperative Agreement
Program Manager: Robert Chadduck
rchadduc@nsf.gov
 (703)292-2247
OAC
 Office of Advanced Cyberinfrastructure (OAC)
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: December 1, 2014
End Date: November 30, 2022 (Estimated)
Total Intended Award Amount: $6,576,101.00
Total Awarded Amount to Date: $14,496,404.00
Funds Obligated to Date: FY 2015 = $6,576,101.00
FY 2016 = $5,260,880.00
FY 2017 = $503,729.00
FY 2018 = $1,363,219.00
FY 2019 = $48,000.00
FY 2020 = $744,475.00
History of Investigator:
  • David Hancock (Principal Investigator)
    dyhancoc@iu.edu
  • Ian Foster (Co-Principal Investigator)
  • Matthew Vaughn (Co-Principal Investigator)
  • Nirav Merchant (Co-Principal Investigator)
  • James Taylor (Co-Principal Investigator)
  • Craig Stewart (Former Principal Investigator)
Recipient Sponsored Research Office: Indiana University
107 S INDIANA AVE
BLOOMINGTON
IN  US  47405-7000
(317)278-3473
Sponsor Congressional District: 09
Primary Place of Performance: Indiana University
509 E. 3rd St.
Bloomington
IN  US  47401-3654
Primary Place of Performance Congressional District: 09
Unique Entity Identifier (UEI): YH86RTW2YVJ4
Parent UEI:
NSF Program(s): CYBERINFRASTRUCTURE,
XD-Extreme Digital,
Innovative HPC,
Data Cyberinfrastructure
Primary Program Source: 01001920DB NSF RESEARCH & RELATED ACTIVIT
01002021DB NSF RESEARCH & RELATED ACTIVIT
01001819DB NSF RESEARCH & RELATED ACTIVIT
01001516DB NSF RESEARCH & RELATED ACTIVIT
01001718DB NSF RESEARCH & RELATED ACTIVIT
01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7619, 7433, 9251, 116E
Program Element Code(s): 723100, 747600, 761900, 772600
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

High Performance Computing System Acquisition: Jetstream - a self-provisioned, scalable science and engineering cloud environment

Jetstream will be a new type of computational research resource open to the national (nonclassified) research community - a data analysis and computational resource that US scientists and engineers will use interactively to conduct their research anytime, anywhere. Jetstream will complement current NSF-funded computational resources by adding a cloud-based system that incorporates the best elements of commercial cloud computing with some of the best software in existence for solving important scientific problems. This system will enable many US researchers and engineers to make discoveries that advance our understanding of the world around us and improve the quality of life of American citizens.

In terms of technical details, Jetstream will be a configurable large-scale computing resource that leverages both on-demand and persistent virtual machine technology to support a much wider array of software environments and services than current NSF resources can accommodate. As a fully configurable "cloud" resource, Jetstream bridges a major gap in the current ecosystem, whose machines target large-scale high-performance computing, high-memory, large-data, high-throughput, and visualization workloads. As the open cloud for science, Jetstream will:

*Provide "self-serve" academic cloud services, enabling researchers or students to select a VM image from a published library, or alternatively to create or customize their own virtual environment for discipline- or task-specific personalized research computing.

*Host persistent VMs to provide services beyond the command line interface for science gateways and other science services. For example, Jetstream will become a primary host of the popular Galaxy scientific workbench and its main datasets, bringing many Galaxy users to the NSF ecosystem from day one.

*Enable new modes of sharing computations and data and of supporting reproducibility.

*Expand access to the NSF XSEDE ecosystem by making virtual desktop services accessible from institutions with limited resources.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


(Showing: 1 - 10 of 117)
Hernandez-Patlan, Daniel and Solis-Cruz, Bruno and Adhikari, Bishnu and Pontin, Karine P and Latorre, Juan D and Baxter, Mikayla F A and Hernandez-Velasco, Xochitl and Merino-Guzman, Ruben and Méndez-Albores, "Evaluation of the antimicrobial and intestinal integrity properties of boric acid in broiler chickens infected with Salmonella enteritidis: Proof of concept" Res. Vet. Sci. , v.123 , 2018 , p.7--13
Adhikari, Bishnu and Hernandez-Patlan, Daniel and Solis-Cruz, Bruno and Kwon, Young Min and Arreguin, Margarita A. and Latorre, Juan D. and Hernandez-Velasco, Xochitl and Hargis, Billy M. and Tellez-Isaias, Guillermo "Evaluation of the Antimicrobial and Anti-inflammatory Properties of Bacillus-DFM (Norum) in Broiler Chickens Infected With Salmonella Enteritidis" Frontiers in Veterinary Science , v.6 , 2019 , p.282 10.3389/fvets.2019.00282
Afgan, E., Christie, M., Goonasekera, N. "CloudLaunch as a Gateway for Discovering and Launching Cloud Applications." Proceedings of Gateways 2017 , 2017 https://doi.org/10.6084/m9.figshare.5471800.v1
Afgan, E., Lonie, A., Taylor, J., Goonasekera, N. "CloudLaunch: Discover and Deploy Cloud Applications" Future Generation Computer Systems , 2018
Afgan, Enis and Baker, Dannon and Batut, Bérénice and van den Beek, Marius and Bouvier, Dave and Čech, Martin and Chilton, John and Clements, Dave and Coraor, Nate and Grüning, Björn A and Guerler, Aysam and Hillman-Jackson, "The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update" Nucleic Acids Research , v.46 , 2018 , p.W537--W54 10.1093/nar/gky379
Afgan, Enis and Lonie, Andrew and Taylor, James and Goonasekera, Nuwan "CloudLaunch: Discover and deploy cloud applications" Future Generation Computer Systems , v.94 , 2019 , p.802--810
Allen, William J. and Gabr, Refaat E. and Tefera, Getaneh B. and Pednekar, Amol S. and Vaughn, Matthew W. and Narayana, Ponnada A. "Platform for Automated Real-Time High Performance Analytics on Medical Image Data" IEEE Journal of Biomedical and Health Informatics , v.22 , 2018 , p.318--324 10.1109/JBHI.2017.2771299
Allen, W.J., Gabr, R.E., Tefera, G.B., Pednekar, A.S., Vaughn, M.W., & Narayana, P.A. "Platform for Automated Real-Time High Performance Analytics on Medical Image Data" IEEE Journal of Biomedical and Health Informatics , v.22 , 2018 https://doi.org/10.1109/JBHI.2017.2771299
Alonge, Michael and Soyk, Sebastian and Ramakrishnan, Srividya and Wang, Xingang and Goodwin, Sara and Sedlazeck, Fritz J and Lippman, Zachary B and Schatz, Michael C "RaGOO: fast and accurate reference-guided scaffolding of draft genomes" Genome biology , v.20 , 2019 , p.1--17
Ameri, Mina and Honka, Elisabeth and Xie, Ying "A Model of Tie Formation, Product Adoption, and Content Generation" Product Adoption, and Content Generation , 2019
Arora, Rohit and Burke, Harry M. and Arnaout, Ramy "Immunological Diversity with Similarity" bioRxiv , 2018 10.1101/483131

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Indiana University and Jetstream partners fundamentally changed the landscape of the National Science Foundation cyberinfrastructure (CI) ecosystem through the Jetstream project. The Jetstream project served as a first-of-a-kind production cloud system while simultaneously acting as a pilot deployment, positively influencing the NSF Advanced Computing Systems and Services (ACSS) program that followed. Jetstream provided on-demand computing and storage anytime, anywhere. By design and in practice, Jetstream was created as programmable cyberinfrastructure in that users could reconfigure the system to suit their needs; this took many forms, from multi-user platforms that leveraged Jetstream to configure their services to single individuals who used MATLAB interactively. Jetstream differed markedly from other XSEDE resources: it was a highly usable system with a core focus on those in the long tail of science.

From June 2016 through July 2022, Jetstream provided services directly to 18,714 researchers and educators, including 8,836 students, as part of 1,220 projects in 69 fields of science at 399 institutions. The 63 science gateways that utilized Jetstream indirectly supported over 183,197 people over the life of the system, most of whom were unaware that they were using the resource. As a training and instructional resource, Jetstream was used seven times more than any other XSEDE resource in terms of Educational Allocations. Over the life of Jetstream, 216 courses and workshops used it as part of their instruction. These activities included multi-day workshops for early-career researchers, continuing education for researchers, semester-long courses for undergraduate and graduate students, and semester-long capstone projects for master's-level students. Forty-six courses were conducted multiple times on Jetstream.

The Jetstream environment was accepted under OpenStack Liberty and retired with the Rocky release on CentOS 7.8. Overall, the system went through seven major OpenStack releases without regularly scheduled downtime or unscheduled interruptions. Through the six years of operations, Jetstream had an overall availability of 98.54%, including planned and unplanned outages, and an uptime of 99.9967%, counting time when the system was operating in some form but at reduced capacity. This was coupled with above-average ratings for user satisfaction and importance, obtained via annual user surveys (1-5 scale): the rating for "Importance of Jetstream for Research or Education" grew year over year, concluding above 4, and responses to the question on the quality and speed of support averaged above 4.5.

The services pioneered on Jetstream or by the project team have also had a significant impact on currently funded Category I resources as well as within discipline-specific projects that leverage the environment including: dynamically scaling virtual clusters used by the Galaxy project, providing non-command-line access to CI resources by default through single sign-on (including providing resources to spawn a new OpenStack interface, Exosphere), providing the first virtual GPUs, providing the first object storage system in the NSF CI ecosystem, and the first cloud native orchestration interface to a production NSF resource (e.g. OpenStack Magnum used with Kubernetes and Docker Swarm). 

Jetstream's impact on student, researcher, and educator success took multiple forms:

  • As was widely reported in the mainstream media, a group of scientists produced the first image of a black hole using Event Horizon Telescope (EHT) observations of the center of the galaxy M87. As part of this work, Jetstream was used to develop cloud-based data analysis pipelines that were critical for combining and sharing data taken from the geographically distributed observatories. It was noted that the cloud pipeline used for analysis could not have been developed without Jetstream.

  • Jetstream joined the COVID-19 HPC Consortium in March 2020 to contribute resources to the pandemic response. The team participated in joint reporting efforts to the NSF throughout the pandemic, with numerous projects taking advantage of the resource. These COVID-19-related projects included gateways such as Galaxy (used for some of the earliest genomic analysis) and ChemCompute (used by students for remote learning), international medical records projects such as OpenMRS, and regional information such as projections by University of Texas, San Antonio for Texas hospital bed and ICU bed usage.

  • Jetstream hosted a Research Experience for Undergraduates program for five consecutive years, from 2017 through 2021, for 4-6 students per year (25 student slots total). The program resulted in tangible outcomes for the students in the form of project papers and opportunities to present to their peers at local events, as well as at national forums such as the PEARC conference series and even the SC Conference Series, the International Conference for High Performance Computing, Networking, Storage, and Analysis. The program focused recruitment on individuals from MSIs, HSIs, and EPSCoR jurisdictions.

 

 


Last Modified: 03/23/2023
Modified by: David Y Hancock

Please report errors in award information by writing to: awardsearch@nsf.gov.
