Award Abstract # 1928406
Collaborative Research: EarthCube Data Capabilities--Jupyter Meets the Earth: Enabling Discovery in Geoscience through Interactive Computing at Scale

NSF Org: AGS
Division of Atmospheric and Geospace Sciences
Recipient: REGENTS OF THE UNIVERSITY OF CALIFORNIA, THE
Initial Amendment Date: August 16, 2019
Latest Amendment Date: August 16, 2019
Award Number: 1928406
Award Instrument: Standard Grant
Program Manager: Maria Womack
AGS
 Division of Atmospheric and Geospace Sciences
GEO
 Directorate for Geosciences
Start Date: September 1, 2019
End Date: August 31, 2023 (Estimated)
Total Intended Award Amount: $1,712,604.00
Total Awarded Amount to Date: $1,712,604.00
Funds Obligated to Date: FY 2019 = $1,712,604.00
History of Investigator:
  • Fernando Perez (Principal Investigator)
    fernando.perez@berkeley.edu
  • Laurel Larsen (Co-Principal Investigator)
Recipient Sponsored Research Office: University of California-Berkeley
1608 4TH ST STE 201
BERKELEY
CA  US  94710-1749
(510)643-3891
Sponsor Congressional District: 12
Primary Place of Performance: University of California-Berkeley
367 Evans Hall
Berkeley
CA  US  94720-3860
Primary Place of Performance Congressional District: 12
Unique Entity Identifier (UEI): GS3YEVSS12N6
Parent UEI:
NSF Program(s): EarthCube
Primary Program Source: 01001920DB NSF RESEARCH & RELATED ACTIVIT
01001920RB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 4200, 4444, 7433
Program Element Code(s): 807400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.050

ABSTRACT

Earth science research is being reshaped by the availability of increasing amounts and variety of data, combined with ever more refined and computationally demanding models. This transition to a data-rich world offers immense opportunities for transformative scientific discoveries, but also presents new challenges to researchers: exploring these vast stores of data and combining them with complex models to make discoveries and novel predictions is technically challenging, requiring data management and computational expertise distinct from that of many Earth scientists. This project will develop novel tools to help Earth scientists seamlessly access and interact with extremely large data sets and powerful computational resources, in an environment that supports the lifecycle of research ideas from the scientist to the public. Specifically, this new effort builds upon the foundations of Project Jupyter, which provides tools for interactive computing, and partners with the Pangeo project that develops open tools and fosters a community of Big Data geoscientists. In this project, researchers will build new tools for interactive access to and exploration of data and models, driven by three specific problems in geoscience: the analysis of global climate models, the hydrology of watersheds, and the modeling of the subsurface of the Earth based on measurements of electric and magnetic fields. The project will advance technologies that empower multiple communities of researchers, both in Earth science and beyond. Tools from Project Jupyter are being used worldwide in research, education, industry, government, and media, including in the groundbreaking observation of gravitational waves by the Laser Interferometer Gravitational-Wave Observatory collaboration and the first direct image of a black hole made by the Event Horizon Telescope. The outcomes of the project will be freely available to the public as Open Source software.

The project will use geoscience use-cases in hydrology, climate science, and geophysics to drive the advancement of computational technologies for interactive geoscience research involving very large datasets and computationally complex models. These use-cases require High Performance Computing facilities or distributed computing in the cloud, and highlight the need for capabilities to: (1) handle big data such as the World Climate Research Program's Coupled Model Intercomparison Project's 6th release, expected to exceed 18 petabytes in size, (2) integrate data over variable spatial and temporal scales, including streamflow forecasts with sensor-based observations of discharge and hydrometeorological forcing factors, such as precipitation, temperature, relative humidity, and snow-water equivalent, and (3) perform large-scale, parallelized computations that combine the solution of partial differential equations with numerical optimization to construct 3D models of the subsurface in a geophysical inversion of electromagnetic data. The project team is an interdisciplinary collaboration that brings together software developers, geoscientists, and statisticians to advance the state of data science in the geosciences. The researchers will follow a user-centered design approach that Project Jupyter has successfully applied for over 15 years, using concrete use-cases to constrain and prioritize software development and ensure that all resulting features have direct scientific relevance.

The key software goals of the project are to: (a) improve access to data sources and data catalogs by exposing them to users in the same Jupyter interface where they conduct their computational work, (b) empower researchers to seamlessly utilize and combine cloud and high performance computing resources, (c) accelerate research by simplifying the process for scientists to create and deploy custom, interactive applications for their research questions, and (d) facilitate dissemination of research findings to decision-makers, stakeholders, and the general public. To achieve these, the project will advance three key Jupyter technologies: JupyterLab, Jupyter Widgets and JupyterHub. JupyterLab is an extensible interface that provides access to data, computation, and visualization. Jupyter Widgets provide easy-to-use tools for researchers to create rich graphical user interfaces for data analysis. JupyterHub is a tool for deploying computational web-based interfaces on shared infrastructure, such as the cloud or High Performance Computing centers. By working on three concrete geoscience problems the researchers will advance the state of the art in their respective fields, yet in their implementation within the open Jupyter ecosystem they will ensure that their solutions are generalizable to other scientific domains.
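
As a rough illustration of capability (2) above, integrating data recorded at different temporal scales, the following minimal Python sketch aggregates sub-daily sensor readings to daily means so they can be compared with a daily streamflow forecast. It is not taken from the project's software: the function names (daily_mean, forecast_errors) and all data values are hypothetical, and it uses only the Python standard library rather than the Jupyter or Pangeo tools named in this abstract.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean


def daily_mean(observations):
    """Aggregate (timestamp, value) sensor readings to one mean per calendar day."""
    by_day = defaultdict(list)
    for timestamp, value in observations:
        by_day[timestamp.date()].append(value)
    return {day: mean(values) for day, values in by_day.items()}


def forecast_errors(daily_forecast, observations):
    """Return forecast minus observed daily mean, for days present in both series."""
    observed = daily_mean(observations)
    return {
        day: daily_forecast[day] - observed[day]
        for day in sorted(daily_forecast)
        if day in observed
    }


# Hypothetical sub-daily discharge readings (arbitrary units).
sensor_discharge = [
    (datetime(2021, 5, 1, 0), 10.0),
    (datetime(2021, 5, 1, 12), 14.0),
    (datetime(2021, 5, 2, 0), 20.0),
    (datetime(2021, 5, 2, 12), 22.0),
]

# Hypothetical daily streamflow forecast; May 3 has no matching observations.
daily_forecast = {
    date(2021, 5, 1): 13.0,
    date(2021, 5, 2): 20.0,
    date(2021, 5, 3): 18.0,
}

errors = forecast_errors(daily_forecast, sensor_discharge)
for day, error in errors.items():
    print(day.isoformat(), error)
```

Days that appear in only one of the two series are dropped rather than filled in, which keeps the comparison honest when sensor records have gaps.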

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 15)
Abernathey, Ryan P. and Augspurger, Tom and Banihirwe, Anderson and Blackmon-Luca, Charles C. and Crone, Timothy J. and Gentemann, Chelle L. and Hamman, Joseph J. and Henderson, Naomi and Lepore, Chiara and McCaie, Theo A. and Robinson, Niall H. and Signe "Cloud-Native Repositories for Big Scientific Data" Computing in Science & Engineering, v.23, 2021 https://doi.org/10.1109/MCSE.2021.3059437
Azari, A. R. and Abrahams, E. and Sapienza, F. and Mitchell, D. L. and Biersteker, J. and Xu, S. and Bowers, C. and Pérez, F. and DiBraccio, G. A. and Dong, Y. and Curry, S. "Magnetic Field Draping in Induced Magnetospheres: Evidence From the MAVEN Mission to Mars" Journal of Geophysical Research: Space Physics, v.128, 2023 https://doi.org/10.1029/2023JA031546
Bolibar, Jordi and Sapienza, Facundo and Maussion, Fabien and Lguensat, Redouane and Wouters, Bert and Pérez, Fernando "Universal differential equations for glacier ice flow modelling" Geoscientific Model Development, v.16, 2023 https://doi.org/10.5194/gmd-16-6671-2023
Cholia, Shreyas and Heagy, Lindsey and Henderson, Matthew and Paine, Drew and Hays, Jon and Bianchi, Ludovico and Ghoshal, Devarshi and Perez, Fernando and Ramakrishnan, Lavanya "Towards Interactive, Reproducible Analytics at Scale on HPC Systems" 2020 IEEE/ACM HPC for Urgent Decision Making (UrgentHPC), 2020 https://doi.org/10.1109/UrgentHPC51945.2020.00011
Granger, Brian E. and Perez, Fernando "Jupyter: Thinking and Storytelling With Code and Data" Computing in Science & Engineering, v.23, 2021 https://doi.org/10.1109/MCSE.2021.3059263
Moges, Edom and Ruddell, Benjamin L. and Zhang, Liang and Driscoll, Jessica M. and Larsen, Laurel G. "Strength and Memory of Precipitation's Control Over Streamflow Across the Conterminous United States" Water Resources Research, v.58, 2022 https://doi.org/10.1029/2021WR030186
Moges, Edom and Ruddell, Benjamin L. and Zhang, Liang and Driscoll, Jessica M. and Norton, Parker and Perez, Fernando and Larsen, Laurel G. "HydroBench: Jupyter supported reproducible hydrological model benchmarking and diagnostic tool" Frontiers in Earth Science, v.10, 2022 https://doi.org/10.3389/feart.2022.884766
Sapienza, F. and Gallo, L. C. and Zhang, Y. and Vaes, B. and Domeier, M. and Swanson-Hysell, N. L. "Quantitative Analysis of Paleomagnetic Sampling Strategies" Journal of Geophysical Research: Solid Earth, v.128, 2023 https://doi.org/10.1029/2023JB027211
Stern, Charles and Abernathey, Ryan and Hamman, Joseph and Wegener, Rachel and Lepore, Chiara and Harkins, Sean and Merose, Alexander "Pangeo Forge: Crowdsourcing Analysis-Ready, Cloud Optimized Data Production" Frontiers in Climate, v.3, 2022 https://doi.org/10.3389/fclim.2021.782909
Werthmüller, Dieter and Rochlitz, Raphael and Castillo-Reyes, Octavio and Heagy, Lindsey "Towards an open-source landscape for 3-D CSEM modelling" Geophysical Journal International, v.227, 2021 https://doi.org/10.1093/gji/ggab238
Zhang, Liang and Moges, Edom and Kirchner, James W. and Coda, Elizabeth and Liu, Tianchi and Wymore, Adam S. and Xu, Zexuan and Larsen, Laurel G. "CHOSEN: A synthesis of hydrometeorological data from intensively monitored catchments and comparative analysis of hydrologic extremes" Hydrological Processes, v.35, 2021 https://doi.org/10.1002/hyp.14429

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

In an era where data drives discovery, the "Jupyter Meets the Earth" project has led efforts to integrate cutting-edge computational tools with geoscience research. The project built on the strengths of the Jupyter and Pangeo ecosystems to tackle diverse research challenges spanning cryosphere science, climate science, hydrology, and geophysics. The UC Berkeley-NCAR collaboration has enhanced JupyterHub functionality, optimized Jupyter's high-performance computing capabilities, and made improvements in Xarray and related projects, which are essential for cloud-native workflows in Earth science. The project inspired the establishment of a new external non-profit organization co-founded by several team members, called 2i2c (the International Interactive Computing Collaboration), that deploys JupyterHub-based infrastructure for researchers and educators while contributing back to the core open source tools that power this work. 2i2c hosts multiple hubs for communities in the Earth science space (including Pangeo). The AGU, recognizing the technical, scientific, and community impact of this work, awarded Dr. Tasha Snow, CryoCloud community leader, the 2023 AGU Open Science Prize. In summary, the multidisciplinary "Jupyter Meets the Earth" initiative has advanced computational geoscience, open-source software, and collaboration, promoting sustainable growth in technology, science, and the practice of open science.

Contribution to Jupyter and Open-Source Ecosystems: We have enhanced JupyterHub's functionality and usability, integrating it with Kubernetes to support scalable, cloud-based research. This advancement has improved the efficiency of collaborative scientific workflows. Our team's contributions to open-source projects have strengthened the Jupyter ecosystem, benefiting a broad range of scientific domains reliant on interactive computing.

Scientific Discovery: Our project has driven geoscientific discovery by advancing cloud-native, open-source workflows for data processing, analysis, and sharing. The tools we've contributed to are key for understanding Earth's systems, as evidenced by our research publications and applications across various global studies. Highlights include:

  • Advancement in Cryosphere Science:

    • Developed GLAFT, an open-source software package for glacier velocity mapping from satellite images, improving feature-tracking accuracy.
    • Advanced geoscientific modeling with the ODINN.jl framework, applying Universal Differential Equations to better understand glacier ice flow for global-scale applications across the entire database of 200,000 glaciers.
  • Contributions to Geophysics and Earth Sciences:

    • Improved ancient geomagnetic field estimations through optimized paleomagnetic sampling, focusing on independent site recordings.
    • Developed new statistical tools for analyzing Mars' magnetosphere with MAVEN, refining classical theories and adding insights into planetary space physics.
  • Innovations in Hydrology:

    • Enabled interactive acquisition and preprocessing of hydrometeorological data, applying data quality control and missing value techniques.
    • Benchmarked the National Hydrological Model, enhancing U.S. hydrological understanding.
  • Open-Source Software and Educational Impact:

    • Launched the EZ-FeatureTrack project, providing a Jupyter notebook-based interface for glacier feature tracking.
    • Initiated the GeoStacks project to streamline Earth science data management, with a reproducible Landsat 8 session.
    • Released glacier vulnerability materials adhering to FAIR policy, supporting open and reproducible research.
    • Developed software, tutorials, and training materials for popular open-source Python projects (Xarray for data structures, Dask for parallel computing, and CuPy for GPU-enabled computing) in Earth system science workflows. Materials are available online.
  • Advancement in Geophysics Software:

    • Implemented 2D and 3D simulations with Dask for magnetotelluric data, enhancing data processing capabilities.
    • Recorded SimPEG meetings, sharing them publicly for educational transparency and community learning.

Formation of 2i2c and Broadening Impact: In response to infrastructure needs, team members and colleagues from Pangeo, UC Berkeley and UBC established 2i2c, a non-profit. This strategic development expanded our reach, offering cloud hubs to varied research and educational institutions, emphasizing support for community colleges and HBCUs. The formation of 2i2c strengthened our infrastructure and played a key role in democratizing access to cutting-edge computational tools, enabling broader participation in open scientific research.

Conclusion: "Jupyter Meets the Earth" has significantly advanced geoscience research and set new benchmarks in scientific education and collaborative innovation. Our commitment to developing interactive tools, fostering open-source methodologies, and enriching educational experiences has charted a course toward a future where scientific discovery is more collaborative, innovative, and accessible. We've equipped researchers, educators, and students to excel in their fields and forged new career paths like geospatial data scientists and open-source infrastructure engineers, at the emerging frontiers of computation and geoscience. We expect the project's legacy to inspire sustainable practices and partnerships to further these advancements for years to come.

Intellectual Merit: The project harnessed the Jupyter and Pangeo ecosystems to innovate geoscientific research, improving data processing and analysis. JupyterHub's enhancements and Kubernetes integration led to scalable research infrastructure, and tools like GLAFT and ODINN.jl enabled nuanced Earth system analyses. This work elevated computational geoscience, enabling researchers to tackle complex datasets and uncover new insights with greater efficiency and precision.

Broader Impacts: Founding 2i2c broadened computational tool access, enhancing research equity, especially for resource-limited institutions. Educational programs and workshops have empowered a diverse audience with advanced computational skills, fostering a robust culture of open science. This initiative has catalyzed community engagement, creating pathways for emerging professionals and enriching the scientific ecosystem with collaborative and innovative endeavors.

Last Modified: 02/01/2024
Modified by: Fernando Perez

Please report errors in award information by writing to: awardsearch@nsf.gov.