
NSF Org: | AGS Division of Atmospheric and Geospace Sciences |
Recipient: |
|
Initial Amendment Date: | August 16, 2019 |
Latest Amendment Date: | November 14, 2022 |
Award Number: | 1928374 |
Award Instrument: | Standard Grant |
Program Manager: | Maria Womack, mwomack@nsf.gov, (703)292-2620, AGS Division of Atmospheric and Geospace Sciences, GEO Directorate for Geosciences |
Start Date: | September 1, 2019 |
End Date: | August 31, 2023 (Estimated) |
Total Intended Award Amount: | $244,271.00 |
Total Awarded Amount to Date: | $244,271.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: | 3090 CENTER GREEN DR BOULDER CO US 80301-2252 (303)497-1000 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 1850 Table Mesa Drive Boulder CO US 80305-5602 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | EarthCube |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.050 |
ABSTRACT
Earth science research is being reshaped by the availability of increasing amounts and variety of data, combined with ever more refined and computationally demanding models. This transition to a data-rich world offers immense opportunities for transformative scientific discoveries, but also presents new challenges to researchers: exploring these vast stores of data and combining them with complex models to make discoveries and novel predictions is technically challenging, requiring data management and computational expertise distinct from that of many Earth scientists. This project will develop novel tools to help Earth scientists seamlessly access and interact with extremely large data sets and powerful computational resources, in an environment that supports the lifecycle of research ideas from the scientist to the public. Specifically, this new effort builds upon the foundations of Project Jupyter, which provides tools for interactive computing, and partners with the Pangeo project that develops open tools and fosters a community of Big Data geoscientists. In this project, researchers will build new tools for interactive access to and exploration of data and models, driven by three specific problems in geoscience: the analysis of global climate models, the hydrology of watersheds, and the modeling of the subsurface of the Earth based on measurements of electric and magnetic fields. The project will advance technologies that empower multiple communities of researchers, both in Earth science and beyond. Tools from Project Jupyter are being used worldwide in research, education, industry, government, and media, including in the groundbreaking observation of gravitational waves by the Laser Interferometer Gravitational-Wave Observatory collaboration and the first direct image of a black hole made by the Event Horizon Telescope. The outcomes of the project will be freely available to the public as Open Source software.
The project will use geoscience use-cases in hydrology, climate science, and geophysics to drive the advancement of computational technologies for interactive geoscience research involving very large datasets and computationally complex models. These use-cases require High Performance Computing facilities or distributed computing in the cloud, and highlight the need for capabilities to: (1) handle big data such as the World Climate Research Program's Coupled Model Intercomparison Project's 6th release, expected to exceed 18 petabytes in size, (2) integrate data over variable spatial and temporal scales, including streamflow forecasts with sensor-based observations of discharge and hydrometeorological forcing factors, such as precipitation, temperature, relative humidity, and snow-water equivalent, and (3) perform large-scale, parallelized computations that combine the solution of partial differential equations with numerical optimization to construct 3D models of the subsurface in a geophysical inversion of electromagnetic data. The project team is an interdisciplinary collaboration that brings together software developers, geoscientists, and statisticians to advance the state of data science in the geosciences. The researchers will follow a user-centered design approach that Project Jupyter has successfully applied for over 15 years, using concrete use-cases to constrain and prioritize software development and ensure that all resulting features have direct scientific relevance.
The key software goals of the project are to: (a) improve access to data sources and data catalogs by exposing them to users in the same Jupyter interface where they conduct their computational work, (b) empower researchers to seamlessly utilize and combine cloud and high performance computing resources, (c) accelerate research by simplifying the process for scientists to create and deploy custom, interactive applications for their research questions, and (d) facilitate dissemination of research findings to decision-makers, stakeholders, and the general public. To achieve these, the project will advance three key Jupyter technologies: JupyterLab, Jupyter Widgets and JupyterHub. JupyterLab is an extensible interface that provides access to data, computation, and visualization. Jupyter Widgets provide easy-to-use tools for researchers to create rich graphical user interfaces for data analysis. JupyterHub is a tool for deploying computational web-based interfaces on shared infrastructure, such as the cloud or High Performance Computing centers. By working on three concrete geoscience problems the researchers will advance the state of the art in their respective fields, yet in their implementation within the open Jupyter ecosystem they will ensure that their solutions are generalizable to other scientific domains.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
In an era where data drives discovery, the "Jupyter Meets the Earth" project has led efforts to integrate cutting-edge computational tools with geoscience research. The project built on the strengths of the Jupyter and Pangeo ecosystems to tackle diverse research challenges spanning cryosphere science, climate science, hydrology, and geophysics. The UC Berkeley-NCAR collaboration has enhanced JupyterHub functionality, optimized Jupyter’s high-performance computing capabilities, and made improvements to Xarray and related projects, which are essential for cloud-native workflows in Earth science. The project inspired the establishment of a new external non-profit organization co-founded by several team members, called 2i2c (the International Interactive Computing Collaboration), that deploys JupyterHub-based infrastructure for researchers and educators while contributing back to the core open source tools that power this work. 2i2c hosts multiple hubs for communities in the Earth science space (including Pangeo). This led to the recognition of Dr. Tasha Snow, CryoCloud community leader, who received the 2023 AGU Open Science Prize for the technical, scientific, and community impact of this work. In summary, the multidisciplinary "Jupyter Meets the Earth" initiative has advanced computational geoscience, open-source software, and collaboration, promoting sustainable growth in technology, science, and open science practices.
Contribution to Jupyter and Open-Source Ecosystems:
We have enhanced JupyterHub’s functionality and usability, integrating it with Kubernetes to support scalable, cloud-based research. This advancement has improved the efficiency of collaborative scientific workflows. Our team’s contributions to open-source projects have strengthened the Jupyter ecosystem, benefiting a broad range of scientific domains reliant on interactive computing.
Scientific Discovery:
Our project has driven geoscientific discovery by advancing cloud-native, open-source workflows for data processing, analysis, and sharing. The tools we’ve contributed to are key for understanding Earth's systems, as evidenced by research publications and applications in global studies. Highlights include:
- Advancement in Cryosphere Science:
  - Developed GLAFT, open-source software for glacier velocity mapping from satellite images, improving feature-tracking accuracy.
  - Advanced geoscientific modeling with the ODINN.jl framework, applying Universal Differential Equations to better understand glacier ice flow for global-scale applications across an entire database of 200,000 glaciers.
- Contributions to Geophysics and Earth Sciences:
  - Improved ancient geomagnetic field estimations through optimized paleomagnetic sampling, focusing on independent site recordings.
  - Developed new statistical tools for analyzing Mars' magnetosphere with MAVEN, refining classical theories and adding insights into planetary space physics.
- Innovations in Hydrology:
  - Enabled interactive acquisition and preprocessing of hydrometeorological data, applying data quality control and missing-value techniques.
  - Benchmarked the National Hydrological Model, enhancing U.S. hydrological understanding.
- Open-Source Software and Educational Impact:
  - Launched the EZ-FeatureTrack project, providing a Jupyter notebook-based interface for glacier feature tracking.
  - Initiated the GeoStacks project to streamline Earth science data management, with a reproducible Landsat 8 session.
  - Released glacier vulnerability materials adhering to FAIR policy, supporting open and reproducible research.
  - Developed software, tutorials, and training materials for popular open-source Python projects (Xarray for data structures, Dask for parallel computing, and CuPy for GPU-enabled computing) in Earth system science workflows. Materials are available online.
- Advancement in Geophysics Software:
  - Implemented 2D and 3D simulations with Dask for magnetotelluric data, enhancing data processing capabilities.
  - Recorded SimPEG meetings, sharing them publicly for educational transparency and community learning.
Formation of 2i2c and Broadening Impact:
In response to infrastructure needs, team members and colleagues from Pangeo, UC Berkeley, and UBC established 2i2c, a non-profit. This strategic development expanded our reach, offering cloud hubs to varied research and educational institutions, with an emphasis on support for community colleges and HBCUs. The formation of 2i2c strengthened our infrastructure and played a key role in democratizing access to cutting-edge computational tools, enabling broader participation in open scientific research.
Conclusion:
"Jupyter Meets the Earth" has significantly advanced geoscience research and set new benchmarks in scientific education and collaborative innovation. Our commitment to developing interactive tools, fostering open-source methodologies, and enriching educational experiences has charted a course toward a future where scientific discovery is more collaborative, innovative, and accessible. We’ve equipped researchers, educators, and students to excel in their fields and forged new career paths, such as geospatial data scientist and open-source infrastructure engineer, at the emerging frontiers of computation and geoscience. We expect the project's legacy to inspire sustainable practices and partnerships to further these advancements for years to come.
Intellectual Merit:
The project harnessed the Jupyter and Pangeo ecosystems to innovate geoscientific research, improving data processing and analysis. Enhancements to JupyterHub and its Kubernetes integration led to scalable research infrastructure, and tools like GLAFT and ODINN.jl enabled nuanced Earth system analyses. This elevated computational geoscience, enabling researchers to tackle complex datasets and uncover new insights with greater efficiency and precision.
Broader Impacts:
Founding 2i2c broadened computational tool access, enhancing research equity, especially for resource-limited institutions. Educational programs and workshops have empowered a diverse audience with advanced computational skills, fostering a robust culture of open science. This initiative has catalyzed community engagement, creating pathways for emerging professionals and enriching the scientific ecosystem with collaborative and innovative endeavors.
Last Modified: 02/01/2024
Modified by: Deepak A Cherian