NSF Award Search: Award # 1822221

Award Abstract # 1822221

Advancing Sub-Seasonal Weather Predictability Through Machine Learning Techniques

NSF Org:	AGS Division of Atmospheric and Geospace Sciences
Recipient:	GEORGE MASON UNIVERSITY
Initial Amendment Date:	June 4, 2018
Latest Amendment Date:	May 11, 2023
Award Number:	1822221
Award Instrument:	Standard Grant
Program Manager:	Eric DeWeaver, edeweave@nsf.gov, (703)292-8527, AGS Division of Atmospheric and Geospace Sciences, GEO Directorate for Geosciences
Start Date:	September 1, 2018
End Date:	August 31, 2024 (Estimated)
Total Intended Award Amount:	$459,667.00
Total Awarded Amount to Date:	$459,667.00
Funds Obligated to Date:	FY 2018 = $459,667.00
History of Investigator:	Timothy Delsole (Principal Investigator), tdelsole@gmu.edu
Recipient Sponsored Research Office:	George Mason University, 4400 UNIVERSITY DR, FAIRFAX, VA, US 22030-4422, (703)993-2295
Sponsor Congressional District:	11
Primary Place of Performance:	George Mason University, 4400 University Drive, Fairfax, VA, US 22030-4422
Primary Place of Performance Congressional District:	11
Unique Entity Identifier (UEI):	EADLFP7Z72E5
Parent UEI:	H4NRWLFCDF43
NSF Program(s):	Climate & Large-Scale Dynamics
Primary Program Source:	01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s):	574000
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.050

ABSTRACT

Current operational weather forecasting systems can produce valuable predictions of day-to-day weather, but such predictions are only skillful up to a week or so in advance. On the longer subseasonal timescale, say between two and eight weeks, useful forecasts of mean conditions and the likelihood of extreme events may also be possible. But basic science questions, including the sources and mechanisms of subseasonal variability, their potential predictability, and the essential elements necessary for robust prediction, have not yet been resolved. These questions are of practical as well as scientific interest, as guidance from subseasonal forecasts could have a variety of uses including agricultural planning and emergency management.

This project seeks to identify empirical predictive relationships between local climate variability of interest and large-scale patterns in relevant predictors such as sea surface temperature (SST), soil moisture, and atmospheric circulation. Prior work by the Principal Investigator (PI) and colleagues demonstrated such a relationship between heat waves in Texas, including the heat wave associated with the 2011 drought, and an SST pattern covering much of the North Pacific. A key limitation of such prediction methods is that the observed record is too short to identify statistically significant predictive relationships. Thus, methods that seem successful when tested on past cases may fail when used for real-time prediction. The PI's strategy for circumventing this limitation is to identify predictive relationships using output from weather and climate models in place of observations, as many thousands of years of simulated weather and climate variability are available from a variety of modeling projects. Model output used here comes from the North American Multimodel Ensemble (NMME), the Subseasonal to Seasonal (S2S) Prediction Project, and the Coupled Model Intercomparison Project (CMIP).

A further concern in developing empirical prediction methods is the need for regularization, meaning a way to eliminate spurious small-scale features in predictor patterns, which arise because of the large number of data points used to represent the predictor fields. Such features are not usually consistent from model to model or between model output and observations. The PI uses a regularization scheme in which eigenvectors of the Laplacian operator serve to filter out small scales, leading to more robust predictive relationships. To further ensure robust predictions, the PI applies an innovative cross-validation technique which bypasses the best statistical model identified in cross-validation in favor of the simplest model which is within a standard deviation of the best model.
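The two ideas in the paragraph above can be illustrated with a minimal sketch on synthetic data. It is not the PI's implementation: the 1-D grid, the no-flux boundary condition, ordinary least squares, contiguous 5-fold cross-validation, and the reading of "within a standard deviation" as the common one-standard-error rule are all assumptions made here for illustration, as are the function names `laplacian_basis`, `cv_error_by_modes`, and `simplest_within_one_se`.

```python
import numpy as np

def laplacian_basis(n_points):
    """Eigenvectors of a 1-D discrete Laplacian, ordered from large to small spatial scale."""
    lap = 2.0 * np.eye(n_points) - np.eye(n_points, k=1) - np.eye(n_points, k=-1)
    lap[0, 0] = lap[-1, -1] = 1.0  # assumed no-flux boundaries: smoothest mode is constant
    _, vecs = np.linalg.eigh(lap)  # eigh sorts eigenvalues ascending, so smooth modes come first
    return vecs

def cv_error_by_modes(x, y, basis, max_modes, n_folds=5):
    """Mean and standard error (over folds) of the CV mean squared error for 1..max_modes modes."""
    folds = np.array_split(np.arange(len(y)), n_folds)
    fold_mse = np.empty((max_modes, n_folds))
    for k in range(1, max_modes + 1):
        # Predictors: intercept plus projections of the field onto the k smoothest modes.
        z = np.column_stack([np.ones(len(y)), x @ basis[:, :k]])
        for j, test_idx in enumerate(folds):
            train = np.ones(len(y), dtype=bool)
            train[test_idx] = False
            coef, *_ = np.linalg.lstsq(z[train], y[train], rcond=None)
            fold_mse[k - 1, j] = np.mean((y[test_idx] - z[test_idx] @ coef) ** 2)
    return fold_mse.mean(axis=1), fold_mse.std(axis=1, ddof=1) / np.sqrt(n_folds)

def simplest_within_one_se(mean_err, se_err):
    """Index of the simplest model whose error is within one standard error of the best model."""
    mean_err, se_err = np.asarray(mean_err), np.asarray(se_err)
    best = int(np.argmin(mean_err))
    within = mean_err <= mean_err[best] + se_err[best]
    return int(np.argmax(within))  # argmax returns the first True entry

# Synthetic demonstration: the predictand depends only on two large-scale modes.
rng = np.random.default_rng(0)
n_samples, n_points = 300, 40
basis = laplacian_basis(n_points)
x = rng.standard_normal((n_samples, n_points))
y = x @ (basis[:, 1] + 0.5 * basis[:, 2]) + 0.5 * rng.standard_normal(n_samples)

mean_err, se_err = cv_error_by_modes(x, y, basis, max_modes=15)
best_modes = int(np.argmin(mean_err)) + 1
n_modes = simplest_within_one_se(mean_err, se_err) + 1
print(f"best CV model uses {best_modes} modes; selected model uses {n_modes} modes")
```

Ordering the candidate models by the number of Laplacian modes gives a natural notion of "simplest": fewer modes means a smoother predictor pattern, so the selection rule never picks a more complex model than the cross-validation minimum.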

Once robust predictive relationships are identified that hold across different models and in observations, the sources and mechanisms responsible for the relationships will be explored. If the empirical methods can reproduce hindcasts from the models in the NMME archive, then the model output can be used to understand the time-evolving dynamical processes linking the predictor pattern to the predicted variability.

In addition to the societal benefits of subseasonal forecasts, the project provides support and training to a postdoc, thereby promoting workforce development in this research area. The project also addresses public scientific literacy through public seminars by the PI on sub-seasonal prediction, a topic of interest to the general public.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Buchmann, Paul and DelSole, Timothy "Week 3-4 Prediction of Wintertime CONUS Temperature Using Machine Learning Techniques" Frontiers in Climate, v.3, 2021 https://doi.org/10.3389/fclim.2021.697423

DelSole, Timothy and Tippett, Michael K. "A mutual information criterion with applications to canonical correlation analysis and graphical models" Stat, v.10, 2021 https://doi.org/10.1002/sta4.385

DelSole, Timothy and Tippett, Michael K. "Correcting the corrected AIC" Statistics & Probability Letters, v.173, 2021 https://doi.org/10.1016/j.spl.2021.109064

Trenary, Laurie and DelSole, Timothy "Advancing interpretability of machine-learning prediction models" Environmental Data Science, v.1, 2022 https://doi.org/10.1017/eds.2022.13

Trenary, Laurie and DelSole, Timothy "Skillful statistical prediction of subseasonal temperature by training on dynamical model data" Environmental Data Science, v.2, 2023 https://doi.org/10.1017/eds.2023.2

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project developed new statistical tools to enhance predictions in the 2-8 week range, technically known as the 'subseasonal' time frame. While reliable weather forecasts for the next few days are commonplace, accurate longer-term predictions for the subseasonal range remain a major scientific challenge. Despite this challenge, the value of such forecasts is widely recognized: accurate forecasts in this range would help communities prepare for heat waves, farmers plan their planting schedules, and utility companies manage energy resources more efficiently. Organizations like the U.S. Bureau of Reclamation and the United Nations' World Meteorological Organization have even hosted competitions to drive innovation in this field.

A major outcome of this project was the development of a method to uncover new sources of subseasonal predictability. The core idea is that if weather events are predictable on subseasonal time scales, there must be correlations between weather events at different times within a season. By identifying the weather patterns that exhibit the strongest correlations over subseasonal time scales, potential new sources of predictability may be discovered in a way that is independent of any preconceived assumptions about their sources. Motivated by this reasoning, this project developed a systematic method to identify the most correlated weather patterns over subseasonal time scales and rigorously test their statistical significance. Applying this method to temperatures over the United States revealed several patterns that are predictable at both weeks 1-2 and weeks 3-4. Notably, many of these patterns are unrelated to established indices of subseasonal predictability, suggesting the existence of new sources of predictability. These findings offer new insights into the mechanisms driving subseasonal variability and could significantly advance our understanding of predictability on subseasonal time scales. This work also formed the foundation for a successful PhD thesis.
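The idea of searching for the patterns with the strongest correlation across a time lag can be sketched with a standard technique, lagged canonical correlation analysis (CCA) on truncated principal components. This is a stand-in chosen for illustration and is not necessarily the method the project developed; the significance testing mentioned above is not included. The function name `lagged_cca`, the synthetic AR(1) field, the lag of 3 steps, and the truncation to 5 components are all assumptions.

```python
import numpy as np

def lagged_cca(field, lag, n_pcs):
    """Canonical correlations between a field at time t and the same field at time t + lag.

    field has shape (time, space). Each sample is reduced to its n_pcs leading
    principal components before the correlations are computed.
    """
    early = field[:-lag] - field[:-lag].mean(axis=0)
    late = field[lag:] - field[lag:].mean(axis=0)
    # Left singular vectors are orthonormal (whitened) principal-component time series.
    u_early = np.linalg.svd(early, full_matrices=False)[0][:, :n_pcs]
    u_late = np.linalg.svd(late, full_matrices=False)[0][:, :n_pcs]
    # Singular values of the cross-product of whitened PCs are the canonical correlations.
    weights, rho, _ = np.linalg.svd(u_early.T @ u_late)
    variates = u_early @ weights      # unit-norm canonical time series at the early time
    patterns = early.T @ variates     # spatial loading pattern for each variate
    return rho, patterns

# Synthetic field: one persistent (AR(1)) spatial pattern buried in white noise.
rng = np.random.default_rng(1)
n_time, n_space = 2000, 20
true_pattern = np.sin(np.linspace(0.0, np.pi, n_space))
true_pattern /= np.linalg.norm(true_pattern)
signal = np.zeros(n_time)
for t in range(1, n_time):
    signal[t] = 0.9 * signal[t - 1] + rng.standard_normal()
field = np.outer(signal, true_pattern) + rng.standard_normal((n_time, n_space))

rho, patterns = lagged_cca(field, lag=3, n_pcs=5)
leading = patterns[:, 0] / np.linalg.norm(patterns[:, 0])
alignment = abs(float(leading @ true_pattern))
print("canonical correlations:", np.round(rho, 2))
print("alignment of leading pattern with the true pattern:", round(alignment, 2))
```

In this toy setup only one pattern persists across the lag, so one canonical correlation should stand well above the rest. With real data, the number of retained components and the significance of each correlation would need careful treatment.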

A key challenge in developing statistical prediction models is determining which variables to use as inputs. This project introduced a new selection criterion called the Mutual Information Criterion (MIC), which not only identifies the best inputs for a model but also highlights the most promising variables to predict. This criterion is derived from fundamental principles and can be applied across a wide range of statistical models that extend beyond climate science. For example, MIC can assist in graphical model selection, making it potentially valuable for studies involving causality.

We also explored the potential of machine learning (ML) for making subseasonal predictions. A major challenge is that the observational record is relatively short, and hence ML models trained on observations typically make poor forecasts. To address this limitation, we trained ML models on extensive climate simulations spanning over 2,000 years. The resulting models produced significantly more accurate predictions compared to those trained on observations alone. This finding suggests that training ML models on a combination of observational data and model simulations may yield predictions that are more accurate than those based on either data source alone.
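The sample-size argument above can be illustrated with a toy experiment. The sketch below uses ordinary linear regression rather than the ML models the project trained, and all data are synthetic: the "observations" are a short sample from a hypothetical true relationship, while the "simulations" are a long sample from a slightly biased version of it, standing in for model error. The sample sizes, the bias magnitude, and the helper names `fit_linear`, `mse`, and `sample` are assumptions.

```python
import numpy as np

def fit_linear(x, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    z = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(z, y, rcond=None)
    return coef

def mse(coef, x, y):
    """Mean squared prediction error of a fitted model on (x, y)."""
    z = np.column_stack([np.ones(len(y)), x])
    return float(np.mean((y - z @ coef) ** 2))

rng = np.random.default_rng(2)
n_predictors = 20

def sample(n, coef):
    """Draw n samples of predictors and a noisy linear predictand."""
    x = rng.standard_normal((n, n_predictors))
    return x, x @ coef + rng.standard_normal(n)

# Hypothetical "true" relationship and a slightly biased "simulated" version of it.
true_coef = rng.standard_normal(n_predictors) / np.sqrt(n_predictors)
sim_coef = true_coef + 0.05 * rng.standard_normal(n_predictors)

x_obs, y_obs = sample(40, true_coef)       # short observational record
x_sim, y_sim = sample(20000, sim_coef)     # long simulated record
x_test, y_test = sample(5000, true_coef)   # independent "real world" verification data

mse_obs = mse(fit_linear(x_obs, y_obs), x_test, y_test)
mse_sim = mse(fit_linear(x_sim, y_sim), x_test, y_test)
print(f"trained on short observations: MSE = {mse_obs:.2f}")
print(f"trained on long simulations:   MSE = {mse_sim:.2f}")
```

The short record leaves the regression coefficients poorly constrained, so its out-of-sample error is inflated. The long simulated record pins the coefficients down and pays only a small penalty for the assumed model bias. If the bias were large, the comparison could reverse, which is why model fidelity matters for this strategy.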

When an ML model makes an impressive prediction, a question arises: has it discovered a new scientific insight that could advance our broader understanding? Answering this is challenging because ML models often involve millions of interconnected variables, making it difficult to disentangle the relationships that drive their predictions. To tackle this challenge, we developed a novel "explainable AI" methodology that identifies the patterns that a model predicts most accurately. By focusing on a few highly predictable patterns, investigators can ignore the hundreds to thousands of patterns that are unpredictable, drastically simplifying the analysis. This approach provides a new tool for data scientists across disciplines to transform accurate predictions into scientific understanding.

Last Modified: 12/19/2024
Modified by: Timothy Delsole

Please report errors in award information by writing to: awardsearch@nsf.gov.
