
NSF Org: | OAC Office of Advanced Cyberinfrastructure (OAC) |
Recipient: |
|
Initial Amendment Date: | August 10, 2016 |
Latest Amendment Date: | November 18, 2019 |
Award Number: | 1550593 |
Award Instrument: | Standard Grant |
Program Manager: | Seung-Jong Park, OAC Office of Advanced Cyberinfrastructure (OAC), CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2016 |
End Date: | August 31, 2020 (Estimated) |
Total Intended Award Amount: | $350,885.00 |
Total Awarded Amount to Date: | $350,885.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: | 110 INNER CAMPUS DR AUSTIN TX US 78712-1139 (512)471-6424 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 101 E. 27th Street, Suite 5.300 Austin TX US 78712-1532 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | APPLIED MATHEMATICS, COMPUTATIONAL MATHEMATICS, Software Institutes, CDS&E-MSS |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Scientists often use mathematical models to predict the behavior of natural and engineered systems. These models are therefore fundamental to scientific and engineering progress and hence relevant to NSF's science mission. Most models of realistic physical systems use complex formulae (such as partial differential equations) involving many variables. When using such a model to predict the future behavior of a system, a scientist has to provide initial values for all the variables. This can be difficult because input values may not be directly measurable. Thus, scientists often must use "inverse" computations to calculate the initial input values of the variables of a system model based on external observations of the real world. In other words, scientists seek to infer inputs to a computer model of a physical process from real observational data of the outputs. There are many examples of inverse computations, including computing the important dimensions of an organ from its CAT scan, reconstructing the source of a sound by measuring its volume and frequency at various places, calculating the density of the Earth from measurements of its gravity field, and calculating the initial condition of the atmosphere (temperature, pressure, etc.) from satellite and weather station observations over a time interval. Inverse problems are ubiquitous across all of science and engineering (and beyond). An inverse problem typically has many solutions, i.e., many sets of inputs for which the model outputs fit the observations. The solutions identified therefore vary; that is, the solution of an inverse problem is subject to uncertainty. Bayesian inference provides a systematic mathematical framework for characterizing this uncertainty. However, the Bayesian solution of inverse problems for large-scale complex models requires enormous computational power. Only recently have algorithms begun to emerge that are computationally tractable.
However, these algorithms have remained out of the reach of the mainstream of scientists who solve inverse problems, due to their complexity and the need for deeper information from the forward model. This project aims to develop, distribute, and support open-source software that encodes state-of-the-art algorithms for the solution of large-scale complex Bayesian inverse problems and is robust, scalable, flexible, modular, widely accessible, and easy to use.
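To make the idea of a Bayesian inverse computation concrete, the following is a minimal sketch and is not taken from hIPPYlib or MUQ. It assumes a deliberately simple toy model in which a single unknown input `m` is observed through `d_i = g_i * m + noise`, with Gaussian noise and a Gaussian prior; the function name `gaussian_posterior`, the sensitivities `g`, and all numerical values are illustrative assumptions. Under these assumptions the posterior is again Gaussian, so its mean and standard deviation have a closed form.

```python
import math
import random


def gaussian_posterior(g, d, noise_std, prior_mean, prior_std):
    """Posterior mean and standard deviation of a scalar input m.

    Assumed toy model: d_i = g_i * m + noise_i, noise_i ~ N(0, noise_std^2),
    with prior m ~ N(prior_mean, prior_std^2).
    """
    prior_precision = 1.0 / prior_std ** 2
    data_precision = sum(gi * gi for gi in g) / noise_std ** 2
    post_precision = prior_precision + data_precision
    weighted_data = sum(gi * di for gi, di in zip(g, d)) / noise_std ** 2
    post_mean = (prior_mean * prior_precision + weighted_data) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)


# Synthetic observations generated from a "true" input that the
# inversion itself never sees (all values are illustrative).
rng = random.Random(0)
true_m = 2.0
g = [0.5, 1.0, 1.5, 2.0]
noise_std = 0.1
d = [gi * true_m + rng.gauss(0.0, noise_std) for gi in g]

post_mean, post_std = gaussian_posterior(
    g, d, noise_std, prior_mean=0.0, prior_std=10.0
)
print(f"posterior mean = {post_mean:.3f}, posterior std = {post_std:.3f}")
```

The posterior standard deviation is the quantified uncertainty the abstract refers to: it shrinks as more, or more informative, observations are added. Realistic problems replace the scalar `m` with a high-dimensional field and the linear map with a complex model, which is where the computational difficulty arises.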
The project builds heavily on two complementary open-source software libraries the team has been developing: MUQ at MIT, and hIPPYlib at UT-Austin/UC-Merced. MUQ provides a spectrum of powerful Bayesian inversion models and algorithms, but expects forward models to come equipped with gradients/Hessians to permit large-scale solution. hIPPYlib implements powerful large-scale gradient/Hessian-based inverse solvers in an environment that can automatically generate needed derivatives, but it lacks full Bayesian capabilities. By integrating these two complementary libraries, the project will result in a robust, scalable, and efficient software framework that realizes the benefits of each to tackle complex large-scale Bayesian inverse problems across a broad spectrum of scientific and engineering disciplines. The resulting software, which will be distributed under an open-source license, will provide an environment for rapid development of inverse models equipped with gradient/Hessian information; benchmark problems for evaluation and comparison of algorithms; and tutorial problems for training and testing purposes.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The overarching goal of this collaborative project between UT Austin, MIT, and UC Merced was the development and dissemination of a high-performance, open-source software framework incorporating a suite of advanced algorithms for the solution of Bayesian inverse problems. Bayesian inversion is the most systematic and rigorous framework for learning physical models from data while accounting for uncertainties in both the data and the models. The components of a physical model we might wish to learn from the data include initial and boundary conditions, sources, material properties, geometry, and model structure, all of which can be heterogeneous in space and/or time. These heterogeneities imply that the parameters we wish to learn are in fact infinite-dimensional fields, which upon numerical discretization lead to very high-dimensional parameter spaces, often with millions of parameters or more.
Inverse problems abound in every area of science, engineering, medicine, and technology. As just a few examples of model-based inverse problems, we may infer: coalescing binary system parameters from detected gravitational waves; earth structure from reflected seismic waves; reaction rates from measurements of chemically-reacting flows; ice sheet basal friction from satellite observations of surface flow; 3D bone structure from X-ray CT measurements; subsurface contaminant plume spread from crosswell electromagnetic measurements; internal structural defects from measurements of structural vibrations; initial conditions for prediction from meteorological data; ocean state from satellite and in-situ observations; and biochemical reaction networks from observed species concentrations.
Despite the ubiquitous nature of inverse problems, and despite the critical need to quantify uncertainties in the solution of inverse problems, application of classical Bayesian inversion methods to the class of inverse problems we target -- those characterized by complex models (e.g., governed by partial differential equations), complex data, and high-dimensional parameters -- is prohibitive. These methods suffer from the curse of dimensionality and therefore need to generate millions of samples or more, each requiring a solution of the forward model.
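The cost structure described above can be illustrated with a minimal sketch of a classical sampler, random-walk Metropolis, applied to a one-dimensional toy problem. This is not hIPPYlib or MUQ code: `forward_model` is a hypothetical stand-in for an expensive PDE solve, and the step size, noise level, and prior width are assumed values. The sketch only shows that every proposed sample costs one forward solve; it does not reproduce the high-dimensional difficulty itself.

```python
import math
import random


def forward_model(m):
    """Hypothetical stand-in for an expensive PDE solve."""
    return m ** 3 + m


def log_posterior(m, d_obs, noise_std, prior_std):
    """Unnormalized log posterior: Gaussian noise model plus Gaussian prior."""
    misfit = (forward_model(m) - d_obs) / noise_std
    return -0.5 * misfit ** 2 - 0.5 * (m / prior_std) ** 2


def metropolis(d_obs, n_samples, step=0.05, noise_std=0.1, prior_std=2.0, seed=0):
    """Random-walk Metropolis; returns the chain and the forward-solve count."""
    rng = random.Random(seed)
    m = 0.0
    logp = log_posterior(m, d_obs, noise_std, prior_std)
    n_solves = 1
    samples = []
    for _ in range(n_samples):
        proposal = m + step * rng.gauss(0.0, 1.0)
        logp_prop = log_posterior(proposal, d_obs, noise_std, prior_std)
        n_solves += 1  # every proposal costs one forward solve
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, logp_prop - logp)):
            m, logp = proposal, logp_prop
        samples.append(m)
    return samples, n_solves


samples, n_solves = metropolis(d_obs=2.0, n_samples=5000)
kept = samples[1000:]  # discard the burn-in portion of the chain
mean = sum(kept) / len(kept)
print(f"forward solves: {n_solves}, posterior mean estimate: {mean:.3f}")
```

Here a forward solve is a single arithmetic expression, so thousands of samples are cheap. When each solve is a large PDE simulation and the number of required samples grows into the millions, the same loop becomes prohibitive, which is the motivation for methods that exploit problem structure.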
While sophisticated methods have emerged over the past decade that exploit the mathematical structure of the inverse problem (including low-dimensionality, geometry, and smoothness) and thus mitigate the challenges outlined above, they have been buried in the mathematical and statistical literature and have remained out of reach of many scientists and engineers who wish to solve Bayesian inverse problems governed by complex forward models. Our project has brought these advances to the scientific and engineering communities in the form of the hIPPYlib and hIPPYlib/MUQ software frameworks. In particular, the major scientific and broader impact outcomes of our project include:
1. The further development of scalable and robust algorithms that exploit problem structure to make Bayesian inversion more tractable for a wider range of complex problems.
2. The incorporation of these algorithms into the hIPPYlib (https://hippylib.github.io/) and hIPPYlib/MUQ (https://hippylib.github.io/muq-hippylib/) open source software frameworks. hIPPYlib has been downloaded over 1000 times.
3. The application of hIPPYlib to a broad spectrum of challenging Bayesian inverse problems, including inference of atmospheric contaminant plumes, inference of faults from subsurface flow well data, inference of earthquake sources from GPS data, inference of phase field models of directed self-assembly of block copolymers from image data, inference of ice sheet basal friction from InSAR-based surface velocities, inference of aquifer permeabilities from InSAR surface deformations, and inference of diffusivity and growth rate parameters for computational oncology models from MRI images.
4. The development of teaching materials and tutorials illustrating the capabilities of hIPPYlib, in the form of Jupyter notebooks (interactive tutorials that mix instruction and theory with editable and runnable code).
5. Creation of dissemination and support mechanisms for hIPPYlib, including documentation, a website, regular version releases, Docker images, issue tracking, and a Slack channel.
6. The support and training of two postdoctoral researchers, one graduate student, and one undergraduate student in Bayesian inversion theory, algorithms, and software.
7. The use of hIPPYlib as the computational foundation for two graduate-level inverse problems courses at UT Austin and WUSTL. (hIPPYlib has been used in inverse problems courses at a number of other universities as well.)
8. The teaching of a two-week summer school in June 2018 (the SIAM Gene Golub Summer School) to 44 graduate students on the theme of inverse problems. hIPPYlib and MUQ were heavily employed to aid student comprehension and to support the student projects.
9. Over a dozen articles (published, submitted, in preparation) on hIPPYlib's algorithms and their applications to a spectrum of challenging inverse problems.
Last Modified: 02/22/2021
Modified by: Omar N Ghattas