
NSF Org: | IIS Division of Information & Intelligent Systems |
Recipient: | |
Initial Amendment Date: | March 8, 2017 |
Latest Amendment Date: | January 28, 2019 |
Award Number: | 1651565 |
Award Instrument: | Continuing Grant |
Program Manager: | Kenneth Whang, kwhang@nsf.gov, (703) 292-5149, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | March 15, 2017 |
End Date: | February 29, 2024 (Estimated) |
Total Intended Award Amount: | $540,000.00 |
Total Awarded Amount to Date: | $540,000.00 |
Funds Obligated to Date: | FY 2018 = $104,946.00; FY 2019 = $332,966.00 |
History of Investigator: | |
Recipient Sponsored Research Office: | 450 JANE STANFORD WAY, STANFORD, CA, US 94305-2004, (650) 723-2300 |
Sponsor Congressional District: | |
Primary Place of Performance: | 353 Serra Mall, Room 228, Stanford, CA, US 94305-5008 |
Primary Place of Performance Congressional District: | |
Unique Entity Identifier (UEI): | |
Parent UEI: | |
NSF Program(s): | Robust Intelligence |
Primary Program Source: | 01001819DB NSF RESEARCH & RELATED ACTIVIT; 01001920DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): | |
Program Element Code(s): | |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Key sustainability challenges, such as poverty mitigation, climate change, and food security, involve global phenomena that are unique in scale and complexity. Our global sensing capabilities, from remote sensing to crowdsourcing, are becoming increasingly economical and accurate. These recent technological developments are creating new spatio-temporal data streams that contain a wealth of information relevant to sustainable development goals. Actionable insights, however, cannot be easily extracted because the sheer size and unstructured nature of the data preclude traditional analysis techniques. This five-year career-development plan is an integrated research, education, and outreach program focused on developing new AI techniques to extract actionable insights from large-scale spatio-temporal data. These techniques have the potential to yield accurate, inexpensive, and highly scalable models to inform research and policy.
The research goal of this project is to develop new modeling and algorithmic frameworks to help address global sustainability challenges involving spatio-temporal data. This research will develop new predictive models of complex spatio-temporal phenomena that integrate ideas from graphical models and representation learning in novel ways to improve overall predictive performance. It will also develop new approaches for learning from unlabeled data that exploit various forms of prior domain knowledge, including spatio-temporal dependencies and relationships between different data modalities. To learn models and make predictions at scale, the project will further develop scalable probabilistic inference methods that use random projections to reduce the dimensionality of probabilistic models while preserving their key properties. The techniques developed will be made available to both academia and industry through open-source software, and will enable computationally feasible approaches for analyzing large spatio-temporal datasets and modeling global-scale phenomena. Predictions and data products produced by this project will enable new analyses and advance sustainability disciplines. Results will be disseminated widely through scientific articles, research seminars, and conference presentations to maximize the benefits to the scientific community. Educational and outreach efforts will include involving undergraduate students in independent research projects, a website describing research bridging computation and sustainability, and a summer outreach program aimed at introducing under-represented high-school students to computer science and artificial intelligence.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Our worldwide sensing capabilities, enabled by technologies ranging from remote sensing to crowdsourcing, are becoming more cost-effective and precise. These technological advances have led to the creation of new spatio-temporal data streams (such as frequent, high-resolution satellite images) that are rich in information pertinent to sustainable development objectives. However, deriving actionable insights is challenging because the vast size and unstructured format of the data hinder conventional analysis methods. Over the course of this project, we developed a variety of machine learning approaches to automatically analyze spatio-temporal data streams (such as satellite images collected around the world), deriving important insights into key sustainability challenges such as poverty and climate.
As a first thrust, we developed a set of techniques to reduce the amount of training data needed to build machine learning models that take remote sensing data as input. These techniques include unsupervised, semi-supervised, and self-supervised learning models that can take advantage of large amounts of unlabeled data. For example, we developed GeoSSL and SatMAE, the first foundation models specifically developed for remote sensing data. Both models are trained in a self-supervised way, without requiring any human-provided labels, to identify structure and features common in satellite images. These models can then be fine-tuned on a variety of downstream tasks (e.g., identifying objects in satellite images) using a small amount of labeled training data, often achieving state-of-the-art results on relevant benchmark tasks.
As a second thrust, we developed a variety of techniques to enhance the scalability and reliability of probabilistic machine learning models. These include (1) approximate inference techniques that scale to models with large numbers of variables, (2) generative models in which inference is tractable by design, (3) adaptive data acquisition approaches that reduce data collection costs, and (4) uncertainty quantification techniques that assess confidence in the probabilistic predictions produced by machine learning models.
We applied these techniques to develop models that predict a variety of important socio-economic indicators at high spatial and temporal resolution across large geographies. For example, we built (1) the first deep learning models able to estimate poverty directly from satellite images, achieving accuracies comparable to traditional survey-based measures; (2) deep learning models capable of predicting crop yields directly from space, which we applied both in the United States and internationally; (3) models that can track the quality of infrastructure across the world; (4) models that can predict key population health indicators; and (5) models that can identify brick kilns from space, helping track compliance with environmental regulations.
On the education side, undergraduate and graduate students were involved in all aspects of the research activities described above. Throughout the process, they received training in research and computational thinking. A new class focused on machine learning for sustainability applications was also developed at Stanford.
Last Modified: 04/11/2024
Modified by: Stefano Ermon