
NSF Org: | IIS Division of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | August 26, 2015 |
Latest Amendment Date: | November 13, 2019 |
Award Number: | 1527504 |
Award Instrument: | Standard Grant |
Program Manager: | Hector Munoz-Avila, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2015 |
End Date: | August 31, 2020 (Estimated) |
Total Intended Award Amount: | $500,000.00 |
Total Awarded Amount to Date: | $500,000.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: | 5717 CORBETT HALL ORONO ME US 04469-5717 (207)581-1484 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 5711 Boardman Hall Orono ME US 04469-5717 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Info Integration & Informatics |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Massive sensor data streams are now created by the automatic collection of sensor data at high frequency and in near real-time. This project aims to advance the analytical potential of live-streamed data, historical data streams, and model simulations by creating an overarching representation in the form of the field data model with a set of operators that establish the field algebra. A field is best explained by example, such as a magnetic field: the magnetic force can be determined for each point in the field, and the field is therefore considered continuous. Similarly, environmental phenomena such as air pollution or flooding are considered continuous in space and time, although sensors sample them only at limited, discrete time-space locations. This project develops the field algebra, an intuitive yet mathematically defined formalism to represent real-world phenomena as fields and to express analytical needs as canonical operations over fields. The field model represents phenomena as continuous entities again, and the implementation hides the fact that their spatio-temporal continuity is calculated on-the-fly from real-time measurement streams. Extending sensor data streams to fields is transformative, as a domain scientist is rarely interested in the readings of individual sensors. Allowing scientists to work with high-level abstractions will significantly enhance analytical tasks such as finding insights about changes, trends, or unexpected events happening in the real world. The project will integrate fields and data streams mathematically so that mappings between the two are well defined. The field data model is complemented by the development of an innovative computational framework for synthesizing and analyzing fields based on very large numbers of high-throughput, real-time sensor data streams, and for creating continuous representations on-the-fly.
This framework provides novel algorithms to ensure that the field operators can absorb the throughput of very large numbers of sensor data streams, yet still compute complex analytical results in near real-time. This project will benefit society by enabling an immediate reaction to situations such as extreme weather events, environmental disasters, or chemical accidents, and by allowing response efforts to be organized based on accurate and timely information; this will help to better protect the public interest.
The research in this project develops a formal foundation for sensor data streams by abstracting them as geographic fields, and a scalable computational framework that computes field operators on massive numbers of sensor data streams in near real-time. In this research, the field algebra, with a recursive definition of fields and a set of field operators, is formalized. The field algebra and data streams are formally integrated at the level of their mathematical foundations. The formal field algebra is implemented as a data type hierarchy and integrated with stream data models. At the same time, a computational framework is developed that extends data stream engines with computational components to estimate spatio-temporal fields based on recursive or transposed field definitions and to evaluate complex predicates over fields, which lays the foundation for co-analyzing live and historic fields. The results of this project will be distributed via scientific publications, open source software, and online training tutorials and classes. The project web site (https://silvianittel.wordpress.com/from-streams-to-fields-nsf/) will provide access to the results of this project.
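As a rough illustration of what a recursive field definition with operators could look like, the following Python sketch models a field as a function from a space-time point (x, y, t) to a value, and builds new fields from existing ones. Everything here is an assumption made for illustration: the class name `Field`, the operators `__add__`, `scale`, and `restrict`, and the example fields are hypothetical and are not taken from the project's formalization or software.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical field: yields a value (or None) for any point (x, y, t)."""
    fn: Callable[[float, float, float], Optional[float]]

    def value_at(self, x: float, y: float, t: float) -> Optional[float]:
        return self.fn(x, y, t)

    def __add__(self, other: "Field") -> "Field":
        # Pointwise sum; undefined wherever either operand is undefined.
        def combined(x, y, t):
            a = self.value_at(x, y, t)
            b = other.value_at(x, y, t)
            return None if a is None or b is None else a + b
        return Field(combined)

    def scale(self, k: float) -> "Field":
        # Pointwise multiplication by a constant.
        def scaled(x, y, t):
            v = self.value_at(x, y, t)
            return None if v is None else k * v
        return Field(scaled)

    def restrict(self, predicate: Callable[[float, float, float], bool]) -> "Field":
        # Limit the field's domain to points where the predicate holds.
        def restricted(x, y, t):
            return self.value_at(x, y, t) if predicate(x, y, t) else None
        return Field(restricted)


# Illustrative (made-up) fields: a spatial gradient and a temporal trend.
base = Field(lambda x, y, t: 10.0 + x)
trend = Field(lambda x, y, t: 0.5 * t)

# A field defined recursively in terms of other fields.
combined = (base + trend).scale(2.0).restrict(lambda x, y, t: x >= 0)
```

Because each operator returns another `Field`, the result of one operation can feed the next, which is the closure property an algebra of this kind relies on. Nothing is evaluated until `value_at` is called for a concrete point.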
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Today, many small sensors deployed in physical space generate spatially and temporally dense data at unprecedented rates and provide unique opportunities for the analysis of real-world phenomena in real-time. However, data analysis faces the challenge of analyzing streams across different sampling rates and integrating them with historic time series at different resolutions. We believe users of information systems should interact with abstractions of the real-world phenomena, rather than many individual measurements captured by the data streams.
This work has advanced the analytical potential of live-streamed data, historical data streams, and model simulations by creating an overarching representation in the form of a field data model with a parametric type definition and a set of operators that establish the field algebra. A spatio-temporal field is mathematically defined and considered continuous over geographic space and time. Thus, a value can be determined for each point in the field. In practice, sensors sample phenomena such as air pollution or radiation only at limited, discrete time-space locations. The innovative implementation of field data types, however, hides the fact that their spatio-temporal continuity is calculated on the fly from real-time measurement streams.
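The idea of a field type that hides its stream-based computation can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the names `Sample` and `StreamBackedField` are hypothetical, samples are assumed to arrive in time order, and a simple nearest-sample estimator stands in for a real interpolation method.

```python
import math
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sample:
    """One sensor reading at location (x, y) and time t (hypothetical layout)."""
    x: float
    y: float
    t: float
    value: float


class StreamBackedField:
    """Hypothetical field whose values are estimated on demand from recent samples."""

    def __init__(self, window: float):
        self.window = window      # length of the sliding time window
        self._samples = deque()   # samples currently inside the window

    def ingest(self, sample: Sample) -> None:
        # Assumes in-order arrival; evicts samples older than the window.
        self._samples.append(sample)
        cutoff = sample.t - self.window
        while self._samples and self._samples[0].t < cutoff:
            self._samples.popleft()

    def value_at(self, x: float, y: float) -> Optional[float]:
        # Nearest-sample estimate: a stand-in for a real interpolator.
        if not self._samples:
            return None
        nearest = min(self._samples, key=lambda s: math.hypot(s.x - x, s.y - y))
        return nearest.value

    def __len__(self) -> int:
        return len(self._samples)


field = StreamBackedField(window=10.0)
field.ingest(Sample(0.0, 0.0, 0.0, 1.0))
field.ingest(Sample(10.0, 0.0, 5.0, 2.0))
field.ingest(Sample(0.0, 0.0, 12.0, 3.0))  # evicts the reading taken at t = 0
```

A caller only asks `field.value_at(x, y)` for an arbitrary location; the window maintenance and the estimation from discrete readings stay behind that interface.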
The results of this project are manifold: The field algebra has been developed and formalized on several levels, including a formal integration with stream models. Users can express analytical needs as canonical operations over fields.
- With a recursive definition of fields and a basic set of field operators, the field algebra has been formalized.
- The field algebra and data streams have been formally integrated at the level of their mathematical foundations.
- The formal field algebra is implemented as a data type hierarchy and integrated with the relational data model and stream data models (CQL).
- Several open source implementations of field types over streams have been made available, including a field data type hierarchy in Python for Apache Spark.
- The data model has been complemented by the implementation of a computational framework for synthesizing and analyzing fields based on vast numbers of high-throughput, real-time data streams. The experimental evaluation of the computational framework shows that up to 250K sample points in a stream query window can be spatiotemporally interpolated into a raster representation in less than 2 s on a laptop, using a novel, stream-based Inverse Distance Weighting (IDW) algorithm.
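For readers unfamiliar with inverse distance weighting, the sketch below shows the textbook form of the technique applied to the samples of one query window and rasterized onto a regular grid. It is a naive, purely spatial version that visits every sample for every cell; it is not the project's novel stream-based algorithm, and the function names, parameters, and sample values are assumptions chosen for illustration.

```python
from typing import List, Optional, Sequence, Tuple

WindowSample = Tuple[float, float, float]  # (x, y, value), hypothetical layout


def idw_value(samples: Sequence[WindowSample], x: float, y: float,
              power: float = 2.0) -> Optional[float]:
    """Estimate the value at (x, y) as a distance-weighted mean of the samples."""
    num = 0.0
    den = 0.0
    for sx, sy, value in samples:
        d2 = (sx - x) ** 2 + (sy - y) ** 2
        if d2 == 0.0:
            return value                 # query point coincides with a sample
        w = 1.0 / d2 ** (power / 2.0)    # weight = 1 / distance**power
        num += w * value
        den += w
    return num / den if den else None    # None when the window is empty


def idw_raster(samples: Sequence[WindowSample], x_min: float, y_min: float,
               cell_size: float, n_cols: int, n_rows: int,
               power: float = 2.0) -> List[List[Optional[float]]]:
    """Interpolate the window's samples at the center of every raster cell."""
    raster = []
    for row in range(n_rows):
        cy = y_min + (row + 0.5) * cell_size
        raster.append([
            idw_value(samples, x_min + (col + 0.5) * cell_size, cy, power)
            for col in range(n_cols)
        ])
    return raster


# Two made-up readings from one query window.
window = [(0.5, 0.5, 10.0), (1.5, 0.5, 20.0)]
grid = idw_raster(window, x_min=0.0, y_min=0.0, cell_size=1.0, n_cols=2, n_rows=1)
```

The cost of this version grows with the number of cells times the number of samples, which is the kind of scaling a stream-oriented algorithm has to avoid to handle windows with hundreds of thousands of points.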
The grant has fully or partially supported four doctoral students. Two Ph.D. students graduated during the grant duration. Additionally, three MS students were advised and contributed to the open-source libraries of this project. Dr. Nittel mentored several female Computer Science undergraduate students, two of whom graduated as outstanding student of the year in Computer Science and in the College of Liberal Arts and Science, respectively. The results of this project were distributed via eight scientific publications and presentations in journals and at conferences. Additionally, Dr. Nittel developed a novel online graduate course, "Real-time Sensor Data Streams." Several open-source software libraries have been made available, including a temporal Field library in Java for Postgres, a Java-based QGIS plugin, an Apache Spark Field library, and a Python QGIS stream plugin.
Last Modified: 11/16/2021
Modified by: Silvia Nittel