
NSF Org: |
IIS Division of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | August 30, 2012 |
Latest Amendment Date: | May 6, 2013 |
Award Number: | 1218524 |
Award Instrument: | Standard Grant |
Program Manager: |
Sylvia Spengler
sspengle@nsf.gov (703)292-7347 IIS Division of Information & Intelligent Systems CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2012 |
End Date: | August 31, 2017 (Estimated) |
Total Intended Award Amount: | $495,961.00 |
Total Awarded Amount to Date: | $511,961.00 |
Funds Obligated to Date: |
FY 2013 = $16,000.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
101 COMMONWEALTH AVE AMHERST MA US 01003-9252 (413)545-0698 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
CompSci 140 Governors Drive Amherst MA US 01003-9264 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Info Integration & Informatics |
Primary Program Source: |
01001314DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
The objective of this project is to design and develop a data management system that supports query processing on continuous uncertain data by returning a full probability distribution of query output and optimizes such processing for performance. This project includes four thrusts: (1) supporting continuous uncertain data processing using both the traditional relational model and the array model; (2) addressing complex correlation that arises in continuous uncertain data processing using new statistical graphical models; (3) supporting arbitrary user-defined functions, besides standard query operations, by exploring advanced techniques such as Gaussian processes and functional interpolation; and (4) developing a prototype system and evaluating it using real-world applications. Expected results include statistical models and techniques, data storage schemes, query processing and optimization techniques, and a publicly available prototype to fully support query processing on continuous uncertain data.
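To make the phrase "a full probability distribution of query output" concrete, the following is a minimal sketch, not the project's actual system. It assumes each uncertain tuple value is an independent Gaussian and that the query is a simple SUM; under those assumptions the output distribution has a closed form. The readings and the threshold are hypothetical values chosen for illustration.

```python
from functools import reduce
from operator import add
from statistics import NormalDist

# Hypothetical uncertain readings: each value is a Gaussian (mean, std),
# assumed independent of the others.
readings = [NormalDist(10.0, 3.0), NormalDist(20.0, 4.0)]

# SUM over independent Gaussians is again Gaussian: means add and
# variances add. NormalDist.__add__ implements exactly this rule.
total = reduce(add, readings)  # Normal(mean=30, std=5)

# Because the query returns a distribution rather than a single number,
# a downstream consumer can ask probabilistic questions of the result,
# e.g. the chance that the sum exceeds a (hypothetical) alert threshold.
threshold = 35.0
p_exceeds = 1.0 - total.cdf(threshold)  # about 0.1587
```

Correlated inputs or non-Gaussian distributions break this closed form, which is the kind of case the graphical-model and user-defined-function thrusts described above are meant to address.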
The results of the project can benefit applications such as severe weather monitoring and computational astrophysics, as well as the broader scientific community. Since applications such as tornado detection may trigger actions based on derived information, the ability to characterize the uncertainty of output may have significant social impact. This project also integrates research and education through curriculum development and by engaging women in research through college outreach and CRA's distributed mentor program. The results of the project are disseminated at the project web site: http://claro.cs.umass.edu.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The overall research goal of our proposal was to design, develop, and evaluate a data management system that provides fundamental support for query processing on large sensor and scientific datasets, which often involve substantial uncertainty in the content of the data as well as in the query evaluation process. Our work differs from prior work in three key aspects: (1) It supports both relational algebra and array algebra for query processing, where the uncertainty in query processing arises from the uncertainty of data content. Supporting both algebras makes the system applicable to a broader range of scientific domains. (2) Our work provides efficient algorithms not only for algebraic operators, but also for user-defined functions, which are prevalent in real-world applications and hard to support. (3) We further broadened our project to support uncertainty in the user data interest itself using interactive data exploration, which combines machine learning techniques and database optimizations.
Results of this project significantly advanced the state of the art with the following contributions: (1) Our work supports uncertain data management in both relational and array databases. For array databases, our project is the first to provide the formal semantics of array operations on uncertain data. We also provide efficient algorithms for these array operations, which can outperform existing methods by as much as one to two orders of magnitude in efficiency. (2) Besides algebraic operators, our work also supports user-defined functions (UDFs) on uncertain data. Our approach based on Gaussian processes (GPs) characterizes the UDF output using probability distributions and error bounds, which is the first result to quantify output distributions of Gaussian processes with error bounds. In addition, our optimization techniques allow our GP techniques to offer up to two orders of magnitude speedup over Monte Carlo (MC) sampling. (3) To support uncertainty in the user data interest, our interactive data exploration techniques outperform traditional active learning and random sampling in both accuracy and interactive performance. Our user study results further reveal that, compared to the manual exploration approach, our system can reduce the user labeling effort by up to 87%, with an average reduction of 66%.
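For readers unfamiliar with the MC sampling baseline mentioned in contribution (2), the sketch below shows the general idea under stated assumptions: the uncertain input is a single Gaussian, and the UDF is treated as a black box that is evaluated once per sample. This is only the baseline, not the project's GP-based technique, and the function names and the example UDF are hypothetical.

```python
import bisect
import random
import statistics


def mc_output_samples(udf, mean, std, n_samples=20_000, seed=0):
    """Draw inputs from Normal(mean, std), apply the UDF to each draw,
    and return the sorted outputs as an empirical output distribution."""
    rng = random.Random(seed)
    return sorted(udf(rng.gauss(mean, std)) for _ in range(n_samples))


def empirical_cdf(sorted_samples, y):
    """Estimate P(output <= y) as the fraction of sampled outputs <= y."""
    return bisect.bisect_right(sorted_samples, y) / len(sorted_samples)


def linear_udf(x):
    """Hypothetical UDF, chosen so the true output is Normal(7, 1)
    when the input is Normal(3, 0.5), which makes the estimate checkable."""
    return 2.0 * x + 1.0


samples = mc_output_samples(linear_udf, mean=3.0, std=0.5)
est_mean = statistics.fmean(samples)
est_std = statistics.stdev(samples)
p_le_7 = empirical_cdf(samples, 7.0)
```

Each sample costs one UDF call, so an expensive UDF makes this baseline slow. That per-call cost is what a cheaper surrogate model, such as a GP, would aim to reduce.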
For the broader scientific community, our proposed techniques for uncertain data processing have the potential to add fundamental support for reasoning about the quality of results computed from uncertain data. Our work on supporting query uncertainty through interactive data exploration will significantly increase the utility of the database when users explore large scientific databases with complex structure and content as well as imprecise goals. As such, our project will increase both the quality of analytical results computed from uncertain data and the utility of the database when the user data interest cannot be precisely stated upfront; both benefits will be of significant importance to the scientific community for data-driven discovery. Besides research activities, this project also involved a number of educational efforts, including an integrated undergraduate and graduate curriculum on data analytics and statistical analysis, and outreach and mentoring activities to engage women in research.
Last Modified: 12/15/2017
Modified by: Yanlei Diao