Award Abstract # 1956149
III: Medium: Collaborative Research: U4U - Taming Uncertainty with Uncertainty-Annotated Databases

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: THE RESEARCH FOUNDATION FOR THE STATE UNIVERSITY OF NEW YORK
Initial Amendment Date: September 9, 2020
Latest Amendment Date: October 15, 2020
Award Number: 1956149
Award Instrument: Standard Grant
Program Manager: Sylvia Spengler
sspengle@nsf.gov
 (703)292-7347
IIS
 Division of Information & Intelligent Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2020
End Date: September 30, 2025 (Estimated)
Total Intended Award Amount: $532,923.00
Total Awarded Amount to Date: $532,923.00
Funds Obligated to Date: FY 2020 = $532,923.00
History of Investigator:
  • Oliver Kennedy (Principal Investigator)
    okennedy@buffalo.edu
  • Atri Rudra (Co-Principal Investigator)
Recipient Sponsored Research Office: SUNY at Buffalo
520 LEE ENTRANCE STE 211
AMHERST
NY  US  14228-2577
(716)645-2634
Sponsor Congressional District: 26
Primary Place of Performance: SUNY at Buffalo
212 Capen Hall
Amherst
NY  US  14260-2500
Primary Place of Performance Congressional District: 26
Unique Entity Identifier (UEI): LMCJKRFW5R81
Parent UEI: GMZUKXFDJMA9
NSF Program(s): Info Integration & Informatics
Primary Program Source: 01002021DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7364, 7924
Program Element Code(s): 736400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Uncertainty is prevalent in data analysis, regardless of the size of the data, the application domain, or the type of analysis. Common sources of uncertainty include missing values, sensor errors, bias, outliers, and many other factors. Classical deterministic data management does not track uncertainty and thus requires data quality issues to be resolved before data is ingested into the system, which is often not feasible. The net effect is that inherently uncertain data is treated as certain. If ignored, however, data uncertainty results in hard-to-trace errors, which in turn can have severe real-world implications such as unfounded scientific discoveries, financial damages, or even medical decisions based on incorrect data. While techniques for managing incomplete data exist, they are generally too heavy-weight for real-world usage and may hide relevant information from users. The goal of this project is to develop light-weight techniques for managing uncertain data that empower a wide range of applications to manage uncertainty.

Current methods for managing uncertain data are often computationally expensive and are applicable only to limited types of queries. The planned research will result in novel methods for managing uncertain data that bridge the gap between deterministic and incomplete data management. The foundation of this project is uncertainty-annotated databases, which enrich data with uncertainty labels and provide semantics for propagating these labels through queries. The result is a strict generalization of classical data management that combines the performance, generality, and ease of use of deterministic data management with the strong correctness guarantees of incomplete database techniques. Achieving this goal is highly non-trivial because query evaluation over uncertain data is intractable, even for relatively simple uncertain data models and restricted classes of queries. Three main research thrusts will be explored that address the main challenges in developing such a technique: (i) uncertainty-annotated databases will be extended with attribute-level annotations and a compact encoding of an over-approximation of possible answers, which enables the approach to handle missing data and non-monotone queries such as queries with aggregation; (ii) methods for compactly approximating incomplete databases will be developed to deal with the large or even infinite sets of possible results produced by queries over uncertain data; (iii) optimized algorithms for query evaluation over uncertainty-annotated databases will be developed to address the performance limitations of queries over uncertain data. The planned work will significantly enhance the state of the art in uncertain data management by, for the first time, enabling principled uncertainty management for complex queries at a reasonable cost.
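
As a rough illustration of thrust (i), the sketch below shows one way attribute-level annotations could be carried through an aggregation and a selection. It is not the project's implementation: the `Bounded` type, the `exact`, `bounded_sum`, and `classify_gt` helpers, and the sample readings are all hypothetical names and data chosen for this example, and the sketch assumes that each uncertain value is annotated with a lower bound, a best guess, and an upper bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounded:
    """Hypothetical attribute-level annotation: lower bound, best guess, upper bound."""
    lo: float
    guess: float
    hi: float

    def __post_init__(self):
        if not (self.lo <= self.guess <= self.hi):
            raise ValueError("expected lo <= guess <= hi")

    @property
    def is_certain(self) -> bool:
        # A value is certain when its bounds collapse to a single point.
        return self.lo == self.hi

    def __add__(self, other: "Bounded") -> "Bounded":
        # Addition is monotone, so bounds can be added component-wise.
        return Bounded(self.lo + other.lo, self.guess + other.guess, self.hi + other.hi)


def exact(v: float) -> Bounded:
    """Wrap an ordinary (certain) value."""
    return Bounded(v, v, v)


def bounded_sum(values):
    """SUM aggregate whose result over-approximates every possible answer."""
    total = exact(0.0)
    for v in values:
        total = total + v
    return total


def classify_gt(v: Bounded, threshold: float) -> str:
    """Classify a row for the selection predicate `value > threshold`."""
    if v.lo > threshold:
        return "certain"    # satisfied in every possible world
    if v.hi > threshold:
        return "possible"   # satisfied in some worlds; kept in the over-approximation
    return "excluded"       # satisfied in no possible world


# Made-up sensor readings: the second one is missing and was imputed as 21.0,
# with an assumed plausible range of 18.0 to 25.0.
readings = [exact(20.0), Bounded(18.0, 21.0, 25.0), exact(22.0)]
total = bounded_sum(readings)
```

Here `total` carries the bounds 60.0 and 67.0 around a best guess of 63.0, so a consumer of the query result can see both the deterministic-style answer and how far it could be off. The three-way result of `classify_gt` mirrors the distinction between certain answers and an over-approximation of possible answers.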

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Deo, Nachiket and Glavic, Boris and Kennedy, Oliver "Runtime provenance refinement for notebooks" Proceedings of the 14th International Workshop on the Theory and Practice of Provenance, 2022 https://doi.org/10.1145/3530800.3534535
Feng, Su and Glavic, Boris and Huber, Aaron and Kennedy, Oliver A. "Efficient Uncertainty Tracking for Complex Queries with Attribute-level Bounds" SIGMOD '21: International Conference on Management of Data, 2021 https://doi.org/10.1145/3448016.3452791
Kennedy, Oliver and Glavic, Boris and Brachmann, Michael "Overlay Spreadsheets" HILDA '23: Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2023 https://doi.org/10.1145/3597465.3605220
Kennedy, Oliver and Glavic, Boris "The Right Tool for the Job: Data-Centric Workflows in Vizier" Bulletin of the Technical Committee on Data Engineering, v.45, 2022
Pokharel, Pratik and Lee, Juseung and Kennedy, Oliver and Markatou, Marianthi and Talal, Andrew and Good, Jeff and Mukhopadhyay, Raktim "Drag, Drop, Merge: A Tool for Streamlining Integration of Longitudinal Survey Instruments", 2024 https://doi.org/10.1145/3665939.3665965

Please report errors in award information by writing to: awardsearch@nsf.gov.
