Award Abstract # 1943971
EAGER: In-Database Prescriptive Analytics under Uncertainty

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: UNIVERSITY OF MASSACHUSETTS
Initial Amendment Date: September 4, 2019
Latest Amendment Date: September 4, 2019
Award Number: 1943971
Award Instrument: Standard Grant
Program Manager: Raj Acharya
racharya@nsf.gov
 (703)292-7978
IIS
 Division of Information & Intelligent Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2019
End Date: September 30, 2023 (Estimated)
Total Intended Award Amount: $299,317.00
Total Awarded Amount to Date: $299,317.00
Funds Obligated to Date: FY 2019 = $299,317.00
History of Investigator:
  • Peter Haas (Principal Investigator)
    phaas@cs.umass.edu
  • Alexandra Meliou (Co-Principal Investigator)
Recipient Sponsored Research Office: University of Massachusetts Amherst
101 COMMONWEALTH AVE
AMHERST
MA  US  01003-9252
(413)545-0698
Sponsor Congressional District: 02
Primary Place of Performance: University of Massachusetts Amherst
100 Venture Way, Suite 201
Hadley
MA  US  01035-9450
Primary Place of Performance Congressional District: 02
Unique Entity Identifier (UEI): VGJHK59NMPK9
Parent UEI: VGJHK59NMPK9
NSF Program(s): Info Integration & Informatics
Primary Program Source: 01001920DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7364, 7916, 7484
Program Element Code(s): 736400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Prescriptive analytics, and constrained optimization in particular, is central to decision making over a broad range of domains, including finance, transportation, manufacturing, and healthcare, and has applications to scientific research as well. Typically, decision makers have relied on application-specific solutions to model and solve these problems. Such solutions are often complex and do not generalize. Moreover, the usual workflow requires that data be extracted from a database and then reformatted and fed into a separate optimization package, after which the output must be reformatted and inserted back into the database; this process is slow, cumbersome, and error-prone. Finally, modern data-intensive optimization problems are of unprecedented size. A domain-independent, declarative, and scalable approach is needed, supported and powered by the system where the data relevant to these problems typically resides: the database. Then modeling becomes less ad hoc, and the overall optimization process, from data preparation through solution and exploration of results, becomes much more efficient. Desirable data management functionality (such as consistency, persistence, fault tolerance, access control, and data-integration capability) becomes an integral part of the system "for free". This project will develop algorithms and systems to provide general-purpose in-database support for prescriptive analytics applications over the sort of large-scale uncertain data that is commonly encountered in practice.

Specifically, the project will develop extensions to the SQL relational query language to allow specification of "stochastic package queries", a class of database queries that selects an optimal set ("package") of tuples that satisfy both per-tuple and global constraints. Such queries correspond to stochastic integer linear programs. Novel solution algorithms will focus on the scaling challenges caused both by uncertainty in the data and by large data volumes. The system will provide exact solutions when possible, and otherwise provide scalable Monte-Carlo-based solutions with rigorous approximation guarantees. The project will radically re-design the prior PackageBuilder system for deterministic package queries, incorporating techniques from probabilistic databases, to create a complete end-to-end system. The project will impact a broad set of domains with applications that boil down to modeling and solving constrained optimization problems over uncertain data, including finance, healthcare, and transportation.
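As a rough illustration of the idea, the following Python sketch evaluates a tiny stochastic package query by replacing the uncertain attribute with Monte Carlo scenarios and enumerating candidate packages. It is not the project's algorithm or query syntax: the function name solve_package_saa, the item names, the gain model, and all parameter values are hypothetical, each tuple is assumed to appear at most once in a package, and brute-force enumeration stands in for an integer-programming solver. The sketch carries no approximation guarantee; it only shows how a probabilistic constraint and an expected-value objective can be estimated from samples.

```python
import itertools
import random


def solve_package_saa(items, scenarios, budget, max_loss, min_prob):
    """Pick the package that maximizes average gain over the sampled scenarios.

    items:     list of (name, price) pairs; price is a deterministic attribute
    scenarios: list of dicts mapping name -> sampled gain, one dict per scenario
    budget:    global deterministic constraint: sum of prices <= budget
    max_loss, min_prob: probabilistic constraint, estimated from the samples:
               the fraction of scenarios with total gain >= -max_loss
               must be at least min_prob
    Returns (package, estimated expected gain), or (None, None) if infeasible.
    """
    price = dict(items)
    best_pkg, best_value = None, None
    # Enumerate every non-empty subset; feasible only for very small inputs.
    for size in range(1, len(items) + 1):
        for pkg in itertools.combinations(price, size):
            if sum(price[n] for n in pkg) > budget:
                continue
            # Total gain of this package in each sampled scenario.
            totals = [sum(s[n] for n in pkg) for s in scenarios]
            frac_ok = sum(t >= -max_loss for t in totals) / len(totals)
            if frac_ok < min_prob:
                continue
            value = sum(totals) / len(totals)
            if best_value is None or value > best_value:
                best_pkg, best_value = pkg, value
    return best_pkg, best_value


if __name__ == "__main__":
    rng = random.Random(0)
    items = [("A", 10), ("B", 10), ("C", 10)]
    # Hypothetical gain model: each tuple's gain is normal with (mean, std dev).
    params = {"A": (3.0, 8.0), "B": (1.0, 1.0), "C": (2.0, 2.0)}
    scenarios = [
        {n: rng.gauss(mu, sd) for n, (mu, sd) in params.items()}
        for _ in range(1000)
    ]
    print(solve_package_saa(items, scenarios, budget=20, max_loss=5, min_prob=0.95))
```

Per-tuple constraints are omitted here; in this sketch they would amount to filtering items before enumeration. The number of candidate packages grows exponentially with the number of tuples, which is the kind of scaling challenge the paragraph above refers to.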

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:  When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).


Glavic, Boris and Meliou, Alexandra and Roy, Sudeepa "Trends in Explanations: Understanding and Debugging Data-driven Systems" Foundations and Trends® in Databases, v.11, 2021 https://doi.org/10.1561/1900000074
Galhotra, Sainyam and Fariha, Anna and Lourenço, Raoni and Freire, Juliana and Meliou, Alexandra and Srivastava, Divesh "DataPrism: Exposing Disconnect between Data and Systems" Proceedings of the 2022 International Conference on Management of Data (SIGMOD), 2022 https://doi.org/10.1145/3514221.3517864
Addanki, Raghavendra and McGregor, Andrew and Meliou, Alexandra and Moumoulidou, Zafeiria "Improved Approximation and Scalability for Fair Max-Min Diversification" 25th International Conference on Database Theory (ICDT), 2022 https://doi.org/10.4230/LIPIcs.ICDT.2022.7
Abouzied, Azza and Haas, Peter J. and Meliou, Alexandra "In-Database Decision Support: Opportunities and Challenges" A Quarterly bulletin of the IEEE Computer Society Technical Committee on Database Engineering, v.45, 2022
Brucato, Matteo and Mannino, Miro and Abouzied, Azza and Haas, Peter J. and Meliou, Alexandra "sPaQLTooLs: a stochastic package query interface for scalable constrained optimization" Proceedings of the VLDB Endowment, v.13, 2020 https://doi.org/10.14778/3415478.3415499
Brucato, Matteo and Yadav, Nishant and Abouzied, Azza and Haas, Peter J. and Meliou, Alexandra "Stochastic Package Queries in Probabilistic Databases" Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), 2020 https://doi.org/10.1145/3318464.3389765
Fariha, Anna and Brucato, Matteo and Haas, Peter J. and Meliou, Alexandra "SuDocu: summarizing documents by example" Proceedings of the VLDB Endowment, v.13, 2020 https://doi.org/10.14778/3415478.3415494
Moumoulidou, Zafeiria and McGregor, Andrew "Diverse Data Selection under Fairness Constraints" International Conference on Database Theory, v.186, 2021 https://doi.org/10.4230/LIPIcs.ICDT.2021.13
Yadav, Nishant and Brucato, Matteo and Fariha, Anna and Youngquist, Oscar and Killingback, Julian and Meliou, Alexandra and Haas, Peter "SUBSUME: A Dataset for Subjective Summary Extraction from Wikipedia Documents" New Frontiers in Summarization workshop (at EMNLP 2021), 2021 https://doi.org/10.18653/v1/2021.newsum-1.14
Mai, Anh L and Wang, Pengyu and Abouzied, Azza and Brucato, Matteo and Haas, Peter J and Meliou, Alexandra "Scaling Package Queries to a Billion Tuples via Hierarchical Partitioning and Customized Optimization" Proceedings of the VLDB Endowment, v.17, 2024 https://doi.org/10.14778/3641204.3641222
Islam, Maliha Tashfia and Fariha, Anna and Meliou, Alexandra and Salimi, Babak "Through the Data Management Lens: Experimental Analysis and Evaluation of Fair Classification" Proceedings of the 2022 International Conference on Management of Data (SIGMOD), 2022 https://doi.org/10.1145/3514221.3517841

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Core problems in a broad range of domains, such as finance, manufacturing, and medicine, are modeled as constrained optimization problems. Earlier work on prescriptive analytics developed foundations for in-database support for such applications, allowing for declarative and domain-independent specification, as well as algorithms for scalable evaluation.  This project significantly extended this support to handle the large-scale uncertain data that is commonly encountered in practice.


Intellectual Merit: The project developed extensions to the SQL relational query language to allow specification of "stochastic package queries", a class of database queries that selects an optimal set ("package") of tuples that satisfy both per-tuple and global constraints. Such queries correspond to stochastic integer linear programs. The produced algorithms focused on the scaling challenges caused both by uncertainty in the data and by large data volumes. The project radically re-designed the prior PackageBuilder system for deterministic package queries, incorporating techniques from probabilistic databases, to create a complete end-to-end system. 


Broader Impacts: The project made significant contributions on the topic of in-database prescriptive analytics, simplifying workflows and facilitating domain experts' access to general-purpose tools. The project supported two PhD students at the University of Massachusetts Amherst, and further helped train one MS student and three undergraduates. Results from this project have been published in premier data management venues and have been recognized with best demonstration awards at VLDB 2020.


Last Modified: 11/04/2023
Modified by: Peter J Haas

Please report errors in award information by writing to: awardsearch@nsf.gov.
