
NSF Org: | IIS Division of Information & Intelligent Systems |
Initial Amendment Date: | September 4, 2019 |
Latest Amendment Date: | September 4, 2019 |
Award Number: | 1943971 |
Award Instrument: | Standard Grant |
Program Manager: | Raj Acharya, racharya@nsf.gov, (703)292-7978, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | October 1, 2019 |
End Date: | September 30, 2023 (Estimated) |
Total Intended Award Amount: | $299,317.00 |
Total Awarded Amount to Date: | $299,317.00 |
Recipient Sponsored Research Office: | 101 COMMONWEALTH AVE AMHERST MA US 01003-9252 (413)545-0698 |
Primary Place of Performance: | 100 Venture Way, Suite 201 Hadley MA US 01035-9450 |
NSF Program(s): | Info Integration & Informatics |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Prescriptive analytics, and constrained optimization in particular, is central to decision making over a broad range of domains, including finance, transportation, manufacturing, and healthcare, and has applications to scientific research as well. Typically, decision makers have relied on application-specific solutions to model and solve these problems. Such solutions are often complex and do not generalize. Moreover, the usual workflow requires that data be extracted from a database and then reformatted and fed into a separate optimization package, after which the output must be reformatted and inserted back into the database; this process is slow, cumbersome, and error-prone. Finally, modern data-intensive optimization problems are of unprecedented size. A domain-independent, declarative, and scalable approach is needed, supported and powered by the system where the data relevant to these problems typically resides: the database. Then modeling becomes less ad hoc, and the overall optimization process, from data preparation through solution and exploration of results, becomes much more efficient. Desirable data management functionality, such as consistency, persistence, fault tolerance, access control, and data-integration capability, becomes an integral part of the system "for free". This project will develop algorithms and systems to provide general-purpose in-database support for prescriptive analytics applications over the sort of large-scale uncertain data that is commonly encountered in practice.
Specifically, the project will develop extensions to the SQL relational query language to allow specification of "stochastic package queries", a class of database queries that selects an optimal set ("package") of tuples that satisfy both per-tuple and global constraints. Such queries correspond to stochastic integer linear programs. Novel solution algorithms will focus on the scaling challenges caused both by uncertainty in the data and by large data volumes. The system will provide exact solutions when possible, and otherwise provide scalable Monte Carlo-based solutions with rigorous approximation guarantees. The project will radically redesign the prior PackageBuilder system for deterministic package queries, incorporating techniques from probabilistic databases, to create a complete end-to-end system. The project will impact a broad set of domains whose applications reduce to modeling and solving constrained optimization problems over uncertain data, including finance, healthcare, and transportation.
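To make the idea of a stochastic package query concrete, the following is a minimal illustrative sketch in Python. It is not the project's algorithm, not PackageBuilder code, and not the project's SQL extension. It assumes a hypothetical toy table of tuples with a known price and a normally distributed uncertain gain, filters tuples with a per-tuple constraint, and then searches all packages by brute force. The probabilistic global constraint is replaced by its sample-based counterpart over Monte Carlo scenarios. All names (sample_scenarios, best_package, mean_gain, stddev, and the parameter names) are assumptions made for illustration, and exhaustive search only works for a handful of tuples.

```python
import itertools
import random


def sample_scenarios(tuples, num_scenarios, rng):
    """Draw Monte Carlo scenarios: one realized gain per tuple per scenario.

    Assumption for illustration: each gain is independent and normally
    distributed with the tuple's own mean and standard deviation.
    """
    return [
        [rng.gauss(t["mean_gain"], t["stddev"]) for t in tuples]
        for _ in range(num_scenarios)
    ]


def best_package(tuples, scenarios, budget, min_gain, min_prob):
    """Brute-force search over all non-empty packages (subsets of tuples).

    Objective: maximize the sample-average total gain of the package.
    Deterministic global constraint: total price <= budget.
    Probabilistic global constraint, approximated on the samples: the
    fraction of scenarios with total gain >= min_gain is >= min_prob.
    Returns (package_indices, value), or (None, -inf) if nothing is feasible.
    """
    n = len(tuples)
    best, best_value = None, float("-inf")
    for size in range(1, n + 1):
        for package in itertools.combinations(range(n), size):
            if sum(tuples[i]["price"] for i in package) > budget:
                continue
            totals = [sum(s[i] for i in package) for s in scenarios]
            satisfied = sum(1 for g in totals if g >= min_gain) / len(totals)
            if satisfied < min_prob:
                continue
            value = sum(totals) / len(totals)
            if value > best_value:
                best, best_value = package, value
    return best, best_value


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data: candidate investments with uncertain gains.
    table = [
        {"id": "A", "price": 50.0, "mean_gain": 5.0, "stddev": 4.0},
        {"id": "B", "price": 40.0, "mean_gain": 4.0, "stddev": 1.0},
        {"id": "C", "price": 30.0, "mean_gain": 3.5, "stddev": 0.5},
        {"id": "D", "price": 95.0, "mean_gain": 12.0, "stddev": 9.0},
    ]
    # Per-tuple constraint (the analogue of a WHERE clause): price <= 60.
    candidates = [t for t in table if t["price"] <= 60.0]
    scenarios = sample_scenarios(candidates, 1000, rng)
    package, value = best_package(
        candidates, scenarios, budget=100.0, min_gain=2.0, min_prob=0.9
    )
    chosen = [candidates[i]["id"] for i in package] if package else []
    print("package:", chosen, "estimated expected gain:", value)
```

In this sketch each package corresponds to a 0/1 assignment of integer decision variables, one per tuple, which is why such queries map to integer linear programs. A practical system would hand the sampled formulation to a solver rather than enumerate subsets, and the quality of the answer depends on the number of scenarios drawn.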
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Core problems in a broad range of domains, such as finance, manufacturing, and medicine, are modeled as constrained optimization problems. Earlier work on prescriptive analytics developed foundations for in-database support for such applications, allowing for declarative and domain-independent specification, as well as algorithms for scalable evaluation. This project significantly extended this support to handle the large-scale uncertain data that is commonly encountered in practice.
Intellectual Merit: The project developed extensions to the SQL relational query language to allow specification of "stochastic package queries", a class of database queries that selects an optimal set ("package") of tuples that satisfy both per-tuple and global constraints. Such queries correspond to stochastic integer linear programs. The produced algorithms focused on the scaling challenges caused both by uncertainty in the data and by large data volumes. The project radically re-designed the prior PackageBuilder system for deterministic package queries, incorporating techniques from probabilistic databases, to create a complete end-to-end system.
Broader Impacts: The project made significant contributions to in-database prescriptive analytics, simplifying workflows and facilitating domain experts' access to general-purpose tools. The project supported two PhD students at the University of Massachusetts Amherst, and further helped train one MS student and three undergraduates. Results from this project have been published in premier data management venues and have been recognized with best demonstration awards at VLDB 2020.
Last Modified: 11/04/2023
Modified by: Peter J Haas