
NSF Org: IIS Division of Information & Intelligent Systems
Recipient:
Initial Amendment Date: August 7, 2019
Latest Amendment Date: December 22, 2020
Award Number: 1901168
Award Instrument: Continuing Grant
Program Manager: Sylvia Spengler, sspengle@nsf.gov, (703) 292-7347, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering
Start Date: August 15, 2019
End Date: July 31, 2024 (Estimated)
Total Intended Award Amount: $980,000.00
Total Awarded Amount to Date: $980,000.00
Funds Obligated to Date: FY 2020 = $501,143.00
History of Investigator:
Recipient Sponsored Research Office: 341 PINE TREE RD, ITHACA, NY, US 14850-2820, (607) 255-5014
Sponsor Congressional District:
Primary Place of Performance: 107 Hoy Road, Ithaca, NY, US 14853-7501
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): Info Integration & Informatics
Primary Program Source:
01002021DB NSF RESEARCH & RELATED ACTIVIT
01002122DB NSF RESEARCH & RELATED ACTIVIT
01002223DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s):
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
Many information systems engage with their users through the following loop of interactions: the system receives a context as input (e.g. query, user profile), responds with a context-dependent action (e.g. ranking, recommendation, ad), and then receives some explicit or implicit feedback on the quality of the action (e.g. star rating, following a search result, clicking on an ad). While ubiquitous and plentiful, log data from this interaction loop does not fit the standard mold of supervised learning, since the feedback is both biased and partial -- the system determines through its actions where it gets feedback, and even for the chosen actions it typically does not observe all feedback (e.g. missing clicks on relevant results in ranking). This project will address the question of how this logged data can nevertheless be used for evaluating and learning new systems. The potential upsides of reusing the existing log data are evident. For evaluation, the use of historical log data enables engineers to rapidly evaluate many new systems offline (e.g. new ranking functions, recommendation policies), without the weeks of delay and the potential negative impact on user experience implied by online A/B testing. For learning, it similarly enables offline reuse of existing data instead of slowly collecting new data through an online learning algorithm. This can greatly speed up the machine-learning development cycle, since model selection, feature selection, and eventual quality control can happen offline before any learned policy gets deployed to the users. Reusing existing log data is particularly important for small-scale information systems (e.g. scholarly search), where it is often the only type of potential training data that is readily available in sufficient quantity.
The intellectual merit of the project will lie in the development of principled machine learning methods that enable information systems to reliably learn from logs of the partial and biased feedback they produce. The theoretical basis for the research lies in deep connections to counterfactual and causal inference, exploiting the analogy between logs and controlled experiments with actions as treatments and the current system as the assignment mechanism. The research builds upon recent advances in counterfactual estimators, answering the question of how a new system would have performed if it had been used instead of the system that logged the data. The project will develop new counterfactual estimators specifically designed for the action spaces typically encountered in information systems (e.g. rankings), new propensity models, and new counterfactual policy learning algorithms that incorporate both. Finally, to validate the real-world effectiveness of the research, the project will build the Localify system, which provides local music-event recommendations and personalized playlists.
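The core idea behind such counterfactual estimators can be illustrated with a minimal inverse-propensity-scoring (IPS) sketch. Everything in the simulation below (the uniform logging policy, Bernoulli rewards, and the specific target policy) is an illustrative assumption, not part of the project itself; it only shows how logged actions, rewards, and propensities combine into an offline estimate of a new policy's value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated logged bandit feedback: the logging policy chose one of K
# actions with known propensities and observed a binary reward.
K = 5
n = 100_000

# Hypothetical logging policy: uniform over the K actions.
logging_probs = np.full(K, 1.0 / K)

# True expected reward of each action (unknown to the estimator in practice;
# used here only to generate data and to check the estimate).
true_reward = np.linspace(0.1, 0.9, K)

actions = rng.choice(K, size=n, p=logging_probs)
rewards = rng.binomial(1, true_reward[actions])
propensities = logging_probs[actions]

# Target policy to evaluate offline: deterministic, always picks action K-1.
target_probs = np.zeros(K)
target_probs[K - 1] = 1.0

# Inverse propensity scoring: reweight each logged reward by the ratio of
# the target policy's probability to the logging propensity for that action.
weights = target_probs[actions] / propensities
ips_estimate = np.mean(weights * rewards)

print(f"IPS estimate: {ips_estimate:.3f}  (true value: {true_reward[-1]:.3f})")
```

Because the logging propensities are known, the reweighting makes the estimate unbiased for the target policy's value even though that policy never interacted with users; the price is higher variance when the target policy takes actions the logging policy rarely took.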
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Many information systems engage with their users through the following loop of interactions: the system receives a context as input (e.g. query, user profile), responds with a context-dependent action (e.g. ranking, recommendation, ad), and then receives some explicit or implicit feedback on the quality of the action (e.g. star rating, following a search result, clicking on an ad). While ubiquitous and plentiful, log data from this interaction loop does not fit the standard mold of supervised learning, since the feedback is both biased and partial -- the system determines through its actions where it gets feedback, and even for the chosen actions it typically doesn't observe all feedback (e.g. missing clicks on relevant results in ranking).
This project addressed the question of how this logged data can nevertheless be used for evaluating and learning new systems. The potential upsides of reusing the existing log data are evident. For evaluation, the use of historical log data enables engineers to rapidly evaluate many new systems offline (e.g. new ranking functions, recommendation policies), without the weeks of delay and the potential negative impact on user experience implied by online A/B testing. For learning, it similarly enables offline reuse of existing data instead of slowly collecting new data through an online learning algorithm. This can greatly speed up the machine-learning development cycle, since model selection, feature selection, and eventual quality control can happen offline before any learned policy gets deployed to the users. Reusing existing log data is particularly important for small-scale information systems (e.g. scholarly search), where it is often the only type of potential training data that is readily available in sufficient quantity.
The project developed principled machine learning methods that enable information systems to reliably learn from logs of the partial and biased feedback they produce. As the theoretical basis for these methods, the project uncovered connections to counterfactual and causal inference, exploiting the analogy between logs and controlled experiments with actions as treatments and the current system as the assignment mechanism. In this way, the project developed new ways of answering the question of how a new system would have performed if it had been used instead of the system that logged the data. In particular, the project developed new counterfactual estimators specifically designed for the action spaces typically encountered in information systems (e.g. rankings), new propensity models, and new counterfactual policy learning algorithms that incorporate both. To validate the real-world effectiveness of the research, the project built the Localify system (https://localify.org/), which provides local music-event recommendations and personalized playlists.
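One standard refinement of such estimators in the off-policy evaluation literature is self-normalized IPS (SNIPS), which trades a small bias for substantially lower variance by dividing by the sum of importance weights. The sketch below is a generic illustration under assumed data (a three-action stochastic logging policy), not a reconstruction of the project's specific estimators.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical logged data: a stochastic logging policy over 3 actions.
K = 3
n = 50_000
logging_probs = np.array([0.6, 0.3, 0.1])
true_reward = np.array([0.2, 0.5, 0.8])

actions = rng.choice(K, size=n, p=logging_probs)
rewards = rng.binomial(1, true_reward[actions])

# Target policy: favors the rarely-logged action, so importance weights vary.
target_probs = np.array([0.1, 0.2, 0.7])
weights = target_probs[actions] / logging_probs[actions]

# Vanilla IPS versus its self-normalized variant (SNIPS), which divides by
# the sum of weights instead of the sample size to reduce variance.
ips = np.mean(weights * rewards)
snips = np.sum(weights * rewards) / np.sum(weights)

# True value of the target policy, computable here because the simulation
# knows the expected rewards.
true_value = np.dot(target_probs, true_reward)
print(f"IPS: {ips:.3f}  SNIPS: {snips:.3f}  true: {true_value:.3f}")
```

SNIPS is invariant to multiplicative shifts in the rewards and is typically much better behaved when a few logged actions carry very large importance weights, which is exactly the regime that arises when the target policy differs strongly from the logging policy.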
In addition to producing new methods and changing how learning from partial-information feedback is done on many online platforms in industry, the project helped grow a research community around off-policy learning and evaluation. In particular, the PI organized the REVEAL/CONSEQUENCES workshops at RecSys 2019, 2020, 2021, 2022, and 2023, and by now off-policy learning and evaluation have become mainstream topics with a dedicated conference session at RecSys 2024. Furthermore, the project offered research and career development opportunities to a diverse group of undergraduate and graduate students.
Last Modified: 12/06/2024
Modified by: Thorsten Joachims