
Award Abstract # 1901168
III: Medium: Collaborative Research: Counterfactual Learning and Evaluation for Interactive Information Systems

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: CORNELL UNIVERSITY
Initial Amendment Date: August 7, 2019
Latest Amendment Date: December 22, 2020
Award Number: 1901168
Award Instrument: Continuing Grant
Program Manager: Sylvia Spengler
sspengle@nsf.gov
(703)292-7347
IIS: Division of Information & Intelligent Systems
CSE: Directorate for Computer and Information Science and Engineering
Start Date: August 15, 2019
End Date: July 31, 2024 (Estimated)
Total Intended Award Amount: $980,000.00
Total Awarded Amount to Date: $980,000.00
Funds Obligated to Date: FY 2019 = $478,857.00
FY 2020 = $501,143.00
History of Investigator:
  • Thorsten Joachims (Principal Investigator)
    tj@cs.cornell.edu
Recipient Sponsored Research Office: Cornell University
341 PINE TREE RD
ITHACA
NY  US  14850-2820
(607)255-5014
Sponsor Congressional District: 19
Primary Place of Performance: Cornell University
107 Hoy Road
Ithaca
NY  US  14853-7501
Primary Place of Performance Congressional District: 19
Unique Entity Identifier (UEI): G56PUALJ3KT5
Parent UEI:
NSF Program(s): Info Integration & Informatics
Primary Program Source: 01001920DB NSF RESEARCH & RELATED ACTIVIT
01002021DB NSF RESEARCH & RELATED ACTIVIT
01002122DB NSF RESEARCH & RELATED ACTIVIT
01002223DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7364, 7924
Program Element Code(s): 736400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Many information systems engage with their users through the following loop of interactions: the system receives a context as input (e.g. query, user profile), responds with a context-dependent action (e.g. ranking, recommendation, ad), and then receives some explicit or implicit feedback on the quality of the action (e.g. star rating, following a search result, clicking on an ad). While ubiquitous and plentiful, log data from this interaction loop does not fit the standard mold of supervised learning, since the feedback is both biased and partial -- the system determines through its actions where it gets feedback, and even for the chosen actions it typically does not observe all feedback (e.g. missing clicks on relevant results in ranking). This project will address the question of how this logged data can nevertheless be used for evaluating and learning new systems. The potential upsides of reusing the existing log data are evident. For evaluation, the use of historical log data enables engineers to rapidly evaluate many new systems offline (e.g. new ranking functions, recommendation policies), without the weeks of delay and the potential negative impact on user experience implied by online A/B testing. For learning, it similarly enables offline reuse of existing data instead of slowly collecting new data through an online learning algorithm. This can greatly speed up the machine-learning development cycle, since model selection, feature selection, and eventual quality control can happen offline before any learned policy gets deployed to the users. Reusing existing log data is particularly important for small-scale information systems (e.g. scholarly search), where it is often the only type of potential training data that is readily available in sufficient quantity.

The intellectual merit of the project will lie in the development of principled machine learning methods that enable information systems to reliably learn from logs of the partial and biased feedback they produce. The theoretical basis for the research lies in deep connections to counterfactual and causal inference, exploiting the analogy between logs and controlled experiments with actions as treatments and the current system as the assignment mechanism. The research builds upon recent advances in counterfactual estimators, answering the question of how a new system would have performed if it had been used instead of the system that logged the data. The project will develop new counterfactual estimators specifically designed for the action spaces typically encountered in information systems (e.g. rankings), new propensity models, and new counterfactual policy learning algorithms that incorporate both. Finally, to validate the real-world effectiveness of the research, the project will build the Localify system, which provides local music-event recommendations and personalized playlists.
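To make the counterfactual-estimation idea concrete, the sketch below shows the standard inverse propensity scoring (IPS) estimator on synthetic logged bandit feedback. The data, action space, and policies are hypothetical stand-ins for illustration only, not the project's actual systems or code.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions = 5
# Hypothetical per-action expected reward (unknown in practice).
true_reward = np.array([0.1, 0.2, 0.9, 0.4, 0.3])

# Logging policy: the deployed system that generated the data.
logging_probs = np.full(n_actions, 1.0 / n_actions)

# Simulate logged feedback tuples: (action, propensity, observed reward).
n = 100_000
actions = rng.choice(n_actions, size=n, p=logging_probs)
propensities = logging_probs[actions]
rewards = rng.binomial(1, true_reward[actions])

# New policy we want to evaluate offline: always plays action 2.
target_probs = np.zeros(n_actions)
target_probs[2] = 1.0

# IPS: reweight each logged reward by pi_target(a|x) / pi_logging(a|x).
weights = target_probs[actions] / propensities
v_hat = np.mean(weights * rewards)

print(f"IPS value estimate: {v_hat:.3f}")  # close to the true value 0.9
```

Because the propensities are recorded at logging time, the reweighted average is an unbiased estimate of the new policy's value, which is what makes offline evaluation possible without running an A/B test.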

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


(Showing: 1 - 10 of 18)
Saito, Yuta and Ren, Qingyang and Joachims, Thorsten "Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling" International Conference on Machine Learning (ICML), 2023
Jeunen, Olivier and Joachims, Thorsten and Oosterhuis, Harrie and Saito, Yuta and Vasile, Flavian "CONSEQUENCES Causality, Counterfactuals and Sequential Decision-Making for Recommender Systems" ACM Conference on Recommender Systems, 2022 https://doi.org/10.1145/3523227.3547409
Joachims, Thorsten "An Interview with Dr. Thorsten Joachims, Winner of ACM SIGKDD 2020 Innovation Award" ACM SIGKDD Explorations Newsletter, v.22, 2021 https://doi.org/10.1145/3447556.3447560
Joachims, Thorsten and London, Ben and Su, Yi and Swaminathan, Adith and Wang, Lequn "Recommendations as Treatments" AI Magazine, v.42, 2022 https://doi.org/10.1609/aimag.v42i3.18141
Joachims, Thorsten and Raimond, Yves and Koch, Olivier and Dimakopoulou, Maria and Vasile, Flavian "REVEAL 2020: Bandit and Reinforcement Learning from User Interactions" ACM Conference on Recommender Systems, 2020 https://doi.org/10.1145/3383313.3411536
Kidambi, Rahul and Rajeswaran, Aravind and Netrapalli, Praneeth and Joachims, Thorsten "MOReL: Model-Based Offline Reinforcement Learning" Advances in Neural Information Processing Systems, 2020
Morik, Marco and Singh, Ashudeep and Hong, Jessica and Joachims, Thorsten "Controlling Fairness and Bias in Dynamic Learning-to-Rank" Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 20), 2020 https://doi.org/10.1145/3397271.3401100
Morik, Marco and Singh, Ashudeep and Hong, Jessica and Joachims, Thorsten "Controlling Fairness and Bias in Dynamic Learning-to-Rank (Extended Abstract)" International Joint Conference on Artificial Intelligence - Best Paper Track, 2021 https://doi.org/10.24963/ijcai.2021/655
Sachdeva, Noveen and Su, Yi and Joachims, Thorsten "Off-policy Bandits with Deficient Support" ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 20), 2020 https://doi.org/10.1145/3394486.3403139
Saito, Yuta and Joachims, Thorsten "Counterfactual Evaluation and Learning for Interactive Systems: Foundations, Implementations, and Recent Advances" ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022 https://doi.org/10.1145/3534678.3542601
Saito, Yuta and Joachims, Thorsten "Counterfactual Learning and Evaluation for Recommender Systems: Foundations, Implementations, and Recent Advances" ACM Conference on Recommender Systems, 2021 https://doi.org/10.1145/3460231.3473320

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Many information systems engage with their users through the following loop of interactions: the system receives a context as input (e.g. query, user profile), responds with a context-dependent action (e.g. ranking, recommendation, ad), and then receives some explicit or implicit feedback on the quality of the action (e.g. star rating, following a search result, clicking on an ad). While ubiquitous and plentiful, log data from this interaction loop does not fit the standard mold of supervised learning, since the feedback is both biased and partial -- the system determines through its actions where it gets feedback, and even for the chosen actions it typically doesn't observe all feedback (e.g. missing clicks on relevant results in ranking).

This project addressed the question of how this logged data can nevertheless be used for evaluating and learning new systems. The potential upsides of reusing the existing log data are evident. For evaluation, the use of historical log data enables engineers to rapidly evaluate many new systems offline (e.g. new ranking functions, recommendation policies), without the weeks of delay and the potential negative impact on user experience implied by online A/B testing. For learning, it similarly enables offline reuse of existing data instead of slowly collecting new data through an online learning algorithm. This can greatly speed up the machine-learning development cycle, since model selection, feature selection, and eventual quality control can happen offline before any learned policy gets deployed to the users. Reusing existing log data is particularly important for small-scale information systems (e.g. scholarly search), where it is often the only type of potential training data that is readily available in sufficient quantity.

The project developed principled machine learning methods that enable information systems to reliably learn from logs of the partial and biased feedback they produce. As the theoretical basis for these methods, the project uncovered connections to counterfactual and causal inference, exploiting the analogy between logs and controlled experiments with actions as treatments and the current system as the assignment mechanism. In this way, the project developed new ways of answering the question of how a new system would have performed if it had been used instead of the system that logged the data. In particular, the project developed new counterfactual estimators specifically designed for the action spaces typically encountered in information systems (e.g. rankings), new propensity models, and new counterfactual policy learning algorithms that incorporate both. To validate the real-world effectiveness of the research, the project built the Localify system (https://localify.org/), which provides local music-event recommendations and personalized playlists.
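As an illustration of the kind of estimator refinement this line of work concerns, the sketch below compares vanilla inverse propensity scoring (IPS) with the self-normalized IPS (SNIPS) estimator, a known variance-reduction technique from the counterfactual-learning literature. The data and policies are synthetic stand-ins, not the project's code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic logged data from a uniform logging policy over 4 actions.
n_actions, n = 4, 50_000
actions = rng.integers(0, n_actions, size=n)
propensities = np.full(n, 1.0 / n_actions)
rewards = rng.binomial(1, np.array([0.2, 0.6, 0.8, 0.4])[actions]).astype(float)

# Deterministic target policy: always choose action 2 (true value 0.8).
weights = (actions == 2) / propensities  # pi_target / pi_logging

# Vanilla IPS: simple reweighted mean.
ips = np.mean(weights * rewards)
# SNIPS: normalize by the realized sum of importance weights,
# trading a small bias for lower variance.
snips = np.sum(weights * rewards) / np.sum(weights)

print(f"IPS:   {ips:.3f}")
print(f"SNIPS: {snips:.3f}")
```

Both estimates concentrate around the target policy's true value; the self-normalized variant is typically more stable when importance weights are large, which is the regime that motivates the estimator designs studied in this project.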

In addition to producing new methods and changing how learning from partial-information feedback is done on many online platforms in industry, the project helped grow a research community around off-policy learning and evaluation. In particular, the PI organized the REVEAL/CONSEQUENCES workshops at RecSys 2019, 2020, 2021, 2022, and 2023, and by now off-policy learning and evaluation have become mainstream, with their own conference session at RecSys 2024. Furthermore, the project offered research and career development opportunities to a diverse group of undergraduate and graduate students.

Last Modified: 12/06/2024
Modified by: Thorsten Joachims
