Award Abstract # 2402873
Collaborative Research: III: Medium: Retrieval-Enhanced Machine Learning Through an Information Retrieval Lens

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: UNIVERSITY OF MASSACHUSETTS
Initial Amendment Date: August 12, 2024
Latest Amendment Date: March 24, 2025
Award Number: 2402873
Award Instrument: Continuing Grant
Program Manager: Cornelia Caragea
ccaragea@nsf.gov
(703)292-2706
IIS Division of Information & Intelligent Systems
CSE Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2024
End Date: September 30, 2027 (Estimated)
Total Intended Award Amount: $769,633.00
Total Awarded Amount to Date: $612,970.00
Funds Obligated to Date: FY 2024 = $612,970.00
History of Investigator:
  • Hamed Zamani (Principal Investigator)
    zamani@cs.umass.edu
  • Mohit Iyyer (Former Co-Principal Investigator)
Recipient Sponsored Research Office: University of Massachusetts Amherst
101 COMMONWEALTH AVE
AMHERST
MA  US  01003-9252
(413)545-0698
Sponsor Congressional District: 02
Primary Place of Performance: University of Massachusetts Amherst
COMMONWEALTH AVE
AMHERST
MA  US  01003-9346
Primary Place of Performance Congressional District: 02
Unique Entity Identifier (UEI): VGJHK59NMPK9
Parent UEI: VGJHK59NMPK9
NSF Program(s): Info Integration & Informatics
Primary Program Source: 01002526DB NSF RESEARCH & RELATED ACTIVIT
01002425DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7364, 7924
Program Element Code(s): 736400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Retrieval-Enhanced Machine Learning (REML) refers to a subset of machine learning models that make predictions using the results of one or more retrieval models run over collections of documents. REML has recently attracted considerable attention because of its wide range of applications, including knowledge grounding for question answering and improving generalization in large language models. However, REML has mainly been studied from a machine learning perspective, without focusing on the retrieval aspects. Preliminary explorations have demonstrated the importance of retrieval to downstream REML performance. This observation motivates the project, which offers an alternative view of REML by studying it from an information retrieval (IR) perspective. From this perspective, the retrieval component in REML is framed as a search engine capable of supporting multiple, independent predictive models, rather than a single predictive model as in the majority of existing work.

This project consists of three major research thrusts. First, the project will develop novel architectures and optimization solutions that provide information access to multiple machine learning models performing a wide variety of tasks. Second, the project will study training and inference efficiency in the context of REML by focusing on how downstream machine learning models utilize retrieval results and on the feedback they provide. Third, the project will study approaches for responsible REML by examining data control for content providers in REML, as well as fairness and robustness across multiple downstream models. Without loss of generality, the project will primarily focus on a number of real-world language tasks, such as open-domain question answering, fact verification, and open-domain dialogue systems.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Please report errors in award information by writing to: awardsearch@nsf.gov.