
NSF Org: | CCF Division of Computing and Communication Foundations |
Recipient: | |
Initial Amendment Date: | June 16, 2020 |
Latest Amendment Date: | June 16, 2020 |
Award Number: | 2007834 |
Award Instrument: | Standard Grant |
Program Manager: | Alfred Hero, ahero@nsf.gov, (703) 292-0000, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | October 1, 2020 |
End Date: | September 30, 2024 (Estimated) |
Total Intended Award Amount: | $500,000.00 |
Total Awarded Amount to Date: | $500,000.00 |
Funds Obligated to Date: | |
History of Investigator: | |
Recipient Sponsored Research Office: | 5000 FORBES AVE, PITTSBURGH, PA, US 15213-3890, (412) 268-8746 |
Sponsor Congressional District: | |
Primary Place of Performance: | 5000 Forbes Avenue, Pittsburgh, PA, US 15213-3890 |
Primary Place of Performance Congressional District: | |
Unique Entity Identifier (UEI): | |
Parent UEI: | |
NSF Program(s): | Special Projects - CCF, Comm & Information Foundations |
Primary Program Source: | |
Program Reference Code(s): | |
Program Element Code(s): | |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Learning from big data has been revolutionizing inference and decision-making, yet several important applications fall in the small-data regime, where obtaining training samples can be expensive, slow, or even hazardous. There is therefore a critical need to enable decision-making and inference as data from sequential samples arrives over time. This project develops new methods to improve the efficiency and accuracy of sequential decision-making and inference. It will demonstrate the impact of the expected outcomes through rigorous, targeted evaluation in applications such as content recommendation systems, clinical trials, distributed machine learning, and hyperparameter tuning. The research outcomes will be published for broad academic and professional audiences and incorporated into graduate and undergraduate teaching curricula. The project will encourage a diverse group of students to participate in research, and through industry partnerships its outcomes will be transitioned quickly to practice.
Multi-armed bandit algorithms, which aim to maximize the cumulative reward or identify the best option among a set of choices (referred to as arms), are naturally suited to problems involving sequential decision-making. However, most work on multi-armed bandit algorithms assumes that rewards are independent across arms. The objective of this project is to exploit known latent structure and correlation between arms to drastically reduce the sample complexity of multi-armed bandit algorithms. In particular, the investigators aim to design sample-efficient algorithms for two frameworks: i) the structured bandit framework, where the rewards depend on a common latent feature vector, and ii) a novel correlated bandit framework, where reward realizations from different arms are correlated with each other. In both frameworks, the project will design algorithms to maximize the cumulative reward (the exploration-exploitation problem) and to identify the best action/arm as fast as possible (the pure exploration problem). This will be done through a novel and easily generalizable approach that accounts for available information on the structure of, or correlation among, the arms to boost the performance of decision-making and inference.
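To make the baseline setting concrete, below is a minimal sketch of the classic UCB1 rule for the standard independent-arm bandit, the exploration-exploitation problem whose sample complexity the structured and correlated frameworks above aim to improve. The simulated Bernoulli arms and their means are illustrative assumptions, not part of the award.

```python
# Minimal sketch of the classic UCB1 algorithm for the standard
# (independent-arm) multi-armed bandit problem. The Bernoulli arms
# below are a simulated, illustrative environment.
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Play `horizon` rounds of UCB1 against Bernoulli arms with the given means."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k          # number of times each arm was pulled
    totals = [0.0] * k        # sum of observed rewards per arm
    cumulative_reward = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1       # pull every arm once to initialize
        else:
            # choose the arm maximizing empirical mean + confidence bonus
            arm = max(
                range(k),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        cumulative_reward += reward
    return cumulative_reward, counts

if __name__ == "__main__":
    reward, pulls = ucb1([0.3, 0.5, 0.7], horizon=10_000)
    print("cumulative reward:", reward, "pulls per arm:", pulls)
```

Because this index policy treats every arm in isolation, each arm must be sampled many times before it can be ruled out; the structured and correlated frameworks studied in this project exploit cross-arm information to avoid exactly that cost.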
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Although learning from "big data" has revolutionized inference and decision-making from indirect or noisy samples, several important and relevant applications fall in the small data regime where obtaining training samples is expensive, slow, or even hazardous. Several engineering applications suffer from this scarcity of training data -- for example, medical trials to identify the best treatment option, price optimization based on targeted marketing campaigns, or hyperparameter tuning in machine learning. Standard supervised learning methods such as deep neural networks, which rely on massive amounts of training data, can give highly inaccurate results when used in data-scarce applications. Thus, there is a critical need for sample-efficient algorithms for decision-making and inference using carefully drawn sequential samples.
This work considered correlated and structured versions of the classic multi-armed bandit (MAB) framework, which applies to sequential decision-making in such small data regimes. In both correlated and structured bandits, we derived algorithms i) to maximize the cumulative reward (exploration-exploitation problem); and ii) to identify the best action/arm as fast as possible (pure exploration problem). The key impact of the project is that it presents a novel and easily generalizable approach to account for the available information on the structure or the correlation among arms to boost the performance of decision-making and inference. This approach enabled the extension of any current/future MAB algorithm to structured and correlated MABs. The project used a unique combination of mathematical tools from multi-armed bandits, reinforcement learning, scheduling, performance modeling, and information theory and coding, and it bolstered the cross-pollination of ideas between these fields.
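As an illustration of the wrapper idea described above, the following is a hypothetical sketch (not the project's published algorithm) of how known correlation information could prune arms before a base index policy, here plain UCB, acts on the surviving "competitive" arms. The `pseudo_upper(k, l, reward)` function, assumed to return a known upper bound on arm k's reward when arm l yields the observed reward, is an illustrative assumption.

```python
import math

class CorrelatedBanditWrapper:
    """Hypothetical correlation-aware pruning layer around a UCB-style index.

    pseudo_upper(k, l, reward) is assumed to give a known upper bound on the
    (unobserved) reward of arm k when arm l yields `reward`; this table is an
    illustrative assumption, not the project's published construction.
    """

    def __init__(self, n_arms, pseudo_upper):
        self.n = n_arms
        self.pseudo_upper = pseudo_upper
        self.counts = [0] * n_arms            # pulls of each arm
        self.sums = [0.0] * n_arms            # observed reward sums
        # pseudo_sums[k][l]: sum of pseudo-reward bounds for arm k computed
        # from the samples of arm l
        self.pseudo_sums = [[0.0] * n_arms for _ in range(n_arms)]
        self.t = 0

    def select_arm(self):
        self.t += 1
        for a in range(self.n):               # play every arm once first
            if self.counts[a] == 0:
                return a
        means = [self.sums[a] / self.counts[a] for a in range(self.n)]
        best = max(range(self.n), key=lambda a: means[a])
        # Keep an arm only if its pseudo-reward estimates do not already rule
        # it out relative to the empirically best arm ("competitive" arms).
        competitive = [
            k for k in range(self.n)
            if k == best or min(
                self.pseudo_sums[k][l] / self.counts[l]
                for l in range(self.n) if l != k
            ) >= means[best]
        ]
        # Apply a plain UCB index over the surviving arms only.
        return max(
            competitive,
            key=lambda a: means[a] + math.sqrt(2 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward
        for k in range(self.n):
            self.pseudo_sums[k][arm] += self.pseudo_upper(k, arm, reward)
```

In this sketch the base index (plain UCB over the competitive set) could in principle be swapped for any other bandit rule, which reflects the sense in which the wrapper approach extends existing MAB algorithms to the correlated setting.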
The results were published at top conferences and in journals in machine learning and networking, such as the International Conference on Machine Learning (ICML) and the IEEE Transactions on Information Theory.
Last Modified: 01/30/2025
Modified by: Gauri Joshi