Award Abstract # 2007834
CIF: Small: Efficient Sequential Decision-Making and Inference in the Small Data Regime

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: CARNEGIE MELLON UNIVERSITY
Initial Amendment Date: June 16, 2020
Latest Amendment Date: June 16, 2020
Award Number: 2007834
Award Instrument: Standard Grant
Program Manager: Alfred Hero
ahero@nsf.gov
 (703)292-0000
CCF
 Division of Computing and Communication Foundations
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2020
End Date: September 30, 2024 (Estimated)
Total Intended Award Amount: $500,000.00
Total Awarded Amount to Date: $500,000.00
Funds Obligated to Date: FY 2020 = $500,000.00
History of Investigator:
  • Gauri Joshi (Principal Investigator)
    gaurij@andrew.cmu.edu
  • Osman Yagan (Co-Principal Investigator)
Recipient Sponsored Research Office: Carnegie-Mellon University
5000 FORBES AVE
PITTSBURGH
PA  US  15213-3890
(412)268-8746
Sponsor Congressional District: 12
Primary Place of Performance: Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh
PA  US  15213-3890
Primary Place of Performance
Congressional District:
12
Unique Entity Identifier (UEI): U3NKNFLNQ613
Parent UEI: U3NKNFLNQ613
NSF Program(s): Special Projects - CCF,
Comm & Information Foundations
Primary Program Source: 01002021DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 075Z, 7923, 7936, 9102
Program Element Code(s): 287800, 779700
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Learning from big data has been revolutionizing inference and decision-making, and yet several important applications fall in the small data regime. In this regime, obtaining training samples can be expensive, slow or even hazardous. Thus, there is a critical need for enabling sequential decision-making and inference as data from sequential samples is received over time. This project develops new methods to improve the efficiency and accuracy of sequential decision-making and inference. It will demonstrate the impact of expected outcomes via rigorous and targeted evaluation in applications such as content recommendation systems, clinical trials, distributed machine learning, and hyperparameter tuning. The research outcomes will be published to broad academic and professional audiences and incorporated into teaching curricula via graduate and undergraduate courses. The project will encourage a diverse group of students to participate in research. Through industry partnerships, outcomes of this research will be transitioned quickly to practice.

Multi-armed bandit algorithms, which aim to maximize the cumulative reward or identify the best option among a set of choices (referred to as arms), are naturally suited for problems involving sequential decision-making. However, most of the work on multi-armed bandit algorithms assumes independence of the rewards across arms. The objective of the proposed project is to exploit known latent structures and correlation between arms to drastically reduce the sample complexity of multi-armed bandit algorithms. In particular, the investigators aim to design sample-efficient algorithms for two different frameworks: i) the structured bandit framework, where the rewards depend on a common latent feature vector, and ii) a novel correlated bandit framework where reward realizations from arms are correlated with each other. In both frameworks, the project will result in the design of algorithms to maximize the cumulative reward (the exploration-exploitation problem) and to identify the best action/arm as fast as possible (the pure exploration problem). This will be done through a novel and easily generalizable approach to account for the available information on the structure or the correlation among arms to boost the performance of decision-making and inference.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:  When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

(Showing: 1 - 10 of 11)
Choudhury, Tuhinangshu and Joshi, Gauri and Wang, Weina and Shakkottai, Sanjay "Job Dispatching Policies for Queueing Systems with Unknown Service Rates" ACM MobiHoc: International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks , 2021 https://doi.org/10.1145/3466772.3467047 Citation Details
Choudhury, Tuhinangshu and Wang, Weina and Joshi, Gauri "Tackling heterogeneous traffic in multi-access systems via erasure coded servers" Proceedings of the Twenty-Third International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing , 2022 https://doi.org/10.1145/3492866.3549713 Citation Details
Elumar, Eray Can and Tekin, Cem and Yaan, Osman "Multi-Armed Bandits With Costly Probes" IEEE Transactions on Information Theory , v.71 , 2025 https://doi.org/10.1109/TIT.2024.3506866 Citation Details
Elumar, Eray Can and Tekin, Cem and Yaan, Osman "Multi-Armed Bandits with Probing" , 2024 https://doi.org/10.1109/ISIT57864.2024.10619238 Citation Details
Gupta, Samarth and Chaudhari, Shreyas and Joshi, Gauri and Yagan, Osman "Multi-Armed Bandits with Correlated Arms" IEEE Transactions on Information Theory , 2021 https://doi.org/10.1109/TIT.2021.3081508 Citation Details
Gupta, Samarth and Chaudhari, Shreyas and Mukherjee, Subhojyoti and Joshi, Gauri and Yagan, Osman "A Unified Approach to Translate Classical Bandit Algorithms to Structured Bandits" ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , 2021 https://doi.org/10.1109/ICASSP39728.2021.9413628 Citation Details
Gupta, Samarth and Chaudhari, Shreyas and Mukherjee, Subhojyoti and Joshi, Gauri and Yagan, Osman "A Unified Approach to Translate Classical Bandit Algorithms to the Structured Bandit Setting" IEEE Journal on Selected Areas in Information Theory , v.1 , 2020 https://doi.org/10.1109/JSAIT.2020.3041246 Citation Details
Gupta, Samarth and Joshi, Gauri and Yagan, Osman "Best-Arm Identification in Correlated Multi-Armed Bandits" IEEE Journal on Selected Areas in Information Theory , v.2 , 2021 https://doi.org/10.1109/JSAIT.2021.3082028 Citation Details
Gupta, Samarth and Zuo, Jinhang and Joe-Wong, Carlee and Joshi, Gauri and Yagan, Osman "Correlated Combinatorial Bandits for Online Resource Allocation" ACM International Symposium on Mobile Ad Hoc Networking and Computing , 2022 https://doi.org/10.1145/3492866.3549727 Citation Details
Khodadadian, Sajad and Sharma, Pranay and Joshi, Gauri and Maguluri Siva Theja "Federated Reinforcement Learning: Linear Speedup Under Markovian Sampling" International Conference on Machine Learning (ICML) , 2022 Citation Details
Woo, Jiin and Joshi, Gauri and Chi, Yuejie "The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond" International Conference on Machine Learning (ICML) , 2023 Citation Details
(Showing: 1 - 10 of 11)

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Although learning from "big data" has revolutionized inference and decision-making from indirect or noisy samples, several important and relevant applications fall in the small data regime where obtaining training samples is expensive, slow, or even hazardous. Several engineering applications suffer from this scarcity of training data -- for example, medical trials to identify the best treatment option, price optimization based on targeted marketing campaigns, or hyperparameter tuning in machine learning. Standard supervised learning methods such as deep neural networks, which rely on massive amounts of training data, can give highly inaccurate results when used in data-scarce applications. Thus, there is a critical need for sample-efficient algorithms for decision-making and inference using carefully drawn sequential samples. 

 

This work considered correlated and structured versions of the classic multi-armed bandit (MAB) framework, which applies to sequential decision-making in such small data regimes. In both correlated and structured bandits, we derived algorithms i) to maximize the cumulative reward (exploration-exploitation problem); and ii) to identify the best action/arm as fast as possible (pure exploration problem). The key impact of the project is that it presents a novel and easily generalizable approach to account for the available information on the structure or the correlation among arms to boost the performance of decision-making and inference. This approach enabled the extension of any current/future MAB algorithm to structured and correlated MABs. The project used a unique combination of mathematical tools from multi-armed bandits, reinforcement learning, scheduling, performance modeling, and information theory and coding, and it bolstered the cross-pollination of ideas between these fields.

 

The results were published at top conferences and journals in machine learning and networking such as the International Conference on Machine Learning (ICML), IEEE Transactions on Information Theory.


 

 


Last Modified: 01/30/2025
Modified by: Gauri Joshi

Please report errors in award information by writing to: awardsearch@nsf.gov.

Print this page

Back to Top of page