Award Abstract # 2331782
Collaborative Research: SLES: Safe Distributional-Reinforcement Learning-Enabled Systems: Theories, Algorithms, and Experiments

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: OHIO STATE UNIVERSITY, THE
Initial Amendment Date: September 12, 2023
Latest Amendment Date: September 12, 2023
Award Number: 2331782
Award Instrument: Standard Grant
Program Manager: Jie Yang
jyang@nsf.gov
(703)292-4768
IIS, Division of Information & Intelligent Systems
CSE, Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2023
End Date: September 30, 2027 (Estimated)
Total Intended Award Amount: $375,000.00
Total Awarded Amount to Date: $375,000.00
Funds Obligated to Date: FY 2023 = $375,000.00
History of Investigator:
  • Xian Yu (Principal Investigator)
    yu.3610@osu.edu
Recipient Sponsored Research Office: Ohio State University
1960 KENNY RD
COLUMBUS
OH  US  43210-1016
(614)688-8735
Sponsor Congressional District: 03
Primary Place of Performance: Ohio State University
1960 KENNY RD
COLUMBUS
OH  US  43210-1016
Primary Place of Performance Congressional District: 03
Unique Entity Identifier (UEI): DLWBSLWAJWR1
Parent UEI: MN4MDDMN8529
NSF Program(s): AI-Safety
Primary Program Source: 01002324DB NSF RESEARCH & RELATED ACTIVITIES
4082CYXXDB NSF TRUST FUND
Program Reference Code(s):
Program Element Code(s): 248Y00
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Reinforcement learning (RL), with its success in automation and robotics, has been widely viewed as one of the most important technologies for next-generation, learning-enabled systems. For example, 6G networking systems, autonomous driving, digital healthcare, and smart cities are all enabled by RL. However, despite significant advances over the last few decades, a major obstacle to applying RL in practice is the lack of "safety" guarantees, such as robustness, resilience to tail risks, and satisfaction of operational constraints. This is because traditional RL aims only at maximizing the cumulative reward. While it is possible to add penalties to rewards in a traditional RL algorithm to discourage unsafe actions, many safety constraints, such as chance constraints, cannot simply be treated as penalties. This project develops foundational technologies for safe RL-enabled systems based on Distributional Reinforcement Learning (DRL), which learns the full distribution of cumulative rewards rather than only its expectation and can therefore capture risk-sensitive safety requirements. While developing the foundation of DRL for safe learning-enabled systems, the team integrates research and education by incorporating the new theories and algorithms developed in this project into their graduate-level courses. All team members regularly supervise undergraduate students and students from underrepresented groups. The team continues to leverage The Women's Place at Ohio State University and the Women in Science and Engineering Program at Arizona State University to broaden the participation of women students and researchers.

This project takes a comprehensive approach to the end-to-end safety of DRL-enabled systems. End-to-end safety includes (i) policy safety -- learning a safe policy that avoids catastrophic outcomes (corresponding to risk-sensitive RL); (ii) exploration safety -- learning a safe policy safely, by avoiding dangerous actions during exploration/learning (corresponding to online RL); and (iii) environmental safety -- learning a policy that is robust to parametric uncertainty (environment change). The project comprises four thrusts. Thrust 1 (Foundations of Constrained DRL) establishes the theoretical foundations of risk-sensitive constrained DRL, focusing on policy and environmental safety. Thrust 2 (Online Constrained DRL) considers safe online learning and decision-making, focusing on exploration and environmental safety while a safe DRL policy is learned. Thrust 3 (Physics-Enhanced Constrained DRL) exploits physics to enhance end-to-end safety. These three foundational research thrusts are interdependent, but each focuses on a unique aspect of safe RL-enabled systems and addresses multiple safety notions. The fourth thrust provides comprehensive validation through both high-fidelity simulations and real-world experiments with unmanned aerial vehicles.
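To illustrate why the distributional view matters for safety (a toy sketch, not part of the award's methodology): two policies can have identical expected returns yet very different tail risk, so an expectation-only objective cannot distinguish them, while return-distribution statistics such as a chance-constraint violation probability or conditional value-at-risk (CVaR) can. The policies, thresholds, and sample counts below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Empirical return samples for two hypothetical policies with the same
# mean return but very different spread (hence different tail risk).
returns_a = rng.normal(loc=10.0, scale=1.0, size=100_000)  # low variance
returns_b = rng.normal(loc=10.0, scale=8.0, size=100_000)  # heavy spread

def chance_violation(returns, threshold):
    """Empirical probability that the return falls below a safety threshold."""
    return float(np.mean(returns < threshold))

def cvar(returns, alpha=0.05):
    """Conditional value-at-risk: mean of the worst alpha-fraction of returns."""
    cutoff = np.quantile(returns, alpha)
    return float(returns[returns <= cutoff].mean())

# Expected returns are nearly identical ...
print(f"mean:      A={returns_a.mean():.2f}  B={returns_b.mean():.2f}")
# ... but the chance-constraint violation P(return < 5) and the 5% CVaR
# differ sharply, which only a distributional objective can see.
print(f"P(ret<5):  A={chance_violation(returns_a, 5.0):.4f}  "
      f"B={chance_violation(returns_b, 5.0):.4f}")
print(f"CVaR(5%):  A={cvar(returns_a):.2f}  B={cvar(returns_b):.2f}")
```

A penalty added to the reward shifts the mean of both distributions equally, which is one intuition for why chance constraints like P(return < threshold) ≤ δ cannot simply be folded into the reward.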

This research is supported by a partnership between the National Science Foundation and Open Philanthropy.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
