Award Abstract # 1836819
CPS: Medium: Collaborative Research: Certifiable reinforcement learning for cyber-physical systems

NSF Org: ECCS
Division of Electrical, Communications and Cyber Systems
Recipient: UNIVERSITY OF WASHINGTON
Initial Amendment Date: August 29, 2018
Latest Amendment Date: July 20, 2021
Award Number: 1836819
Award Instrument: Standard Grant
Program Manager: Richard Nash
rnash@nsf.gov
 (703)292-5394
ECCS
 Division of Electrical, Communications and Cyber Systems
ENG
 Directorate for Engineering
Start Date: September 15, 2018
End Date: August 31, 2022 (Estimated)
Total Intended Award Amount: $666,254.00
Total Awarded Amount to Date: $682,254.00
Funds Obligated to Date: FY 2018 = $666,254.00
FY 2021 = $16,000.00
History of Investigator:
  • Sam Burden (Principal Investigator)
    sburden@uw.edu
  • Lillian Ratliff (Co-Principal Investigator)
Recipient Sponsored Research Office: University of Washington
4333 BROOKLYN AVE NE
SEATTLE
WA  US  98195-1016
(206)543-4043
Sponsor Congressional District: 07
Primary Place of Performance: University of Washington
4333 Brooklyn Ave NE
Seattle
WA  US  98195-0001
Primary Place of Performance Congressional District: 07
Unique Entity Identifier (UEI): HD1WMN6945W6
Parent UEI:
NSF Program(s): EPCN-Energy-Power-Ctrl-Netwrks,
CPS-Cyber-Physical Systems
Primary Program Source: 01001819DB NSF RESEARCH & RELATED ACTIVITIES
01002122DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 102Z, 1653, 7918, 9102, 9251
Program Element Code(s): 760700, 791800
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.041

ABSTRACT

We propose to generalize and certify the performance of reinforcement learning algorithms for control of cyber-physical systems (CPS). Broadly speaking, reinforcement learning applied to physical systems is concerned with making predictions from data to control the system to extremize a performance criterion. The project will particularly focus on developing theory and algorithms applicable to hybrid and multi-agent control systems, that is, systems with continuous and discrete elements and systems with multiple decision-making agents, which are ubiquitous in CPS across spatiotemporal scales and application domains. Reinforcement learning algorithms are not yet mature enough to guarantee performance when applied to control of CPS. In light of these limitations, this project aims to lay the theoretical and computational foundation to certify reinforcement learning algorithms so that they may be deployed in society with high confidence.
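
For concreteness (this formulation is illustrative and is not taken from the award text), the performance criterion referred to above is commonly written as the discounted-return objective

    \max_{\pi}\; J(\pi) \;=\; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(x_t, u_t)\right], \qquad x_{t+1} = f(x_t, u_t), \quad u_t \sim \pi(\cdot \mid x_t),

where the dynamics f and reward r are not available in closed form and must be learned from data. In the hybrid setting emphasized by this project, f additionally encodes discrete mode transitions; in the multi-agent setting, each agent optimizes its own criterion.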

This project will certify reinforcement learning algorithms that compute optimal control policies in systems with non-classical dynamics and non-classical costs. To achieve this goal, we will generalize convergent algorithms originally designed for purely continuous systems to apply in hybrid control systems whose states undergo a mixture of discrete and continuous transitions. Moreover, we specifically aim to ensure this approach is applicable to societal-scale CPS in which multiple agents, some of which may be humans, interact directly with the CPS. These algorithms will be experimentally validated on three testbeds that represent a range of hybrid and multi-agent phenomena that arise in CPS. The first testbed will evaluate the performance of our algorithms on societal-scale traffic flow networks via simulation. The second testbed will consider heterogeneous teams of aerial and terrestrial mobile robots collaborating with human partners to perform construction, inspection, and maintenance tasks on scale facsimiles of infrastructure like bridges and tunnels. The third testbed will study the closed-loop interaction between individual humans and remote, teleoperated robots that perform dynamic locomotion and manipulation behaviors. This project will also co-organize an interdisciplinary workshop with technology policy experts, the results of which will form the basis for an interdisciplinary multi-campus graduate-level seminar run by the PIs.
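
As a purely illustrative sketch of the coupled learning dynamics that arise in the multi-agent setting described above, the Python snippet below runs simultaneous gradient play in a two-player continuous game; the costs, dimensions, and step size are hypothetical placeholders, not project code. The convergence behavior of exactly this kind of update is what the gradient-based-learning and Stackelberg-game publications listed further down analyze.

    import numpy as np

    def grad(f, x, y, wrt, eps=1e-6):
        # Finite-difference gradient of f(x, y) with respect to one player's variable.
        v = x if wrt == 0 else y
        g = np.zeros_like(v)
        for i in range(v.size):
            dv = np.zeros_like(v)
            dv[i] = eps
            if wrt == 0:
                g[i] = (f(x + dv, y) - f(x - dv, y)) / (2 * eps)
            else:
                g[i] = (f(x, y + dv) - f(x, y - dv)) / (2 * eps)
        return g

    # Hypothetical quadratic costs for two players coupled through their decisions.
    f1 = lambda x, y: 0.5 * x @ x + x @ y   # player 1 minimizes f1 over x
    f2 = lambda x, y: 0.5 * y @ y - x @ y   # player 2 minimizes f2 over y

    x, y = np.ones(2), -np.ones(2)
    lr = 0.1                                 # placeholder step size
    for _ in range(200):
        # Each player descends its own cost while the other updates simultaneously.
        x_new = x - lr * grad(f1, x, y, wrt=0)
        y_new = y - lr * grad(f2, x, y, wrt=1)
        x, y = x_new, y_new

    print("approximate joint equilibrium:", x, y)   # approaches the Nash equilibrium at the origin

In this quadratic example the simultaneous updates spiral into the game's Nash equilibrium at the origin; characterizing when such convergence can be certified more generally is one aim of the project.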

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Banjanin, Bora S. and Burden, Samuel A. "Nonsmooth Optimal Value and Policy Functions in Mechanical Systems Subject to Unilateral Constraints" IEEE Control Systems Letters, v.4, 2020. https://doi.org/10.1109/LCSYS.2019.2960442
Chasnov, B. and Ratliff, L. and Mazumdar, E. and Burden, S. "Convergence Analysis of Gradient-Based Learning in Continuous Games" Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, v.115, 2020.
Chasnov, Benjamin and Yamagami, Momona and Parsa, Behnoosh and Ratliff, Lillian J. and Burden, Samuel A. "Experiments with sensorimotor games in dynamic human/machine interaction" Micro- and Nanotechnology Sensors, Systems, and Applications XI, 2019. https://doi.org/10.1117/12.2519258
Fiez, Tanner and Chasnov, Benjamin and Ratliff, Lillian J. "Implicit Learning Dynamics in Stackelberg Games: Equilibria Characterization, Convergence Analysis, and Empirical Study" International Conference on Machine Learning, 2020.
Yamagami, Momona and Steele, Katherine M. and Burden, Samuel A. "Decoding Intent With Control Theory: Comparing Muscle Versus Manual Interface Performance" CHI '20: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020. https://doi.org/10.1145/3313831.3376224
Zhang, Jize and Pace, Andrew M. and Burden, Samuel A. and Aravkin, Aleksandr "Offline state estimation for hybrid systems via nonsmooth variable projection" Automatica, v.115, 2020. https://doi.org/10.1016/j.automatica.2020.108871
Zheng, Liyuan and Fiez, Tanner and Alumbaugh, Zane and Chasnov, Benjamin and Ratliff, Lillian J. "Stackelberg Actor-Critic: A Game-Theoretic Perspective" AAAI Workshop on Reinforcement Learning and Games, 2021.
Zheng, Liyuan and Fiez, Tanner and Alumbaugh, Zane and Chasnov, Benjamin and Ratliff, Lillian J. "Stackelberg Actor-Critic: Game theoretic reinforcement learning algorithms" Proceedings of the AAAI Conference on Artificial Intelligence, 2022.
Zheng, Liyuan and Shi, Yuanyuan and Ratliff, Lillian J. and Zhang, Baosen "Safe Reinforcement Learning of Control-Affine Systems with Vertex Networks" Conference on Learning for Dynamics and Control, 2021.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project aimed to generalize and certify the performance of reinforcement learning algorithms for control of cyber-physical systems (CPS). In particular, outcomes of the project include new theory, metrics, and methods for certifying algorithms that compute optimal control policies in systems with non-classical dynamics and non-classical costs. Broadly speaking, reinforcement learning applied to physical systems is concerned with making predictions from data to control the system to extremize a performance criterion. In this context, the project particularly focused on developing theory and algorithms applicable to hybrid and multi-agent control systems, that is, systems with continuous and discrete elements and systems with multiple decision-making agents, which are ubiquitous in CPS across spatiotemporal scales and application domains. While reinforcement learning algorithms are arguably not yet mature enough to guarantee performance when applied to control of complex CPS with learning-enabled components, this project made great strides in laying the theoretical and computational foundation to certify reinforcement learning algorithms so that they may be deployed in society with high confidence.


To achieve this goal, the research team generalized convergent algorithms originally designed for purely continuous systems to apply in hybrid control systems whose states undergo a mixture of discrete and continuous transitions. Computationally efficient methods for ensuring additional desiderata such as safety were developed. Moreover, the team developed techniques that targeted societal-scale CPS in which multiple agents, some of which may be humans, interact directly with the CPS. The developed algorithms and theoretical models were experimentally validated on three testbeds that represent a range of hybrid and multi-agent phenomena arising in CPS. The first testbed focuses on the performance of our algorithms on societal-scale traffic flow networks via simulation. The second testbed concerns heterogeneous teams of aerial and terrestrial mobile robots collaborating with human partners to perform construction, inspection, and maintenance tasks on scale facsimiles of infrastructure like bridges and tunnels. The third testbed comprises both simulation and real-world experimental studies focused on the closed-loop interaction between individual humans and learning-based systems. The latter testbed is motivated by CPS such as teleoperated robots that perform dynamic locomotion and manipulation behaviors, human-in-the-loop societal-scale CPS such as intelligent transportation, and multi-agent reinforcement learning broadly. Overall, the project produced a wide range of results, from fundamental new theory on nonsmooth dynamical systems and multi-agent interactions, to practically implementable algorithms, to experimental validation in a variety of CPS contexts.
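
A minimal sketch of one safety-by-construction idea, loosely in the spirit of the vertex-network publication listed above (the names, shapes, and safe set below are assumptions for illustration, not the project's implementation): if the safe action set is a polytope with known vertices, having the policy output a convex combination of those vertices guarantees every action stays inside the set.

    import numpy as np

    def safe_action(scores, vertices):
        # Softmax weights are nonnegative and sum to one, so the result is a
        # convex combination of the vertices and therefore lies in the polytope.
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ vertices

    # Hypothetical one-dimensional safe set [-1, 1], represented by its two vertices.
    vertices = np.array([[-1.0], [1.0]])
    print(safe_action(np.array([0.3, -0.7]), vertices))   # always lies in [-1, 1]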

The research team included not just the PIs and graduate students, but also a number of undergraduate researchers funded through REUs who contributed significantly to the project, especially to the simulation-based testbeds, which are now publicly available for the community to use. Finally, the team disseminated the results to a range of communities including industry, policy makers, governing bodies, and academia.

Last Modified: 01/14/2023
Modified by: Lillian Ratliff
