Safe Reinforcement Learning
Robson Adem
Outline
Motivation and Problem Statement
Notions of Safety
Acting safely in known vs. unknown environments
Different Approaches
Data-driven Optimal Control
via LQR
via MPC
Constrained MDP via Lyapunov Function
Imitation Learning and Curriculum Learning
Where to go from here?
Motivation
The problem with using RL for such systems:
Unknown environment/dynamics
Unmodelled/unobserved states
Sensing errors
Dynamic feedback
Systems that change over time
Difficulty in guaranteeing the safety of a system during development and deployment
Lack of fairness, well-being, and user agency in social networks
Failure to ensure discovery and novel experiences in recommender systems
Specifying Safe Behavior
Safe and Efficient Exploration in Reinforcement Learning
Andreas Krause, YouTube 2020
What does it mean to be safe in RL sense?
How do we quantify uncertainty and risk?
Notions of Safety: Worst-case
Notions of Safety: Stochastic Uncertain Environment
Notions of Safety: Value at Risk
Notions of Safety: Conditional Value at Risk
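To make the two risk measures concrete, here is a minimal sketch (not from the slides; the sample distribution is invented for illustration): the Value at Risk at level α is the α-quantile of the cost distribution, and the Conditional Value at Risk is the expected cost in the worst (1 − α) tail, so CVaR is always at least VaR.

```python
import numpy as np

def var_cvar(costs, alpha=0.95):
    """Empirical VaR and CVaR of a sample of costs.

    VaR_alpha  : the alpha-quantile of the cost distribution.
    CVaR_alpha : the mean cost over the worst (1 - alpha) tail.
    """
    costs = np.asarray(costs, dtype=float)
    var = np.quantile(costs, alpha)          # tail threshold
    cvar = costs[costs >= var].mean()        # average of the bad tail
    return var, cvar

# Toy cost sample: mostly small costs plus a few catastrophic outcomes.
rng = np.random.default_rng(0)
costs = np.concatenate([rng.normal(1.0, 0.1, 950),
                        rng.normal(10.0, 1.0, 50)])
var, cvar = var_cvar(costs, alpha=0.95)
```

Note how CVaR, unlike VaR, is sensitive to *how bad* the tail outcomes are, which is why it is the more common safety objective in risk-sensitive RL.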
Notions of safety using Lyapunov Functions
Notions of safety in the Lyapunov sense
The General Problem of the Stability of Motion (In Russian)
Aleksandr Lyapunov, Doctoral dissertation, Univ. Kharkov 1892 Translated 1992
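For reference, the standard conditions (not quoted from the slides): for dynamics $\dot{x} = f(x)$ with an equilibrium at the origin, a candidate $V$ is a Lyapunov function when

```latex
V(0) = 0, \qquad V(x) > 0 \quad \forall x \neq 0, \qquad
\dot{V}(x) = \nabla V(x)^{\top} f(x) \le 0 \quad \forall x.
```

With strict inequality in the last condition one obtains asymptotic stability; in safe RL, sublevel sets $\{x : V(x) \le c\}$ are used as forward-invariant safe regions the policy must not leave.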
Notions of Safety: Summary
We also looked at safety in the Lyapunov sense!
Act safely in known environment
Act safely in unknown environment
Key challenge: Don’t know the consequences of actions taken!
Act safely in unknown environment with prior knowledge
Using prior knowledge to establish a good first policy!
Data-driven Optimal Control
via LQR
via MPC
Constrained MDP via Lyapunov Function
Imitation Learning and Curriculum Learning
Data-driven Optimal Control
Data-driven Optimal Control — Linear Dynamics
Data-driven Optimal Control — Linear Quadratic Regulator
Safely Learning to Control the Constrained Linear Quadratic Regulator
Sarah Dean, Stephen Tu, Nikolai Matni and Benjamin Recht ACC 2019
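As a concrete sketch of the LQR building block (illustrative only; the double-integrator matrices and weights below are invented, not taken from the paper above): the finite-horizon discrete-time LQR gains come from a backward Riccati recursion, and the control is the linear state feedback u_t = −K_t x_t.

```python
import numpy as np

def lqr_finite_horizon(A, B, Q, R, N):
    """Finite-horizon discrete-time LQR via backward Riccati recursion.

    Returns feedback gains K_0 .. K_{N-1}; the control law is u_t = -K_t x_t.
    """
    P = Q.copy()                      # terminal cost-to-go
    gains = []
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)  # Riccati backward step
        gains.append(K)
    return gains[::-1]                # reorder so gains[0] is the time-0 gain

# Hypothetical double-integrator example (values chosen for illustration).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q, R = np.eye(2), np.array([[1.0]])
Ks = lqr_finite_horizon(A, B, Q, R, N=50)

# Closed-loop rollout from an initial position offset.
x = np.array([1.0, 0.0])
for K in Ks:
    x = A @ x - B @ (K @ x)
```

The entire gain schedule is computed once, offline, for the whole horizon; this is the key property MPC relaxes, as discussed next.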
Data-driven Optimal Control — Model Predictive Control (MPC)
Data-driven Optimal Control — MPC vs LQR
The main difference is that LQR optimizes over the entire time window (horizon) at once, whereas MPC re-optimizes over a receding time window at every step!
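The receding-horizon idea can be sketched as follows (a minimal unconstrained version with invented double-integrator values; real MPC solves a constrained optimization each step, which here is replaced by the unconstrained Riccati solution for brevity):

```python
import numpy as np

def riccati_gain(A, B, Q, R, N):
    """First-step LQR gain for an N-step horizon (backward Riccati sweep)."""
    P = Q.copy()
    K = np.zeros((B.shape[1], A.shape[0]))
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K          # after the full sweep, K is the gain for the first step

# Hypothetical double-integrator setup (illustrative values).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q, R = np.eye(2), np.array([[1.0]])

x = np.array([1.0, 0.0])
for _ in range(100):                     # closed-loop simulation
    K = riccati_gain(A, B, Q, R, N=20)   # re-plan over a 20-step window
    u = -(K @ x)                         # apply only the first planned input
    x = A @ x + B @ u                    # the window then recedes forward
```

Re-planning at every step is what lets MPC absorb model mismatch and state constraints, at the cost of an online optimization per time step.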
Data-driven Optimal Control — Robust MPC
Safe Reinforcement Learning Using Robust MPC
Mario Zanon and Sébastien Gros IEEE Transactions on Automatic Control 2020
Stability-Constrained Markov Decision Processes Using MPC
Mario Zanon, Sébastien Gros, and Michele Palladino Preprint to Automatica 2021
Learning-Based Model Predictive Control: Toward Safe Learning in Control
Lukas Hewing, Kim P. Wabersich, Marcel Menner, and Melanie N. Zeilinger Annual Review of Control, 2020
Notions of safety using Lyapunov Functions
Constrained MDP via Lyapunov Function
Lyapunov Design for Safe Reinforcement Learning
Theodore J. Perkins and Andrew G. Barto JMLR 2003
A Lyapunov-based Approach to Safe Reinforcement Learning
Yinlam Chow, Ofir Nachum, Edgar Duenez-Guzman, Mohammad Ghavamzadeh Preprint 2018
Act safely in unknown environment with prior knowledge
Imitation Learning and Curriculum Learning
Where to go from here?
This presentation includes content from the sources cited!