Real-Time Reinforcement Learning

Abstract

Markov Decision Processes (MDPs), the mathematical framework underlying most algorithms in Reinforcement Learning (RL), are often used in a way that wrongfully assumes that the state of an agent's environment does not change during action selection. As RL systems based on MDPs begin to find application in real-world safety-critical situations, this mismatch between the assumptions underlying classical MDPs and the reality of real-time computation may lead to undesirable outcomes. In this paper, we introduce a new framework, in which states and actions evolve simultaneously, and show how it is related to the classical MDP formulation. We analyze existing algorithms under the new real-time formulation and show why they are suboptimal when used in real time. We then use those insights to create a new algorithm, Real-Time Actor-Critic (RTAC), that outperforms the existing state-of-the-art continuous control algorithm Soft Actor-Critic in both real-time and non-real-time settings.

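The real-time interaction model described in the abstract can be illustrated with a small environment wrapper. The sketch below is not the authors' released code; it assumes the older gym step/reset interface and a continuous (Box) action space, and the class and argument names are illustrative. It pairs each observation with the action currently being executed and applies a newly selected action only at the next transition, so that states and actions evolve simultaneously.

import numpy as np
import gym


class RealTimeWrapper(gym.Wrapper):
    """Pairs each observation with the action currently being executed and
    applies a newly selected action only at the next step, so the environment
    state keeps evolving while the agent is 'computing' its next action."""

    def __init__(self, env, initial_action=None):
        super().__init__(env)
        # Observations become (state, action-in-progress) pairs.
        self.observation_space = gym.spaces.Tuple(
            (env.observation_space, env.action_space))
        # Assumed: a continuous Box action space with a zero no-op action.
        self.initial_action = (
            np.zeros(env.action_space.shape, dtype=env.action_space.dtype)
            if initial_action is None else initial_action)
        self.current_action = self.initial_action

    def reset(self, **kwargs):
        self.current_action = self.initial_action
        obs = self.env.reset(**kwargs)  # older gym API: reset returns obs only
        return obs, self.current_action

    def step(self, action):
        # The environment advances under the action selected one step ago;
        # the action chosen now only takes effect at the next transition.
        obs, reward, done, info = self.env.step(self.current_action)
        self.current_action = action
        return (obs, self.current_action), reward, done, info

In this setting the policy is conditioned on the pair (state, action currently being executed) rather than on the state alone, which is the structure the paper's analysis of real-time algorithms builds on.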
Cite

Text

Ramstedt and Pal. "Real-Time Reinforcement Learning." Neural Information Processing Systems, 2019.

Markdown

[Ramstedt and Pal. "Real-Time Reinforcement Learning." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/ramstedt2019neurips-realtime/)

BibTeX

@inproceedings{ramstedt2019neurips-realtime,
  title     = {{Real-Time Reinforcement Learning}},
  author    = {Ramstedt, Simon and Pal, Chris},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {3073--3082},
  url       = {https://mlanthology.org/neurips/2019/ramstedt2019neurips-realtime/}
}