Robust Reinforcement Learning in Motion Planning

Abstract

While exploring to find better solutions, an agent performing on-line reinforcement learning (RL) can perform worse than is acceptable. In some cases, exploration might have unsafe, or even catastrophic, results, often modeled in terms of reaching 'failure' states of the agent's environment. This paper presents a method that uses domain knowledge to reduce the number of failures during exploration. This method formulates the set of actions from which the RL agent composes a control policy to ensure that exploration is conducted in a policy space that excludes most of the unacceptable policies. The resulting action set has a more abstract relationship to the task being solved than is common in many applications of RL. Although the cost of this added safety is that learning may result in a suboptimal solution, we argue that this is an appropriate tradeoff in many problems. We illustrate this method in the domain of motion planning.
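
The approach can be pictured with a small, hypothetical sketch in Python. It is not the authors' implementation; the grid world, the two waypoint-following controllers, and all learning parameters below are invented for illustration. What it shows is the abstract's idea: the learner's actions are a few closed-loop controllers that avoid failure states by construction, so every policy it can compose, and hence all of its exploration, stays out of the failure region, even though the solution it settles on may be suboptimal.

# Illustrative sketch only, not the authors' method: the grid world, the
# waypoint controllers, and every parameter value are hypothetical choices.
import random
from collections import defaultdict

ROWS, COLS = 5, 5
START, GOAL = (0, 0), (4, 4)
OBSTACLES = {(2, 1), (2, 2), (2, 3)}        # "failure" cells exploration must avoid

def step_toward(state, target):
    """One greedy cell move toward `target` that never enters an obstacle
    (the agent stays put rather than take an unsafe step)."""
    r, c = state
    tr, tc = target
    moves = []
    if r != tr:
        moves.append((r + (1 if tr > r else -1), c))
    if c != tc:
        moves.append((r, c + (1 if tc > c else -1)))
    for nxt in moves:
        if nxt not in OBSTACLES:
            return nxt
    return state

def run_controller(state, waypoints):
    """Closed-loop 'action': one step of a controller that chases waypoints."""
    for wp in waypoints:
        if state != wp:
            return step_toward(state, wp)
    return state

# The learner's action set: two obstacle-avoiding controllers, not primitive moves.
CONTROLLERS = {
    "left_corridor": [(ROWS - 1, 0), GOAL],   # down the left edge, then along the bottom
    "top_corridor":  [(0, COLS - 1), GOAL],   # along the top edge, then down the right side
}
ACTIONS = sorted(CONTROLLERS)

# Tabular Q-learning over the restricted action set.
Q = defaultdict(float)
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def greedy(state):
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for episode in range(500):
    state = START
    for t in range(50):                        # cap episode length
        if state == GOAL:
            break
        action = random.choice(ACTIONS) if random.random() < EPSILON else greedy(state)
        nxt = run_controller(state, CONTROLLERS[action])
        assert nxt not in OBSTACLES            # safety holds throughout exploration
        reward = 0.0 if nxt == GOAL else -1.0
        best_next = 0.0 if nxt == GOAL else max(Q[(nxt, b)] for b in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt

print("Preferred controller at the start state:", greedy(START))

Running the sketch, the agent never visits an obstacle cell during learning, because no sequence of the available controllers can reach one; safety comes from the action-set design rather than from the learned value function.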

Cite

Text

Singh et al. "Robust Reinforcement Learning in Motion Planning." Neural Information Processing Systems, 1993.

Markdown

[Singh et al. "Robust Reinforcement Learning in Motion Planning." Neural Information Processing Systems, 1993.](https://mlanthology.org/neurips/1993/singh1993neurips-robust/)

BibTeX

@inproceedings{singh1993neurips-robust,
  title     = {{Robust Reinforcement Learning in Motion Planning}},
  author    = {Singh, Satinder P. and Barto, Andrew G. and Grupen, Roderic and Connolly, Christopher},
  booktitle = {Neural Information Processing Systems},
  year      = {1993},
  pages     = {655--662},
  url       = {https://mlanthology.org/neurips/1993/singh1993neurips-robust/}
}