OFFER: Off-Environment Reinforcement Learning

Abstract

Policy gradient methods have been widely applied in reinforcement learning. For reasons of safety and cost, learning is often conducted using a simulator. However, learning in simulation does not traditionally utilise the opportunity to improve learning by adjusting certain environment variables: state features that are randomly determined by the environment in a physical setting but controllable in a simulator. Exploiting environment variables is crucial in domains containing significant rare events (SREs), e.g., unusual wind conditions that can crash a helicopter, which are rarely observed under random sampling but have a considerable impact on expected return. We propose off-environment reinforcement learning (OFFER), which addresses such cases by simultaneously optimising the policy and a proposal distribution over environment variables. We prove that OFFER converges to a locally optimal policy and show experimentally that it learns better and faster than a policy gradient baseline.
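
The abstract describes sampling environment variables from a proposal distribution rather than their true (rare-event) distribution, and correcting policy-gradient updates with importance weights. The sketch below is a minimal, hypothetical illustration of that idea on a toy one-step task; the task, the names, and the crude proposal-update rule are assumptions made for illustration, not the paper's actual OFFER algorithm or its convergence analysis.

import numpy as np

rng = np.random.default_rng(0)

P_RARE = 0.01      # true probability of the rare environment event (assumed toy value)
N_ACTIONS = 2

def reward(z, a):
    """Toy reward: action 1 only pays off when the rare event z = 1 occurs."""
    if z == 1:
        return 100.0 if a == 1 else -100.0
    return 1.0 if a == 0 else 0.0

def policy_probs(theta):
    """Softmax policy over actions, parameterised by theta."""
    e = np.exp(theta - theta.max())
    return e / e.sum()

theta = np.zeros(N_ACTIONS)   # policy parameters
q = 0.5                       # proposal probability of the rare event (starts broad)
alpha, beta = 0.05, 0.01      # step sizes for policy and proposal (assumed values)

for step in range(20000):
    # Sample the environment variable from the proposal, not the true distribution.
    z = int(rng.random() < q)
    # Importance weight corrects for sampling z off-environment.
    w = (P_RARE / q) if z == 1 else ((1.0 - P_RARE) / (1.0 - q))

    probs = policy_probs(theta)
    a = rng.choice(N_ACTIONS, p=probs)
    r = reward(z, a)

    # REINFORCE-style policy gradient, scaled by the importance weight.
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += alpha * w * r * grad_log_pi

    # Crude proposal adaptation: nudge q toward outcomes with large weighted |return|.
    # This is only a stand-in for the proposal-optimisation step the abstract alludes to.
    q += beta * 1e-3 * (w * abs(r)) * (1.0 if z == 1 else -1.0)
    q = float(np.clip(q, 0.01, 0.99))

print("policy:", policy_probs(theta), "proposal q:", q)

Under random sampling of z (probability 0.01), the rare event would appear in roughly 1 in 100 episodes; drawing it from the adaptable proposal and reweighting lets the toy learner see it far more often without biasing the gradient estimate.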

Cite

Text

Ciosek and Whiteson. "OFFER: Off-Environment Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2017. doi:10.1609/AAAI.V31I1.10810

Markdown

[Ciosek and Whiteson. "OFFER: Off-Environment Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2017.](https://mlanthology.org/aaai/2017/ciosek2017aaai-offer/) doi:10.1609/AAAI.V31I1.10810

BibTeX

@inproceedings{ciosek2017aaai-offer,
  title     = {{OFFER: Off-Environment Reinforcement Learning}},
  author    = {Ciosek, Kamil Andrzej and Whiteson, Shimon},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2017},
  pages     = {1819-1825},
  doi       = {10.1609/AAAI.V31I1.10810},
  url       = {https://mlanthology.org/aaai/2017/ciosek2017aaai-offer/}
}