OFFER: Off-Environment Reinforcement Learning
Abstract
Policy gradient methods have been widely applied in reinforcement learning. For reasons of safety and cost, learning is often conducted using a simulator. However, learning in simulation does not traditionally utilise the opportunity to improve learning by adjusting certain environment variables, i.e., state features that are randomly determined by the environment in a physical setting but controllable in a simulator. Exploiting environment variables is crucial in domains containing significant rare events (SREs), e.g., unusual wind conditions that can crash a helicopter, which are rarely observed under random sampling but have a considerable impact on expected return. We propose off-environment reinforcement learning (OFFER), which addresses such cases by simultaneously optimising the policy and a proposal distribution over environment variables. We prove that OFFER converges to a locally optimal policy and show experimentally that it learns better and faster than a policy gradient baseline.
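The role of a proposal distribution over environment variables can be illustrated with a minimal sketch. This is not the OFFER algorithm (which, per the abstract, also optimises the policy and the proposal jointly); it only shows plain importance sampling on a toy problem. Everything here is an assumption made for illustration: the rare-event probability `P_RARE`, the toy return values, the fixed proposal probability `q_rare`, and all function names.

```python
import random

# Assumed probability of the rare event (e.g. unusual wind) in the real environment.
P_RARE = 0.001


def episode_return(rare_event: bool) -> float:
    """Toy return (hypothetical): a large penalty when the rare event occurs."""
    return -1000.0 if rare_event else 1.0


def true_expected_return() -> float:
    """Exact expectation under the true environment distribution."""
    return P_RARE * episode_return(True) + (1 - P_RARE) * episode_return(False)


def naive_estimate(n: int, rng: random.Random) -> float:
    """Monte Carlo estimate, sampling the environment variable as nature would."""
    total = 0.0
    for _ in range(n):
        rare = rng.random() < P_RARE
        total += episode_return(rare)
    return total / n


def importance_sampled_estimate(n: int, q_rare: float, rng: random.Random) -> float:
    """Sample the environment variable from a proposal, then reweight.

    The weight p(w) / q(w) corrects for sampling the rare event more often
    than it occurs under the true distribution, so the estimate stays unbiased.
    """
    total = 0.0
    for _ in range(n):
        rare = rng.random() < q_rare
        weight = P_RARE / q_rare if rare else (1 - P_RARE) / (1 - q_rare)
        total += weight * episode_return(rare)
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    print("true value:         ", true_expected_return())
    print("naive estimate:     ", naive_estimate(10_000, rng))
    print("importance sampling:", importance_sampled_estimate(10_000, 0.5, rng))
```

In this toy setting the naive estimator sees the rare event only about ten times in 10,000 episodes, so its estimate fluctuates strongly between runs, whereas the reweighted estimator observes the event in roughly half of its episodes and has a much lower variance. The proposal here is fixed by hand; the abstract describes learning it alongside the policy.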
Cite
Text
Ciosek and Whiteson. "OFFER: Off-Environment Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2017. doi:10.1609/AAAI.V31I1.10810
Markdown
[Ciosek and Whiteson. "OFFER: Off-Environment Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2017.](https://mlanthology.org/aaai/2017/ciosek2017aaai-offer/) doi:10.1609/AAAI.V31I1.10810
BibTeX
@inproceedings{ciosek2017aaai-offer,
title = {{OFFER: Off-Environment Reinforcement Learning}},
author = {Ciosek, Kamil Andrzej and Whiteson, Shimon},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2017},
pages = {1819-1825},
doi = {10.1609/AAAI.V31I1.10810},
url = {https://mlanthology.org/aaai/2017/ciosek2017aaai-offer/}
}