Variance-Seeking Meta-Exploration to Handle Out-of-Distribution Tasks
Abstract
Meta-reinforcement learning (meta-RL) has the potential to improve the sample efficiency of reinforcement learning algorithms. Through training on multiple meta-RL tasks, an agent learns a policy from past experience and leverages it to solve new, unseen tasks. Accordingly, meta-RL promises to solve real-world problems such as real-time heating, ventilation and air-conditioning (HVAC) control without accurate simulators of the target building. In this paper, we propose a meta-RL method which trains an agent on first-order models to efficiently learn and adapt to the internal dynamics of a real-world building. We recognise that meta-agents trained on first-order simulator models do not perform well on second-order models, owing to the meta-RL assumption that test tasks come from the same distribution as the training tasks. In response, we propose a novel exploration method called variance-seeking meta-exploration, which enables a meta-RL agent to perform well on complex tasks outside of its training distribution. Our method programs the agent to prefer exploring task-dependent state-action pairs, and in turn allows it to adapt efficiently to challenging second-order models which bear greater resemblance to real-world problems.
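To make the distinction between first-order and second-order models concrete, the following is a minimal sketch and not the authors' implementation. It assumes a common resistor-capacitor (RC) formulation of building thermal dynamics, which the abstract does not specify. It also assumes one possible reading of "task-dependent state-action pairs": pairs whose predicted next state varies most across training tasks. All function names, parameter values, and the `TASKS` list are hypothetical.

```python
import statistics


def first_order_step(temp, heat, t_out, r, c, dt=60.0):
    """One Euler step of an assumed 1R1C model: C dT/dt = (T_out - T)/R + Q."""
    return temp + dt / c * ((t_out - temp) / r + heat)


def second_order_step(t_air, t_mass, heat, t_out,
                      r_am, r_mo, c_air, c_mass, dt=60.0):
    """One Euler step of an assumed 2R2C model.

    The air node exchanges heat with a thermal-mass node (walls, furniture),
    and only the thermal-mass node exchanges heat with the outside.
    """
    d_air = ((t_mass - t_air) / r_am + heat) / c_air
    d_mass = ((t_air - t_mass) / r_am + (t_out - t_mass) / r_mo) / c_mass
    return t_air + dt * d_air, t_mass + dt * d_mass


def cross_task_variance(temp, heat, t_out, tasks):
    """Variance of the predicted next temperature across first-order tasks.

    A high value marks a state-action pair whose outcome depends strongly on
    the task parameters (r, c). This is an illustrative proxy only.
    """
    next_temps = [first_order_step(temp, heat, t_out, r, c) for r, c in tasks]
    return statistics.pvariance(next_temps)


# Hypothetical training tasks: (thermal resistance in K/W, capacitance in J/K).
TASKS = [(0.005, 1.0e6), (0.010, 2.0e6), (0.020, 4.0e6)]

# Idle at thermal equilibrium: every task predicts the same next temperature.
idle = cross_task_variance(temp=20.0, heat=0.0, t_out=20.0, tasks=TASKS)

# Heating at 2 kW: the predicted temperature rise differs from task to task.
heating = cross_task_variance(temp=20.0, heat=2000.0, t_out=20.0, tasks=TASKS)
```

Under these assumptions, idling at equilibrium reveals nothing about which building the agent is in, while heating produces task-specific responses, so an exploration rule that favours high cross-task variance would pick the heating action. The second-order step shows why such models are out of distribution for an agent trained only on the single-state model: the air temperature now also depends on a hidden thermal-mass state.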
Cite
Text
Grewal et al. "Variance-Seeking Meta-Exploration to Handle Out-of-Distribution Tasks." NeurIPS 2021 Workshops: DeepRL, 2021.
Markdown
[Grewal et al. "Variance-Seeking Meta-Exploration to Handle Out-of-Distribution Tasks." NeurIPS 2021 Workshops: DeepRL, 2021.](https://mlanthology.org/neuripsw/2021/grewal2021neuripsw-varianceseeking/)
BibTeX
@inproceedings{grewal2021neuripsw-varianceseeking,
title = {{Variance-Seeking Meta-Exploration to Handle Out-of-Distribution Tasks}},
author = {Grewal, Yashvir Singh and de Nijs, Frits and Goodwin, Sarah},
booktitle = {NeurIPS 2021 Workshops: DeepRL},
year = {2021},
url = {https://mlanthology.org/neuripsw/2021/grewal2021neuripsw-varianceseeking/}
}