Enhanced Meta Reinforcement Learning via Demonstrations in Sparse Reward Environments
Abstract
Meta reinforcement learning (Meta-RL) is an approach wherein the experience gained from solving a variety of tasks is distilled into a meta-policy. The meta-policy, when adapted over only a small number of steps (or even a single step), is able to perform near-optimally on a new, related task. However, a major challenge to adopting this approach to solve real-world problems is that they are often associated with sparse reward functions that only indicate whether a task is completed partially or fully. We consider the situation where some data, possibly generated by a sub-optimal agent, is available for each task. We then develop a class of algorithms named Enhanced Meta-RL via Demonstrations (EMRLD) that exploit this information---even if sub-optimal---to obtain guidance during training. We show how EMRLD jointly utilizes RL and supervised learning over the offline data to generate a meta-policy that demonstrates monotone performance improvements. We also develop a warm-started variant called EMRLD-WS that is particularly efficient for sub-optimal demonstration data. Finally, we show that our EMRLD algorithms significantly outperform existing approaches in a variety of sparse reward environments, including that of a mobile robot.
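The abstract's key mechanism is a task-adaptation update that combines an RL objective on the agent's own sparse-reward rollouts with a supervised (behavior-cloning) objective on offline demonstrations. The paper's actual algorithm is not reproduced here; the following is a minimal sketch of what such a joint update could look like in PyTorch, using a plain REINFORCE loss as the RL term. All names (obs_dim, act_dim, demo_obs, demo_acts, bc_weight) are hypothetical stand-ins, not identifiers from the paper's code.

```python
# Minimal sketch (not the authors' implementation): one adaptation step that
# mixes a policy-gradient loss on the agent's own rollouts with a
# behavior-cloning loss on possibly sub-optimal demonstrations.
import torch
import torch.nn as nn

obs_dim, act_dim = 4, 2  # hypothetical dimensions for a toy discrete task
policy = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, act_dim))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

def adaptation_step(obs, acts, returns, demo_obs, demo_acts, bc_weight=0.5):
    """One joint update: RL loss on own data + supervised loss on demos."""
    logits = policy(obs)
    log_probs = torch.distributions.Categorical(logits=logits).log_prob(acts)
    rl_loss = -(log_probs * returns).mean()  # REINFORCE on sparse-reward rollouts
    bc_loss = nn.functional.cross_entropy(policy(demo_obs), demo_acts)  # imitation
    loss = rl_loss + bc_weight * bc_loss  # bc_weight trades off guidance vs. RL
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with random stand-in data for a single task.
obs = torch.randn(8, obs_dim)
acts = torch.randint(act_dim, (8,))
returns = torch.randn(8)
demo_obs = torch.randn(8, obs_dim)
demo_acts = torch.randint(act_dim, (8,))
print(adaptation_step(obs, acts, returns, demo_obs, demo_acts))
```

In a full Meta-RL setup this update would sit inside an inner loop over sampled tasks (e.g., MAML-style adaptation); the sketch only illustrates how demonstration data can supply gradient signal when the sparse reward alone gives little.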
Cite
Text
Rengarajan et al. "Enhanced Meta Reinforcement Learning via Demonstrations in Sparse Reward Environments." Neural Information Processing Systems, 2022.

Markdown
[Rengarajan et al. "Enhanced Meta Reinforcement Learning via Demonstrations in Sparse Reward Environments." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/rengarajan2022neurips-enhanced/)

BibTeX
@inproceedings{rengarajan2022neurips-enhanced,
title = {{Enhanced Meta Reinforcement Learning via Demonstrations in Sparse Reward Environments}},
author = {Rengarajan, Desik and Chaudhary, Sapana and Kim, Jaewon and Kalathil, Dileep and Shakkottai, Srinivas},
booktitle = {Neural Information Processing Systems},
year = {2022},
url = {https://mlanthology.org/neurips/2022/rengarajan2022neurips-enhanced/}
}