Robust Domain Randomised Reinforcement Learning Through Peer-to-Peer Distillation

Abstract

In reinforcement learning, domain randomisation is a popular technique for learning general policies that are robust to new environments and domain shifts at deployment. However, naively aggregating information from randomised domains may lead to high variance in gradient estimation and sub-optimal policies. To address this issue, we present a peer-to-peer online distillation strategy for reinforcement learning, termed P2PDRL, where multiple learning agents are each assigned to a different environment and exchange knowledge through mutual regularisation based on the Kullback–Leibler divergence. Our experiments on continuous control tasks show that P2PDRL enables robust learning across a wider randomisation distribution than baselines and achieves more robust generalisation to new environments at test time.
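
To make the mutual-regularisation idea concrete, below is a minimal sketch of the peer-to-peer KL term in PyTorch. It assumes Gaussian policies and a hypothetical `policies` list (one per randomised environment); the KL direction, the weighting coefficient `alpha`, and the `GaussianPolicy` architecture are illustrative choices, not the paper's exact configuration, and each agent's own RL objective is omitted.

```python
import torch
from torch.distributions import Normal, kl_divergence


class GaussianPolicy(torch.nn.Module):
    """Toy diagonal-Gaussian policy used only for this sketch."""

    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.mu = torch.nn.Sequential(
            torch.nn.Linear(obs_dim, 64), torch.nn.Tanh(),
            torch.nn.Linear(64, act_dim))
        self.log_std = torch.nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        return Normal(self.mu(obs), self.log_std.exp())


def peer_kl_loss(agent_idx, policies, states):
    """Average KL between agent `agent_idx`'s policy and each peer's
    policy, evaluated on the agent's own batch of states. Peers are
    treated as fixed teachers (no gradient flows into them)."""
    own = policies[agent_idx](states)
    losses = []
    for j, peer in enumerate(policies):
        if j == agent_idx:
            continue
        with torch.no_grad():
            other = peer(states)
        # KL(peer || self): one plausible direction for distillation.
        losses.append(kl_divergence(other, own).mean())
    return torch.stack(losses).mean()


# Usage sketch: each agent combines its usual RL loss with the
# peer regulariser, weighted by a hypothetical coefficient alpha.
obs_dim, act_dim, n_agents, alpha = 8, 2, 4, 0.1
policies = [GaussianPolicy(obs_dim, act_dim) for _ in range(n_agents)]
batch_states = torch.randn(32, obs_dim)   # stand-in for agent 0's rollout
rl_loss = torch.tensor(0.0)               # placeholder for the RL objective
loss = rl_loss + alpha * peer_kl_loss(0, policies, batch_states)
loss.backward()
```
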

Cite

Text

Zhao and Hospedales. "Robust Domain Randomised Reinforcement Learning Through Peer-to-Peer Distillation." Proceedings of The 13th Asian Conference on Machine Learning, 2021.

Markdown

[Zhao and Hospedales. "Robust Domain Randomised Reinforcement Learning Through Peer-to-Peer Distillation." Proceedings of The 13th Asian Conference on Machine Learning, 2021.](https://mlanthology.org/acml/2021/zhao2021acml-robust/)

BibTeX

@inproceedings{zhao2021acml-robust,
  title     = {{Robust Domain Randomised Reinforcement Learning Through Peer-to-Peer Distillation}},
  author    = {Zhao, Chenyang and Hospedales, Timothy},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  year      = {2021},
  pages     = {1237--1252},
  volume    = {157},
  url       = {https://mlanthology.org/acml/2021/zhao2021acml-robust/}
}