Stabilizing Unsupervised Environment Design with a Learned Adversary

Abstract

A key challenge in training generally-capable agents is the design of training tasks that facilitate broad generalization and robustness to environment variations. This challenge motivates the problem setting of \emph{Unsupervised Environment Design} (UED), whereby a student agent trains on an adaptive distribution of tasks proposed by a teacher agent. A pioneering approach for UED is PAIRED, which uses reinforcement learning (RL) to train a teacher policy to design tasks from scratch, making it possible to directly generate tasks that are adapted to the agent’s current capabilities. Despite its strong theoretical backing, PAIRED suffers from a variety of challenges that hinder its practical performance. Thus, state-of-the-art methods currently rely on \emph{curation} and \emph{mutation} rather than \emph{generation} of new tasks. In this work, we investigate several key shortcomings of PAIRED and propose solutions for each shortcoming. As a result, we make it possible for PAIRED to match or exceed state-of-the-art methods, producing robust agents in several challenging procedurally-generated environments, including a partially-observed maze navigation task and a continuous-control car racing environment. We believe this work motivates a renewed emphasis on UED methods based on learned models that directly generate challenging environments, potentially unlocking more open-ended RL training and, as a result, more general agents.
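To illustrate the PAIRED mechanism referenced in the abstract, the sketch below shows the regret-based teacher objective from the original PAIRED formulation (Dennis et al., 2020): the teacher proposes an environment, a protagonist and an antagonist both attempt it, and the teacher is rewarded by the return gap between them. This is a minimal, hedged sketch; the names `teacher.design_env`, `rollout`, and `update` are hypothetical placeholders, not APIs from this paper's codebase.

```python
# Minimal sketch of one PAIRED training step (assumed helper names).

def paired_training_step(teacher, protagonist, antagonist):
    # The teacher (adversary) generates an environment from scratch.
    env = teacher.design_env()

    # Both student agents are evaluated in the proposed environment.
    protagonist_return = rollout(env, protagonist)
    antagonist_return = rollout(env, antagonist)

    # Regret is approximated as the antagonist's return minus the
    # protagonist's return. The teacher (and, implicitly, the antagonist)
    # maximize this regret, which pushes toward environments that are
    # solvable yet challenging; the protagonist minimizes it by
    # maximizing its own return.
    regret = antagonist_return - protagonist_return

    teacher.update(reward=regret)   # teacher is rewarded by the regret estimate
    protagonist.update(env)         # protagonist maximizes its own return
    antagonist.update(env)          # antagonist maximizes its return (and thus regret)
    return regret
```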

Cite

Text

Mediratta et al. "Stabilizing Unsupervised Environment Design with a Learned Adversary." Proceedings of The 2nd Conference on Lifelong Learning Agents, 2023.

Markdown

[Mediratta et al. "Stabilizing Unsupervised Environment Design with a Learned Adversary." Proceedings of The 2nd Conference on Lifelong Learning Agents, 2023.](https://mlanthology.org/collas/2023/mediratta2023collas-stabilizing/)

BibTeX

@inproceedings{mediratta2023collas-stabilizing,
  title     = {{Stabilizing Unsupervised Environment Design with a Learned Adversary}},
  author    = {Mediratta, Ishita and Jiang, Minqi and Parker-Holder, Jack and Dennis, Michael and Vinitsky, Eugene and Rocktäschel, Tim},
  booktitle = {Proceedings of The 2nd Conference on Lifelong Learning Agents},
  year      = {2023},
  pages     = {270--291},
  volume    = {232},
  url       = {https://mlanthology.org/collas/2023/mediratta2023collas-stabilizing/}
}