AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers

Abstract

Language models trained on diverse datasets unlock generalization by in-context learning. Reinforcement Learning (RL) policies can achieve a similar effect by meta-learning within the memory of a sequence model. However, meta-RL research primarily focuses on adapting to minor variations of a single task. It is difficult to scale towards more general behavior without confronting challenges in multi-task optimization, and few solutions are compatible with meta-RL's goal of learning from large training sets of unlabeled tasks. To address this challenge, we revisit the idea that multi-task RL is bottlenecked by imbalanced training losses created by uneven return scales across different tasks. We build upon recent advancements in Transformer-based (in-context) meta-RL and evaluate a simple yet scalable solution where both an agent's actor and critic objectives are converted to classification terms that decouple optimization from the current scale of returns. Large-scale comparisons in Meta-World ML45, Multi-Game Procgen, Multi-Task POPGym, Multi-Game Atari, and BabyAI find that this design unlocks significant progress in online multi-task adaptation and memory problems without explicit task labels.
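The key design choice in the abstract is recasting the actor and critic objectives as classification terms so the loss scale no longer tracks the scale of returns. The sketch below is a minimal, hypothetical illustration of the critic side only, assuming a "two-hot" encoding over fixed return bins (one common way to turn value regression into classification); it is not the paper's implementation, and the bin layout, loss weighting, and actor-side term may differ from what AMAGO-2 actually uses.

```python
# Hypothetical sketch (not the paper's code): a critic loss as classification
# over discretized return bins, so its magnitude is decoupled from return scale.
import torch
import torch.nn.functional as F

def two_hot(targets: torch.Tensor, bins: torch.Tensor) -> torch.Tensor:
    """Spread each scalar target across its two neighboring bins."""
    targets = targets.clamp(float(bins[0]), float(bins[-1]))
    idx = torch.bucketize(targets, bins).clamp(1, len(bins) - 1)
    lo, hi = bins[idx - 1], bins[idx]
    w_hi = (targets - lo) / (hi - lo + 1e-8)          # weight on the upper bin
    probs = torch.zeros(targets.shape[0], len(bins))
    probs.scatter_(1, (idx - 1).unsqueeze(1), (1 - w_hi).unsqueeze(1))
    probs.scatter_(1, idx.unsqueeze(1), w_hi.unsqueeze(1))
    return probs

def critic_classification_loss(logits: torch.Tensor,
                               td_targets: torch.Tensor,
                               bins: torch.Tensor) -> torch.Tensor:
    """Cross-entropy between the critic's bin logits and two-hot targets."""
    return F.cross_entropy(logits, two_hot(td_targets, bins))

# Toy usage: 64 transitions, 51 bins spanning an assumed fixed return range.
bins = torch.linspace(-10.0, 10.0, steps=51)
logits = torch.randn(64, 51, requires_grad=True)      # critic head output
td_targets = torch.randn(64) * 3.0                     # bootstrapped returns
loss = critic_classification_loss(logits, td_targets, bins)
loss.backward()
```

Because the cross-entropy is computed over a fixed categorical support, tasks with very different reward magnitudes contribute losses of comparable scale, which is the imbalance the abstract identifies as the multi-task bottleneck.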

Cite

Text

Grigsby et al. "AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers." Neural Information Processing Systems, 2024. doi:10.52202/079017-2776

Markdown

[Grigsby et al. "AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/grigsby2024neurips-amago2/) doi:10.52202/079017-2776

BibTeX

@inproceedings{grigsby2024neurips-amago2,
  title     = {{AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers}},
  author    = {Grigsby, Jake and Sasek, Justin and Parajuli, Samyak and Adebi, Daniel and Zhang, Amy and Zhu, Yuke},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2776},
  url       = {https://mlanthology.org/neurips/2024/grigsby2024neurips-amago2/}
}