In-Context Reinforcement Learning with Algorithm Distillation
Abstract
We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn as an across-episode sequential prediction problem. A dataset of learning histories is generated by a source RL algorithm, and then a causal transformer is trained by autoregressively predicting actions given their preceding learning histories as context. Unlike sequential policy prediction architectures that distill post-learning or expert sequences, AD is able to improve its policy entirely in-context without updating its network parameters. We demonstrate that AD can reinforcement learn in-context in a variety of environments with sparse rewards, combinatorial task structure, and pixel-based observations, and find that AD learns a more data-efficient RL algorithm than the one that generated the source data.
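The abstract describes the core recipe: log the full training history of a source RL algorithm, then train a causal transformer to autoregressively predict actions conditioned on the preceding cross-episode history. The sketch below is a minimal illustration of that idea, not the authors' implementation; the tiny transformer, the token layout (observation plus previous action and reward per timestep), the dimensions, and the random toy data standing in for real learning histories are all assumptions made for the example.

```python
# Minimal Algorithm Distillation (AD) training sketch (illustrative only).
# Assumes learning histories were already collected from a source RL algorithm
# as aligned (obs, action, reward) sequences spanning many episodes.
import torch
import torch.nn as nn

OBS_DIM, NUM_ACTIONS, D_MODEL, CTX_LEN = 8, 4, 64, 256  # placeholder sizes

class ADTransformer(nn.Module):
    """Causal transformer that predicts the next action from a cross-episode
    learning history; each token packs (obs_t, prev_action, prev_reward)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(OBS_DIM + NUM_ACTIONS + 1, D_MODEL)
        self.pos = nn.Embedding(CTX_LEN, D_MODEL)
        layer = nn.TransformerEncoderLayer(
            d_model=D_MODEL, nhead=4, dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(D_MODEL, NUM_ACTIONS)

    def forward(self, tokens):  # tokens: (batch, time, token_dim)
        T = tokens.shape[1]
        x = self.embed(tokens) + self.pos(torch.arange(T, device=tokens.device))
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
        h = self.encoder(x, mask=mask)          # causal: position t sees <= t
        return self.action_head(h)              # action logits per timestep

# Toy stand-in for a dataset of learning histories from the source algorithm.
B = 16
obs = torch.randn(B, CTX_LEN, OBS_DIM)
actions = torch.randint(0, NUM_ACTIONS, (B, CTX_LEN))
rewards = torch.randn(B, CTX_LEN, 1)

# Token at step t contains obs_t plus the action/reward from step t-1, so the
# target action a_t is never visible to the position that predicts it.
prev_a = torch.cat([torch.zeros(B, 1, NUM_ACTIONS),
                    nn.functional.one_hot(actions[:, :-1], NUM_ACTIONS).float()], dim=1)
prev_r = torch.cat([torch.zeros(B, 1, 1), rewards[:, :-1]], dim=1)
tokens = torch.cat([obs, prev_a, prev_r], dim=-1)

model = ADTransformer()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
for step in range(3):  # illustrative training loop
    logits = model(tokens)
    # Autoregressive objective: predict the source algorithm's action at each
    # timestep given the learning history up to that point.
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, NUM_ACTIONS), actions.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```

At evaluation time, the same model would be rolled out in a new task with its context filled by its own recent episodes, so policy improvement happens purely through in-context conditioning rather than parameter updates.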
Cite
Text
Laskin et al. "In-Context Reinforcement Learning with Algorithm Distillation." NeurIPS 2022 Workshops: FMDM, 2022.
Markdown
[Laskin et al. "In-Context Reinforcement Learning with Algorithm Distillation." NeurIPS 2022 Workshops: FMDM, 2022.](https://mlanthology.org/neuripsw/2022/laskin2022neuripsw-incontext-a/)
BibTeX
@inproceedings{laskin2022neuripsw-incontext-a,
title = {{In-Context Reinforcement Learning with Algorithm Distillation}},
author = {Laskin, Michael and Wang, Luyu and Oh, Junhyuk and Parisotto, Emilio and Spencer, Stephen and Steigerwald, Richie and Strouse, Dj and Hansen, Steven Stenberg and Filos, Angelos and Brooks, Ethan and Gazeau, Maxime and Sahni, Himanshu and Singh, Satinder and Mnih, Volodymyr},
booktitle = {NeurIPS 2022 Workshops: FMDM},
year = {2022},
url = {https://mlanthology.org/neuripsw/2022/laskin2022neuripsw-incontext-a/}
}