Vintix: Action Model via In-Context Reinforcement Learning
Abstract
In-Context Reinforcement Learning (ICRL) represents a promising paradigm for developing generalist agents that learn at inference time through trial-and-error interactions, analogous to how large language models adapt contextually, but with a focus on reward maximization. However, the scalability of ICRL beyond toy tasks and single-domain settings remains an open challenge. In this work, we present the first steps toward scaling ICRL by introducing a fixed, cross-domain model capable of learning behaviors through in-context reinforcement learning. Our results demonstrate that Algorithm Distillation, a framework designed to facilitate ICRL, offers a compelling and competitive alternative to expert distillation for constructing versatile action models. These findings highlight the potential of ICRL as a scalable approach for generalist decision-making systems.
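For readers unfamiliar with Algorithm Distillation (Laskin et al., 2022), the framework the abstract builds on: a causal sequence model is trained on entire learning histories of a source RL algorithm, with episodes kept in the order they were generated, so that next-action prediction forces the model to imitate the improvement process itself rather than a single expert policy. The sketch below is a minimal illustration of that setup under assumed names, shapes, and hyperparameters (`flatten_learning_history`, `HistoryPolicy`, the toy data); it is not the Vintix implementation.

```python
# A minimal sketch of the Algorithm Distillation idea, not the Vintix code;
# all names, shapes, and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

def flatten_learning_history(episodes):
    """Concatenate (state, action, reward) steps across episodes in the order
    the source RL agent produced them, so the resulting sequence encodes
    policy improvement over time rather than a single fixed policy."""
    steps = [step for episode in episodes for step in episode]
    states = torch.stack([s for s, a, r in steps])                  # (T, state_dim)
    actions = torch.tensor([a for s, a, r in steps])                # (T,)
    rewards = torch.tensor([r for s, a, r in steps]).unsqueeze(-1)  # (T, 1)
    return states, actions, rewards

class HistoryPolicy(nn.Module):
    """Causal transformer trained with next-action prediction over the
    cross-episode context, as in Algorithm Distillation."""
    def __init__(self, state_dim, n_actions, d_model=64):
        super().__init__()
        # Each token embeds the current state plus the previous action and reward.
        self.embed = nn.Linear(state_dim + 1 + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, states, prev_actions, prev_rewards):
        x = torch.cat([states, prev_actions.unsqueeze(-1).float(), prev_rewards], dim=-1)
        x = self.embed(x.unsqueeze(0))  # add batch dimension -> (1, T, d_model)
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.backbone(x, mask=mask, is_causal=True)
        return self.head(h)             # (1, T, n_actions)

# Toy usage on random data: 5 "episodes" of 8 steps each in a 4-dim state space.
episodes = [[(torch.randn(4), int(torch.randint(0, 3, ())), float(torch.rand(())))
             for _ in range(8)] for _ in range(5)]
states, actions, rewards = flatten_learning_history(episodes)
prev_actions = torch.roll(actions, 1); prev_actions[0] = 0
prev_rewards = torch.roll(rewards, 1, dims=0); prev_rewards[0] = 0.0
model = HistoryPolicy(state_dim=4, n_actions=3)
logits = model(states, prev_actions, prev_rewards)
loss = nn.functional.cross_entropy(logits.squeeze(0), actions)
loss.backward()  # standard next-action prediction loss over the learning history
```

Because the context spans many episodes of an improving agent, a model trained this way can keep improving a policy in-context at inference time from reward feedback alone; this is the property the paper scales across domains.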
Cite
Text
Polubarov et al. "Vintix: Action Model via In-Context Reinforcement Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.Markdown
[Polubarov et al. "Vintix: Action Model via In-Context Reinforcement Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/polubarov2025icml-vintix/)BibTeX
@inproceedings{polubarov2025icml-vintix,
title = {{Vintix: Action Model via In-Context Reinforcement Learning}},
author = {Polubarov, Andrei and Lyubaykin, Nikita and Derevyagin, Alexander and Zisman, Ilya and Tarasov, Denis and Nikulin, Alexander and Kurenkov, Vladislav},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {49569--49602},
volume = {267},
url = {https://mlanthology.org/icml/2025/polubarov2025icml-vintix/}
}