Vintix: Action Model via In-Context Reinforcement Learning

Abstract

In-Context Reinforcement Learning (ICRL) is a promising paradigm for developing generalist agents that learn at inference time through trial-and-error interactions, analogous to how large language models adapt contextually, but with a focus on reward maximization. However, the scalability of ICRL beyond toy tasks and single-domain settings remains an open challenge. In this work, we present the first steps toward scaling ICRL by introducing a fixed, cross-domain model capable of learning behaviors through in-context reinforcement learning. Our results demonstrate that Algorithm Distillation, a framework designed to facilitate ICRL, offers a compelling and competitive alternative to expert distillation for constructing versatile action models. These findings highlight the potential of ICRL as a scalable approach for generalist decision-making systems.
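
For intuition, here is a minimal, hypothetical PyTorch sketch of the Algorithm Distillation idea the abstract refers to: a causal transformer is trained to predict a source RL algorithm's actions from a long multi-episode history of observations, actions, and rewards, so that at inference it can keep improving in-context without gradient updates. All names, dimensions, and architecture choices below are illustrative assumptions, not the paper's actual Vintix implementation.

```python
import torch
import torch.nn as nn

class ADTransformer(nn.Module):
    """Causal transformer mapping a multi-episode (obs, prev action,
    prev reward) history to the next action, mimicking the source RL
    algorithm at every point along its learning curve."""

    def __init__(self, obs_dim, act_dim, d_model=256, n_heads=4,
                 n_layers=4, max_len=4096):
        super().__init__()
        self.embed = nn.Linear(obs_dim + act_dim + 1, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, act_dim)

    def forward(self, obs, prev_act, prev_rew):
        # obs: (B, T, obs_dim); prev_act: (B, T, act_dim); prev_rew: (B, T, 1)
        x = self.embed(torch.cat([obs, prev_act, prev_rew], dim=-1))
        t = x.shape[1]
        x = x + self.pos(torch.arange(t, device=x.device))
        # causal mask so each step only attends to the history before it
        mask = nn.Transformer.generate_square_subsequent_mask(t).to(x.device)
        return self.head(self.backbone(x, mask=mask))

# Toy training step on a synthetic learning history (continuous control,
# so action prediction is a regression; discrete tasks would use cross-entropy).
B, T, obs_dim, act_dim = 4, 128, 8, 2
obs = torch.randn(B, T, obs_dim)
act = torch.randn(B, T, act_dim)  # actions taken by the source algorithm
rew = torch.randn(B, T, 1)
prev_act = torch.cat([torch.zeros(B, 1, act_dim), act[:, :-1]], dim=1)
prev_rew = torch.cat([torch.zeros(B, 1, 1), rew[:, :-1]], dim=1)

model = ADTransformer(obs_dim, act_dim)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss = nn.functional.mse_loss(model(obs, prev_act, prev_rew), act)
loss.backward()
opt.step()
```

Because the regression targets span whole learning histories rather than only expert trajectories, the model distills the algorithm's improvement operator, which is what distinguishes this setup from plain expert distillation.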

Cite

Text

Polubarov et al. "Vintix: Action Model via In-Context Reinforcement Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Polubarov et al. "Vintix: Action Model via In-Context Reinforcement Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/polubarov2025icml-vintix/)

BibTeX

@inproceedings{polubarov2025icml-vintix,
  title     = {{Vintix: Action Model via In-Context Reinforcement Learning}},
  author    = {Polubarov, Andrei and Lyubaykin, Nikita and Derevyagin, Alexander and Zisman, Ilya and Tarasov, Denis and Nikulin, Alexander and Kurenkov, Vladislav},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {49569--49602},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/polubarov2025icml-vintix/}
}