Yes, Q-Learning Helps Offline In-Context RL

Abstract

Existing scalable offline In-Context Reinforcement Learning (ICRL) methods have predominantly relied on supervised training objectives, which are known to have limitations in offline RL settings. In this work, we investigate the integration of reinforcement learning (RL) objectives into a scalable offline ICRL framework. Through experiments on more than 150 datasets derived from GridWorld and MuJoCo environments, we demonstrate that optimizing RL objectives improves performance by approximately 30% on average over the widely established Algorithm Distillation (AD) baseline across various dataset coverages, structures, expertise levels, and environmental complexities. Our results also reveal that offline RL-based methods outperform online approaches that are not specifically designed for offline scenarios. These findings underscore the importance of aligning the learning objective with RL's reward-maximization goal and demonstrate that offline RL is a promising direction for ICRL.
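
To make the contrast in objectives concrete, the sketch below shows (in PyTorch) how a supervised, AD-style next-action prediction loss differs from a Q-learning (TD) loss computed over the same in-context transition sequences. This is an illustrative assumption, not the paper's architecture or loss: it assumes discrete actions, a DQN-style bootstrapped target, and a toy causal transformer; all module and field names (`ContextQNet`, `batch["prev_actions"]`, etc.) are hypothetical.

```python
# Illustrative sketch only (not the paper's implementation): contrasts a
# supervised next-action objective (as in Algorithm Distillation) with a
# Q-learning (TD) objective over in-context sequences of (s, a_prev, r_prev).
import torch
import torch.nn as nn
import torch.nn.functional as F


def causal_mask(T, device):
    # Standard upper-triangular mask so step t only attends to steps <= t.
    return torch.triu(torch.full((T, T), float("-inf"), device=device), diagonal=1)


class ContextQNet(nn.Module):
    """Tiny causal sequence model producing per-step Q-values (hypothetical)."""

    def __init__(self, obs_dim, num_actions, hidden=128):
        super().__init__()
        # Each token is the current observation plus the previous action and reward.
        self.embed = nn.Linear(obs_dim + num_actions + 1, hidden)
        layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.q_head = nn.Linear(hidden, num_actions)

    def forward(self, obs, prev_actions_onehot, prev_rewards):
        x = torch.cat([obs, prev_actions_onehot, prev_rewards], dim=-1)
        h = self.encoder(self.embed(x), mask=causal_mask(obs.size(1), obs.device))
        return self.q_head(h)  # (batch, seq_len, num_actions)


def ad_loss(q_net, batch):
    """Supervised (AD-style) objective: imitate the dataset's action at every step."""
    logits = q_net(batch["obs"], batch["prev_actions"], batch["prev_rewards"])
    return F.cross_entropy(logits.flatten(0, 1), batch["actions"].flatten())


def q_learning_loss(q_net, target_net, batch, gamma=0.99):
    """TD objective on the same sequences: regress Q(s_t, a_t) toward
    r_t + gamma * max_a Q_target(s_{t+1}, a), masking terminal steps."""
    q = q_net(batch["obs"], batch["prev_actions"], batch["prev_rewards"])
    q_taken = q[:, :-1].gather(-1, batch["actions"][:, :-1].unsqueeze(-1)).squeeze(-1)
    with torch.no_grad():
        q_next = target_net(batch["obs"], batch["prev_actions"], batch["prev_rewards"])
        bootstrap = q_next[:, 1:].max(dim=-1).values
        target = batch["rewards"][:, :-1] + gamma * (1.0 - batch["dones"][:, :-1]) * bootstrap
    return F.mse_loss(q_taken, target)
```

Under this framing, swapping `ad_loss` for `q_learning_loss` is the only change to the training loop, which is the spirit of the comparison the abstract describes: same sequence model and data, different (reward-maximizing) objective.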

Cite

Text

Tarasov et al. "Yes, Q-Learning Helps Offline In-Context RL." ICLR 2025 Workshops: SCOPE, 2025.

Markdown

[Tarasov et al. "Yes, Q-Learning Helps Offline In-Context RL." ICLR 2025 Workshops: SCOPE, 2025.](https://mlanthology.org/iclrw/2025/tarasov2025iclrw-yes/)

BibTeX

@inproceedings{tarasov2025iclrw-yes,
  title     = {{Yes, Q-Learning Helps Offline In-Context RL}},
  author    = {Tarasov, Denis and Nikulin, Alexander and Zisman, Ilya and Klepach, Albina and Polubarov, Andrei and Lyubaykin, Nikita and Derevyagin, Alexander and Kiselev, Igor and Kurenkov, Vladislav},
  booktitle = {ICLR 2025 Workshops: SCOPE},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/tarasov2025iclrw-yes/}
}