In-Context Compositional Q-Learning for Offline Reinforcement Learning

Abstract

Accurate estimation of the Q-function is a central challenge in offline reinforcement learning. However, existing approaches often rely on a shared global Q-function, which is inadequate for capturing the compositional structure of tasks that consist of diverse subtasks. We propose In-Context Compositional Q-Learning (ICQL), an offline RL framework that formulates Q-learning as a contextual inference problem and uses linear Transformers to adaptively infer local Q-functions from retrieved transitions without explicit subtask labels. Theoretically, we show that under two assumptions (linear approximability of the local Q-function and accurate inference of weights from retrieved context), ICQL achieves a bounded approximation error for the Q-function and enables near-optimal policy extraction. Empirically, ICQL substantially improves performance in offline settings, achieving gains of up to 16.4% on kitchen tasks and up to 8.8% and 6.3% on MuJoCo and Adroit tasks, respectively. These results highlight the underexplored potential of in-context learning for robust and compositional value estimation and establish ICQL as a principled and effective framework for offline RL.

Cite

Text

Xu et al. "In-Context Compositional Q-Learning for Offline Reinforcement Learning." International Conference on Learning Representations, 2026.

Markdown

[Xu et al. "In-Context Compositional Q-Learning for Offline Reinforcement Learning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/xu2026iclr-incontext/)

BibTeX

@inproceedings{xu2026iclr-incontext,
  title     = {{In-Context Compositional Q-Learning for Offline Reinforcement Learning}},
  author    = {Xu, Qiushui and Huang, Yu-Hao and Jiang, Yushu and Zheng, Wenliang and Song, Lei and Wang, Jinyu and Bian, Jiang},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/xu2026iclr-incontext/}
}