Mixture-of-Experts Meets In-Context Reinforcement Learning

Abstract

In-context reinforcement learning (ICRL) has emerged as a promising paradigm for adapting RL agents to downstream tasks through prompt conditioning. However, two notable challenges remain in fully harnessing in-context learning within RL domains: the intrinsic multi-modality of the state-action-reward data and the diverse, heterogeneous nature of decision tasks. To tackle these challenges, we propose **T2MIR** (**T**oken- and **T**ask-wise **M**oE for **I**n-context **R**L), an innovative framework that introduces architectural advances of mixture-of-experts (MoE) into transformer-based decision models. T2MIR substitutes the feedforward layer with two parallel layers: a token-wise MoE that captures the distinct semantics of input tokens across multiple modalities, and a task-wise MoE that routes diverse tasks to specialized experts, managing a broad task distribution with alleviated gradient conflicts. To enhance task-wise routing, we introduce a contrastive learning method that maximizes the mutual information between the task and its router representation, enabling more precise capture of task-relevant information. The outputs of the two MoE components are concatenated and fed into the next layer. Comprehensive experiments show that T2MIR significantly enhances in-context learning capability and outperforms various types of baselines. We bring the potential and promise of MoE to ICRL, offering a simple and scalable architectural enhancement that moves ICRL one step closer to the achievements of the language and vision communities. Our code is available at [https://github.com/NJU-RL/T2MIR](https://github.com/NJU-RL/T2MIR).
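The parallel-sublayer design described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the expert counts, top-1 gating, linear experts, and the mean-pooled task representation used for task-wise routing are all illustrative assumptions. It only shows the data flow: each token is routed through a token-wise MoE, a pooled task representation is routed through a task-wise MoE, and the two outputs are concatenated to restore the model dimension.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MoELayer:
    """Top-1 mixture-of-experts: each routed input picks one linear expert."""
    def __init__(self, d_in, d_out, n_experts):
        self.W = rng.normal(0, 0.02, (n_experts, d_in, d_out))  # one weight matrix per expert
        self.router = rng.normal(0, 0.02, (d_in, n_experts))    # produces routing logits

    def __call__(self, x):
        # x: (n, d_in); route each row to its top-1 expert
        probs = softmax(x @ self.router)               # (n, n_experts) routing weights
        idx = probs.argmax(-1)                         # chosen expert per row
        out = np.empty((x.shape[0], self.W.shape[2]))
        for i, e in enumerate(idx):
            out[i] = probs[i, e] * (x[i] @ self.W[e])  # gate-weighted expert output
        return out

def t2mir_sublayer(tokens):
    """Sketch of the parallel token-/task-wise MoE sublayer (assumed dims and pooling)."""
    d = tokens.shape[-1]
    token_moe = MoELayer(d, d // 2, n_experts=4)  # token-wise: per-token routing
    task_moe = MoELayer(d, d // 2, n_experts=4)   # task-wise: routed by a task representation
    task_repr = tokens.mean(0, keepdims=True)     # placeholder task representation (assumption)
    token_out = token_moe(tokens)                                   # (n_tokens, d/2)
    task_out = np.repeat(task_moe(task_repr), tokens.shape[0], 0)   # broadcast to (n_tokens, d/2)
    return np.concatenate([token_out, task_out], -1)  # concat restores (n_tokens, d)
```

In the paper, the task-wise router is additionally trained with a contrastive objective that maximizes mutual information between the task and its router representation; that loss is omitted here for brevity.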

Cite

Text

Wu et al. "Mixture-of-Experts Meets In-Context Reinforcement Learning." Advances in Neural Information Processing Systems, 2025.

Markdown

[Wu et al. "Mixture-of-Experts Meets In-Context Reinforcement Learning." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/wu2025neurips-mixtureofexperts/)

BibTeX

@inproceedings{wu2025neurips-mixtureofexperts,
  title     = {{Mixture-of-Experts Meets In-Context Reinforcement Learning}},
  author    = {Wu, Wenhao and Liu, Fuhong and Li, Haoru and Hu, Zican and Dong, Daoyi and Chen, Chunlin and Wang, Zhi},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/wu2025neurips-mixtureofexperts/}
}