Mixture-of-World Models: Scaling Multi-Task Reinforcement Learning with Modular Latent Dynamics

Abstract

A fundamental challenge in multi-task reinforcement learning (MTRL) is achieving sample efficiency in visual domains where tasks exhibit significant heterogeneity in both observations and dynamics. Model-based RL (MBRL) offers a promising path to sample efficiency through world models, but standard monolithic architectures struggle to capture diverse task dynamics, leading to poor reconstruction and prediction accuracy. We introduce Mixture-of-World Models (MoW), a scalable architecture that integrates three key components: i) modular VAEs for task-adaptive visual compression, ii) a hybrid Transformer-based dynamics model combining task-conditioned experts with a shared backbone, and iii) a gradient-based task clustering strategy for efficient parameter allocation. On the Atari 100k benchmark, **a single MoW agent** (trained once across $26$ Atari games) achieves a mean human-normalized score of $\mathbf{110.4}$%, competitive with the $\mathbf{114.2}$% achieved by the recent STORM, an ensemble of $26$ task-specific models, while requiring $\mathbf{50}$% fewer parameters. On Meta-World, MoW attains a $\mathbf{74.5}$% average success rate within $300$K steps, establishing a new state of the art. These results demonstrate that MoW provides a scalable and parameter-efficient foundation for generalist world models.
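
The abstract reports a mean human-normalized score without defining it. As a minimal sketch, assuming the convention commonly used for Atari benchmarks (the agent's raw score rescaled so that a random policy maps to 0% and a human reference maps to 100%, then averaged over games), the metric could be computed as follows. The game names and raw scores are hypothetical placeholders, not values from the paper, and the paper's exact evaluation protocol may differ.

```python
from statistics import mean


def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Return the human-normalized score, in percent, for a single game."""
    if human == random:
        raise ValueError("human and random baselines must differ")
    return 100.0 * (agent - random) / (human - random)


# Hypothetical raw scores: (agent, random baseline, human baseline).
# These are illustrative placeholders, not numbers reported in the paper.
raw_scores = {
    "game_a": (16.0, 2.0, 30.0),
    "game_b": (22.0, -20.0, 15.0),
}

# Normalize each game separately, then average across games.
per_game = {name: human_normalized_score(*s) for name, s in raw_scores.items()}
mean_hns = mean(per_game.values())

for name, score in per_game.items():
    print(f"{name}: {score:.1f}%")
print(f"mean: {mean_hns:.1f}%")
```

Under this convention, a mean above 100% indicates that the agent exceeds the human reference on average, although a few high-scoring games can dominate the mean.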

Cite

Text

Zhang et al. "Mixture-of-World Models: Scaling Multi-Task Reinforcement Learning with Modular Latent Dynamics." International Conference on Learning Representations, 2026.

Markdown

[Zhang et al. "Mixture-of-World Models: Scaling Multi-Task Reinforcement Learning with Modular Latent Dynamics." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhang2026iclr-mixtureofworld/)

BibTeX

@inproceedings{zhang2026iclr-mixtureofworld,
  title     = {{Mixture-of-World Models: Scaling Multi-Task Reinforcement Learning with Modular Latent Dynamics}},
  author    = {Zhang, Boxuan and Zhang, Weipu and Feng, Zhaohan and Xiao, Wei and Sun, Jian and Chen, Jie and Wang, Gang},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhang2026iclr-mixtureofworld/}
}