Play to Generalize: Learning to Reason Through Game Play

Abstract

Developing reasoning capabilities in multimodal large language models (MLLMs) remains challenging. Motivated by literature suggesting that gameplay promotes transferable reasoning skills, we propose a novel post-training method, Visual Game Learning (ViGaL), where MLLMs develop generalizable reasoning skills through playing arcade-like games. Specifically, we show that training a 7B-parameter MLLM via reinforcement learning (RL) on simple games like Snake significantly improves downstream performance on multimodal math benchmarks like MathVista and on multi-discipline questions like MMMU, even though the model sees no worked solutions, equations, or diagrams during RL. Remarkably, our model outperforms specialist models post-trained on benchmark-oriented multimodal reasoning data while preserving performance on general visual benchmarks, a challenge where specialist models often fall short. Our findings suggest that multimodal reasoning can emerge from gameplay, pointing to a promising strategy of designing surrogate tasks for RL post-training. The code is available at https://yunfeixie233.github.io/ViGaL.

Cite

Text

Xie et al. "Play to Generalize: Learning to Reason Through Game Play." International Conference on Learning Representations, 2026.

Markdown

[Xie et al. "Play to Generalize: Learning to Reason Through Game Play." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/xie2026iclr-play/)

BibTeX

@inproceedings{xie2026iclr-play,
  title     = {{Play to Generalize: Learning to Reason Through Game Play}},
  author    = {Xie, Yunfei and Ma, Yinsong and Lan, Shiyi and Yuille, Alan and Xiao, Junfei and Wei, Chen},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/xie2026iclr-play/}
}