Learning Interactive World Model for Object-Centric Reinforcement Learning

Abstract

Agents that understand objects and their interactions can learn policies that are more robust and transferable. However, most object-centric RL methods factor the state into individual objects while leaving their interactions implicit. We introduce the Factored Interactive Object-Centric World Model (FIOC-WM), a unified framework that learns structured representations of both objects and their interactions within a world model. FIOC-WM captures environment dynamics with disentangled and modular representations of object interactions, improving sample efficiency and generalization for policy learning. Concretely, FIOC-WM first learns object-centric latents and an interaction structure directly from pixels, leveraging pre-trained vision encoders. The learned world model then decomposes tasks into composable interaction primitives, and a hierarchical policy is trained on top: a high level selects the type and order of interactions, while a low level executes them. On simulated robotic and embodied-AI benchmarks, FIOC-WM improves the sample efficiency and generalization of policy learning over world-model baselines, indicating that explicit, modular interaction learning is crucial for robust control.
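
The abstract compresses two ideas that are easy to misread: a dynamics model that carries one latent per object plus a learned (soft) interaction structure, and a hierarchical policy that sequences interaction primitives on top of it. The sketch below illustrates both under stated assumptions: all class names, shapes, and the choice of PyTorch are illustrative rather than the authors' implementation, and the pre-trained vision encoder is skipped so object latents are taken as given.

```python
import torch
import torch.nn as nn

class InteractiveWorldModel(nn.Module):
    """Toy stand-in for an interaction-aware, object-factored world model:
    one latent per object, plus a learned soft adjacency over object pairs
    that gates pairwise interaction messages (illustrative only)."""

    def __init__(self, num_objects: int, latent_dim: int, action_dim: int):
        super().__init__()
        self.num_objects = num_objects
        # Per-object transition, shared across objects (modularity).
        self.self_dynamics = nn.GRUCell(latent_dim + action_dim, latent_dim)
        # Pairwise interaction message: how object j influences object i.
        self.interaction = nn.Sequential(
            nn.Linear(2 * latent_dim, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, latent_dim),
        )
        # Learned soft interaction graph: which pairs actually interact.
        self.adjacency_logits = nn.Parameter(torch.zeros(num_objects, num_objects))

    def forward(self, z: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        # z: (num_objects, latent_dim) object latents; a: (action_dim,) action.
        adj = torch.sigmoid(self.adjacency_logits)  # soft edge weights in [0, 1]
        pairs = torch.cat(
            [z.unsqueeze(1).expand(-1, self.num_objects, -1),   # receiver i
             z.unsqueeze(0).expand(self.num_objects, -1, -1)],  # sender j
            dim=-1)                                             # (N, N, 2D)
        # Aggregate messages from senders, gated by the interaction graph.
        messages = (adj.unsqueeze(-1) * self.interaction(pairs)).sum(dim=1)
        inp = torch.cat([messages, a.expand(self.num_objects, -1)], dim=-1)
        return self.self_dynamics(inp, z)  # next per-object latents

if __name__ == "__main__":
    torch.manual_seed(0)
    wm = InteractiveWorldModel(num_objects=4, latent_dim=32, action_dim=6)
    z = torch.randn(4, 32)

    # Toy hierarchical rollout: a high level picks which interaction
    # primitive to run next (here: a random index over 3 primitives),
    # and a low level emits primitive-conditioned actions (here: random).
    high_level = lambda z: torch.randint(3, (1,)).item()
    low_level = lambda z, k: torch.randn(6)
    for _ in range(2):                       # two high-level decisions
        k = high_level(z)
        for _ in range(5):                   # low-level steps per primitive
            z = wm(z, low_level(z, k))
    print("rolled-out latents:", tuple(z.shape))
```

The explicit adjacency is the part that makes interactions first-class: because messages are gated by a learned graph rather than folded into a monolithic transition, interaction primitives can be identified and recombined, which is what the hierarchical policy exploits.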

Cite

Text

Feng et al. "Learning Interactive World Model for Object-Centric Reinforcement Learning." Advances in Neural Information Processing Systems, 2025.

Markdown

[Feng et al. "Learning Interactive World Model for Object-Centric Reinforcement Learning." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/feng2025neurips-learning/)

BibTeX

```bibtex
@inproceedings{feng2025neurips-learning,
  title     = {{Learning Interactive World Model for Object-Centric Reinforcement Learning}},
  author    = {Feng, Fan and Lippe, Phillip and Magliacane, Sara},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/feng2025neurips-learning/}
}
```