On Rollouts in Model-Based Reinforcement Learning

Abstract

Model-based reinforcement learning (MBRL) seeks to enhance data efficiency by learning a model of the environment and generating synthetic rollouts from it. However, accumulated model errors during these rollouts can distort the data distribution, negatively impacting policy learning and hindering long-term planning. Thus, the accumulation of model errors is a key bottleneck in current MBRL methods. We propose Infoprop, a model-based rollout mechanism that separates aleatoric from epistemic model uncertainty and reduces the influence of the latter on the data distribution. Further, Infoprop keeps track of accumulated model errors along a model rollout and provides termination criteria to limit data corruption. We demonstrate the capabilities of Infoprop in the Infoprop-Dyna algorithm, reporting state-of-the-art performance in Dyna-style MBRL on common MuJoCo benchmark tasks while substantially increasing rollout length and data quality.
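To make the rollout mechanism described above concrete, below is a minimal sketch of an uncertainty-aware model rollout with a termination criterion. This is not the authors' implementation: it assumes a hypothetical Gaussian ensemble dynamics model, uses a simple variance decomposition (mean predicted variance as aleatoric, disagreement across member means as epistemic) as a stand-in for the paper's information-theoretic tracking, and the names `ensemble`, `policy`, and `eps_budget` are illustrative.

import numpy as np

def rollout_with_info_termination(ensemble, policy, s0, horizon, eps_budget=1.0):
    """Sketch of an uncertainty-aware model rollout (illustrative only).

    `ensemble(s, a)` is assumed to return per-member Gaussian predictions:
    means of shape (E, d) and variances of shape (E, d) for E ensemble
    members and state dimension d.
    """
    s = s0
    accumulated_epistemic = 0.0
    transitions = []
    for _ in range(horizon):
        a = policy(s)
        means, variances = ensemble(s, a)
        # Aleatoric uncertainty: average within-member predicted variance.
        aleatoric = variances.mean(axis=0)
        # Epistemic uncertainty: disagreement (variance) across member means.
        epistemic = means.var(axis=0)
        # Propagate the mean prediction plus aleatoric noise only, so that
        # epistemic model error does not inject spurious randomness into
        # the synthetic data distribution.
        s_next = means.mean(axis=0) + np.sqrt(aleatoric) * np.random.randn(*aleatoric.shape)
        transitions.append((s, a, s_next))
        # Track accumulated epistemic error along the rollout and terminate
        # once it exceeds a (hypothetical) data-corruption budget.
        accumulated_epistemic += float(epistemic.sum())
        if accumulated_epistemic > eps_budget:
            break
        s = s_next
    return transitions

In Dyna-style training, the resulting transitions would be appended to a replay buffer for policy learning; the budget-based stopping rule is one plausible reading of the paper's termination criteria, which cap how much accumulated model error is allowed to corrupt the data.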

Cite

Text

Frauenknecht et al. "On Rollouts in Model-Based Reinforcement Learning." International Conference on Learning Representations, 2025.

Markdown

[Frauenknecht et al. "On Rollouts in Model-Based Reinforcement Learning." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/frauenknecht2025iclr-rollouts/)

BibTeX

@inproceedings{frauenknecht2025iclr-rollouts,
  title     = {{On Rollouts in Model-Based Reinforcement Learning}},
  author    = {Frauenknecht, Bernd and Subhasish, Devdutt and Solowjow, Friedrich and Trimpe, Sebastian},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/frauenknecht2025iclr-rollouts/}
}