Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling

Abstract

Learning complex robot behavior through interaction with the environment requires principled exploration. Effective strategies should prioritize exploring the regions of the state-action space that maximize reward. Optimistic exploration follows this idea and has emerged as a promising route to sample-efficient reinforcement learning. However, existing methods overlook a crucial aspect: optimism should be informed by a belief that connects rewards and states. To address this, we propose a practical, theoretically grounded approach to optimistic exploration based on Thompson sampling. Our approach is the first to allow reasoning about _joint_ uncertainty over transitions and rewards for optimistic exploration. We apply our method to a set of MuJoCo and VMAS continuous control tasks. Our experiments demonstrate that optimistic exploration significantly accelerates learning in environments with sparse rewards, action penalties, and difficult-to-explore regions. Furthermore, we provide insights into when optimism is beneficial and emphasize the critical role of model uncertainty in guiding exploration.
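To illustrate why a *joint* belief over rewards and transitions matters for optimism, here is a minimal sketch, not the authors' algorithm. It assumes the learned model returns a joint Gaussian predictive over `[next_state, reward]` for a given state-action pair. Optimism is injected by fixing the reward at an optimistic value `mean + beta * std` (the `beta` parameter and both function names are hypothetical), and the next state is then drawn from the Gaussian conditional given that reward. State dimensions correlated with reward shift toward high-reward outcomes, while uncorrelated dimensions are left unchanged.

```python
import numpy as np


def condition_on_optimistic_reward(mean, cov, beta=1.0):
    """Condition a joint Gaussian over [next_state..., reward] on an optimistic reward.

    Hypothetical helper. mean has shape (d + 1,), cov has shape (d + 1, d + 1),
    and the last dimension is assumed to be the reward.
    Returns the conditional mean and covariance of the next state, plus the
    optimistic reward value r_opt = mu_r + beta * sigma_r.
    """
    mu_s, mu_r = mean[:-1], mean[-1]
    cov_ss = cov[:-1, :-1]   # state-state block
    cov_sr = cov[:-1, -1]    # state-reward cross-covariance
    var_r = cov[-1, -1]      # reward variance

    r_opt = mu_r + beta * np.sqrt(var_r)

    # Standard Gaussian conditioning: p(s' | r = r_opt)
    gain = cov_sr / var_r
    cond_mean = mu_s + gain * (r_opt - mu_r)
    cond_cov = cov_ss - np.outer(gain, cov_sr)
    return cond_mean, cond_cov, r_opt


def optimistic_sample(mean, cov, beta=1.0, rng=None):
    """Draw one next state consistent with the optimistic reward (Thompson-style sample)."""
    rng = np.random.default_rng() if rng is None else rng
    cond_mean, cond_cov, r_opt = condition_on_optimistic_reward(mean, cov, beta)
    s_next = rng.multivariate_normal(cond_mean, cond_cov)
    return s_next, r_opt


if __name__ == "__main__":
    # Toy joint predictive: state dim 0 is correlated with reward, dim 1 is not.
    mean = np.zeros(3)
    cov = np.array([[1.0, 0.0, 0.5],
                    [0.0, 1.0, 0.0],
                    [0.5, 0.0, 1.0]])
    m, c, r = condition_on_optimistic_reward(mean, cov, beta=2.0)
    print("optimistic reward:", r)
    print("conditional state mean:", m)
    s, _ = optimistic_sample(mean, cov, beta=2.0, rng=np.random.default_rng(0))
    print("sampled next state:", s)
```

In this toy case the reward is pinned two standard deviations above its mean, which pulls the expected value of the reward-correlated state dimension from 0 to 1 and shrinks its variance, whereas an optimistic reward sampled independently of the state would leave the predicted next state unchanged.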

Cite

Text

Bayrooti et al. "Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling." International Conference on Learning Representations, 2025.

BibTeX

@inproceedings{bayrooti2025iclr-efficient,
  title     = {{Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling}},
  author    = {Bayrooti, Jasmine and Ek, Carl Henrik and Prorok, Amanda},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/bayrooti2025iclr-efficient/}
}