Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling

Jasmine Bayrooti, Carl Henrik Ek, Amanda Prorok

ICLR 2025


Abstract

Learning complex robot behavior through interactions with the environment necessitates principled exploration. Effective strategies should prioritize exploring regions of the state-action space that maximize rewards. Optimistic exploration aligns with this idea and has emerged as a promising direction for sample-efficient reinforcement learning. However, existing methods overlook a crucial aspect: the need for optimism to be informed by a belief connecting the reward and state. To address this, we propose a practical, theoretically grounded approach to optimistic exploration based on Thompson sampling. Our approach is the first to reason about _joint_ uncertainty over transitions and rewards for optimistic exploration. We apply our method to a set of MuJoCo and VMAS continuous control tasks. Our experiments demonstrate that optimistic exploration significantly accelerates learning in environments with sparse rewards, action penalties, and difficult-to-explore regions. Furthermore, we provide insights into when optimism is beneficial and emphasize the critical role of model uncertainty in guiding exploration.
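The abstract does not spell out the algorithm. As a rough illustration of the general idea of optimism informed by a joint belief over next state and reward, here is a minimal sketch under strong simplifying assumptions: a scalar state, a known bivariate Gaussian posterior for a single state-action pair, and optimism implemented by clipping the reward draw at its conditional mean (a simple rule used in some optimistic Thompson sampling variants for bandits). The names `JointGaussian`, `thompson_sample`, and `optimistic_thompson_sample` are hypothetical, and this is not the authors' method.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class JointGaussian:
    """Hypothetical posterior over (next_state, reward) for one state-action pair.

    Assumption: scalar state and a bivariate Gaussian belief; rho is the
    correlation between next state and reward.
    """
    mu_s: float
    mu_r: float
    sigma_s: float
    sigma_r: float
    rho: float

    def conditional_reward(self, s_next):
        """Mean and std of the reward given a sampled next state (Gaussian conditioning)."""
        mean = self.mu_r + self.rho * self.sigma_r / self.sigma_s * (s_next - self.mu_s)
        std = self.sigma_r * math.sqrt(1.0 - self.rho ** 2)
        return mean, std


def thompson_sample(post, rng):
    """Plain Thompson sampling: one joint draw of (next_state, reward)."""
    s_next = rng.gauss(post.mu_s, post.sigma_s)
    mean, std = post.conditional_reward(s_next)
    return s_next, rng.gauss(mean, std)


def optimistic_thompson_sample(post, rng):
    """Illustrative optimistic variant: the reward draw never falls below its conditional mean.

    Because the reward is sampled *given* the sampled next state, the optimism
    is informed by the joint (state, reward) belief, not by the reward marginal alone.
    """
    s_next = rng.gauss(post.mu_s, post.sigma_s)
    mean, std = post.conditional_reward(s_next)
    return s_next, max(rng.gauss(mean, std), mean)


# Usage: compare average reward draws from the two samplers.
rng = random.Random(0)
post = JointGaussian(mu_s=0.0, mu_r=0.0, sigma_s=1.0, sigma_r=1.0, rho=0.8)
n = 10_000
plain = sum(thompson_sample(post, rng)[1] for _ in range(n)) / n
opt = sum(optimistic_thompson_sample(post, rng)[1] for _ in range(n)) / n
print(f"mean reward draw: plain={plain:.2f}, optimistic={opt:.2f}")
```

With a strong correlation between next state and reward, the optimistic sampler inflates rewards only relative to what the sampled state implies, which is the intuition behind tying optimism to a joint belief rather than to rewards in isolation.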


Cite

Text

Bayrooti et al. "Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling." International Conference on Learning Representations, 2025.

Markdown

[Bayrooti et al. "Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/bayrooti2025iclr-efficient/)

BibTeX

@inproceedings{bayrooti2025iclr-efficient,
  title     = {{Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling}},
  author    = {Bayrooti, Jasmine and Ek, Carl Henrik and Prorok, Amanda},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/bayrooti2025iclr-efficient/}
}